Digital Science's CEO talks about Open Research

Almost every modern scientific discipline deals with large volumes of data. To keep up with the ever-increasing pace of research, scientists are forced to seek out more advanced methods of organizing their work, learn effective collaboration techniques, and change how preprints and articles are shared and published.

Digital Science is a company focused on developing tools for the research ecosystem that simplify research processes and increase their efficiency. The company’s CEO, Daniel Hook, worked with his colleagues to produce the report The Ascent of Open Access, which is dedicated to open research and access to the results of scientific studies.

Robert Harington from The Scholarly Kitchen recently interviewed Daniel Hook to discuss the necessary changes in the scientific world, as well as the problems with financing and reaching consensus in the community.

Tell our readers a little bit about yourself and your role at Digital Science.

I am a physicist by training. I’m lucky enough to have great collaborators and to work in a field that doesn’t require significant equipment or lab time. As a result, I continue to do research and to publish. I always joke with colleagues that physics isn’t something that you do, it’s something that you are. But there is an element of truth in that, and I like to translate my research experiences into the work that I do at Digital Science: I think that it is extremely important, given Digital Science’s mission to improve the tools for all stakeholders in the research ecosystem, that a researcher’s view is at the center of what we do. Hopefully, the perspective that I bring helps to keep us on the right track.

I came to my current position as CEO of Digital Science having been a co-founder of Symplectic, one of the first companies in which Digital Science invested, via a two-year stint as Director of Research Metrics. That two-year period was a formative one both for me and, as it happened, for Digital Science. Between summer 2013 and summer 2015, I had responsibility for working with the CEOs of Altmetric, figshare, Symplectic and UberResearch. During that time, we developed early versions of Altmetric’s and figshare’s institutional products and created Digital Science’s Consultancy and Data Science groups. We also created Digital Science’s Digital Research Reports and the GRID system, and formulated the strategy that led to last year’s launch of Dimensions.

My role at Digital Science is increasingly about ensuring that Digital Science can support critical developments around Open Research.

What were the motivations behind producing your recent report, The Ascent of Open Access?

That is pretty simple really: Open Access is one of the key foundations for Open Research. Almost everything Digital Science does is really about how to support the various stakeholders as we move closer to an Open Research ecosystem. With the launch of Dimensions, we have a unique data source with which we can explore many facets of research. It’s tremendously exciting to have all these data and a way to rapidly obtain answers to questions in an interactive way through the API. All the investment that we’ve put into unique identifiers as a community pays off in Dimensions, as I can easily look at regions (e.g. NUTS), institutions (e.g. GRID), people (e.g. ORCID), subject categories (e.g. ANZSRC/FoR) and Open Access status (e.g. Unpaywall). With the addition of the Unpaywall data to the database at the end of last year, it just made sense to do a report on something close to our hearts.

In the title of this post, I use the word “openness”, rather than “open access”. There is a constant flow of discussion around open access models, but less conversation about an open research ecosystem. How do you view the landscape of open research?

Open Research is a massive topic and it is difficult to do justice to it in just a few words. I completely agree that Open Access is just one component; however, it is probably the best-established piece of open research infrastructure that we currently have. Open Data is obviously a hot topic and developing rapidly. However, I think that we have a lot to learn from the Open Source software movement, and that we should think about Open Research by borrowing ideas from Open Source. That means that we need to think about open protocols and open methods as well as open peer review. There are many procedural and ethical challenges. The perception is often that there are technical barriers to achieving openness, whereas the challenges are predominantly cultural or systemic. To change the culture, we need to re-design academic and industrial incentives so that being open provides a bigger payoff for each individual stakeholder. I think that part of that will come out of the work of DORA; I’m hoping that something like “Open Evaluation” is the result of that initiative.

What do you view as the most significant barriers to open research?

We touched on this a little in the last question. I think that the biggest barrier is the existing system of incentives: people are not made professor for making their research openly available, and that needs to change. The current system was never built to scale to the current size of the research world. I think that there will be some radical changes in scholarly communication and evaluation. Research, however, is quite rightly a conservative world. Systems need to be tried and tested; we can’t afford to switch to a system that is susceptible to effects like fake news. So, I don’t think that change will happen quickly.

Do you see a widening gap in open research initiatives between well-funded fields in the sciences, and fields in the humanities and social sciences with minimal funding?

I’m not sure that it helps to classify open initiatives by subject on a funding basis. Biomedical research has been the best funded field for many decades and yet it took the field more than 30 years to follow areas such as physics into the preprint arena. Even now, preprinting is confined to some of the more progressive areas of bioscience such as genetics, whereas economics has had a preprint server in RePEc for almost as long as arXiv has supported the high-energy physics community. I think that there are subject-based sensitivities that are often deeply rooted in ethical considerations. There are also issues of format — I think that somehow books are psychologically more challenging to make available as a preprint. And, as ever, incentives in many fields are not well-aligned with preprinting activity and open research in general.

How do you see preprint servers such as arXiv, bioRxiv, and chemRxiv playing a role in open research going forward?

When I started doing research as a PhD student in theoretical physics, my supervisor introduced me to how we worked in the field. We have an alphabetical list of co-authors and, when a paper is finished, we submit it to arXiv, wait a couple of days in case anyone raises a critical point, and then submit to a journal. For me that training was formative. Every morning I can go to the arXiv and check out the newest developments in my field because everyone does it the same way. I am always up to date with what colleagues are thinking about and working on. Shortcutting a lengthy review process makes research more efficient and helps us move faster. Of course, peer review is a critical tool in verifying results and counter-balancing spurious claims. But I think that arXiv demonstrates that this can happen asynchronously without research quality being compromised. The faster-than-light neutrino experiment is a great example of how preprints can speed up scientific discourse.

In 2017, setting up an arXiv-type clone seemed to be very fashionable. I think that there is the potential for some consolidation there. The original arXiv was not overly constrained by subject (but perhaps more by being LaTeX dependent) and so it covers at its core Physics, Mathematics, and Computer Science and has been extended beyond that over the years. I think that bringing some of these new arXivs together could enhance the power that they have to make a difference to research.

In your report you say “The new ‘atom’ of scholarly communication, beyond the publication, is profoundly driven by research’s new and emerging relationship with data.” Can you give readers a sense of your thinking here?

This is a topic that I plan to return to in much greater detail over the next couple of years, but it’s too complex to write about in any compelling way in this short space. I will say that we’ve been doing a lot of thinking about this at Digital Science.

An ability to understand, manipulate and interact with data pervades professional life, social life and now research life too. All undergrads should be required to take a data science course regardless of their principal topic of study, in my opinion. In the report, I was motivated to make that comment by the work that Mark Hahnel and his team at figshare have done over the last few years: the volume of data that researchers in almost all fields (especially in the humanities and social sciences) now have available to them is very different from just a few years ago. So much of how we think about exploring a problem should be verifiable using data. This seems pretty obviously the case in science, with the increasing subtlety of experimental equipment and faster computers to generate data from theoretical models. However, with the ubiquity of mobile devices, portable video cameras and recording devices, there is a windfall of information that can benefit social scientists as well as colleagues in the humanities. That means that researchers are having to become proficient with different tools to handle data, perform analyses and cope with different types of thinking.

One of the most interesting features of the report is your analysis of global collaborations and the correlation with investments made in open research. Can you dig into this a little?

The central thesis of the report is that the progression of open access is supported by regular waves of policy activity acting at different levels in the ecosystem: a national policy changes here, an institution implements a mandate there, a funder adds a requirement to a funding program... All these efforts reinforce each other because of the fundamental interconnectedness of research: the collaborative, co-authoring landscape means that any one of these policy changes improves the open access landscape not just for the institution concerned but for any institution that is collaborating on the same paper. As my friend Jonathan Adams observed, international collaboration is on the up. For example, the UK is not only one of the largest producers of research outputs in the world but also has one of the highest levels of international collaboration on those outputs. That means that when the UK changes its open access policy, as it did powerfully with the REF advice that came out in 2013, it has a global effect. This is exciting to model and to watch, as the effect of successive policy changes in different geographies gradually pushes us toward a paradigm where the majority of research outputs are available through an open access route.

It seems that almost daily we see a new openness plan being announced, be it Plan S, Plan U, or most recently the OA Switchboard initiative announced by the Open Access Scholarly Publishers Association (OASPA) in February of 2019. How do you see solving for the complexity researchers are facing? Do we need a “Plan B”?

That’s a big question and one that I feel ill-equipped to answer. As a researcher, I want it to be simple. I don’t want to have to find money from different pots to publish my work. I don’t want to have to understand licensing and copyright law nor do I want to have to understand if my funder’s requirements are at odds with my institution’s requirements of me or indeed my government’s views on what constitutes open. I also really don’t want to have to go through the same thing with my data and my software as well as my journal article. So, in short, yes, I do think that there needs to be simplification. Not wanting to wade into the minefield that is Plan S, I will say that one thing that must be welcome to everyone is that there is now clear coordination going on between different stakeholders. Ideally this would lead to a framework or standard that allows stakeholders to adopt or to sign up to a standardized set of Open Access requirements that are internally consistent and easy to understand.

Do you have any parting thoughts you would like to leave with the readers of the Scholarly Kitchen?

I’ve given some fairly lengthy answers! Anyone who makes it to this point deserves some kind of prize. I hope that everyone enjoyed reading the report; we certainly enjoyed writing it.

Source: Openness: An interview with Daniel Hook, CEO of Digital Science