Should scientists refuse to review papers that do not make data available?

What should happen when scientists publish work that cannot be replicated?

That’s an important question, and people are asking it more and more. Too often, the basic results of research are locked inside figures that display them but don’t allow other scientists to inspect them or combine them with other work in their own research. The results sit there, published but essentially useless for building anything new.

Early in 2016, Pete Etchells wrote in The Guardian about a modest proposal for peer review: scientists should refuse to review papers that do not make their data and methods openly available. The piece was titled “How peer reviewers might hold the key to making science more transparent”.

In that piece, Etchells announced a new paper in Royal Society Open Science arguing for a grassroots approach to this problem: putting the power back into the hands of scientists at the coalface of research by changing the way we think about the peer review process. (Etchells disclosed that he and fellow Head Quarters blogger Chris Chambers are co-authors on the paper.) The Peer Reviewers’ Openness (PRO) Initiative is, at its core, a simple pledge: scientists who sign up to the initiative agree that, from January 1, 2017, they will not offer to comprehensively review, or recommend the publication of, any scientific research paper for which the data, materials, and analysis code are not publicly available, or for which there is no clear reason why these things are not available. At the time, over 200 scientists had signed the pledge.

The Royal Society Open Science paper, which came out last year, was written by Richard Morey and colleagues; Etchells was one of the coauthors. The year since has allowed some time to see the results of the initiative. The opinion paper has been cited 37 times according to Google Scholar, a strong showing for a paper only a year out from publication. People are paying attention to the argument. The paper has been cited heavily within the field of psychology, where the “replication crisis” has prompted many calls for more responsible publication of data and methods.

On the other hand, the more than 200 signatories of early 2016 have grown to only around 400 today, according to the Openness Initiative website. The call for direct action hasn’t had quite the level of participation that proponents of the initiative might have hoped for.

I have been a very strong proponent of data access, particularly within the field of human evolution, and I have written a number of articles about the issue.

So why haven’t I signed the pledge?

In many of my conversations with paleoanthropologists, I find that most of them just don’t understand what it means to provide data. When I raise the topic of data access, some assume that I expect scientists to distribute casts for free, or to open the doors of fossil vaults to anybody on demand.

In other words, they view data accessibility as some kind of invasion of scientific privacy, or worse, an abrogation of national heritage.

I also see that the conversation about data access in human evolution is having two kinds of effects.

One of these effects is very positive. New papers are being published that include the data necessary for replication. What’s more, referees are demanding that more data be included. I’m seeing this in the work on Homo naledi, where I think it’s fair to say that my collaborators have done more than any other team in history to provide the data behind the analyses. We’ve included extensive data tables, full details for the multivariate analyses, and high-resolution surface models of the specimens. Even with all this, we are still challenging ourselves within the team to find new ways to do better, to provide more data in a more useful way. Meanwhile, peer referees are encouraging us to raise the bar even higher by providing more and more data, including data we have collected on fossils from other field sites.

The second effect is less encouraging. Some researchers who were failing to report basic measurements as recently as seven or eight years ago seem to have simply stopped publishing any new research on fossil material. This is surely not a coincidence. I think we’re finding that some researchers are having trouble bringing their field methods and analytical standards up to a level where they feel comfortable reporting the underlying data.

Genetics went through such a stage, way back in the 1990s. Every lab had its own distinctive protocols for validating basic sequence data or genotypes. The methods were finicky, and in a high-stakes funding environment, labs competed with each other on whether you could “trust” their data. I remember a conversation with a geneticist about one of his scientific rivals, and he said, “Sure, he has good ideas, but you can’t trust his A’s, C’s, G’s, and T’s.”

Opening up the doors and the records in genetics during the late 1990s and 2000s was not invasive; it helped to clear the air. Many genetics labs were working with outmoded processes that needed to be revised or fixed, and by adopting more open protocols, they were able to take advantage of the methodological advances made in other (often much bigger) labs. Meanwhile, sharing data openly before publication allowed people to leverage the common investment being made in sequencing across institutions.

Anthropologists took part in the debates that arose from non-transparent methods in genetics. In the early 1990s, it was common to hear anthropologists say, “Sure, the geneticists say this today, but tomorrow the results will be different.” Consider the example of the original 1987 paper by Cann, Stoneking, and Wilson on “mitochondrial Eve”. The results of that paper were subjected to published challenges for the next eight years, including reanalyses of the original data by Alan Templeton in 1993 and Christopher Wills in 1995. It took those scientists a long time, even with cooperation from the original researchers, to figure out what had been done in the original analysis, and even longer to go through the subsequent process of review and publication of their reanalyses.

Today, things have changed. New results in genetics are more transparent; they have been seen by a broader range of scientists before publication, and there is often a robust conversation among different labs while a study is being conducted. Published studies may still have weaknesses, but these are discussed more openly than in the past, and replication studies are carried out quickly, sometimes while the original research paper is still a preprint.

Paleoanthropology today is like genetics in the late 1990s. I say that as a very positive thing. We are moving as a field toward higher standards in data reporting and transparency. Data access is a process of continual improvement in record keeping, archiving, and communication. This is really the basic lesson of science from high school onward: if you want to see where you might go wrong, where errors might creep into your analyses, you need to show your work.

What we must do is continue to insist that scientists use the data that have been published. Providing data is one thing, but its true value emerges when other scientists reuse it to make their own work better. We need data that we can trust, not data that no one else can replicate. If our observations cannot meet that standard, we need to work on our methodology until they can.
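To make that idea of reuse concrete, here is a minimal sketch, in Python, of what replication looks like when a paper publishes its measurement table as a machine-readable file. The file name measurements.csv and its columns are hypothetical stand-ins for the kind of supplementary table I have in mind, not data from any actual paper.

```python
# Minimal sketch: re-running a published multivariate analysis from a
# shared measurement table. "measurements.csv" and its columns are
# hypothetical stand-ins, not any actual paper's supplement.
import numpy as np
import pandas as pd

# Each row is one specimen; each column is a linear measurement (mm).
data = pd.read_csv("measurements.csv", index_col="specimen")

# Standardize so each measurement has mean 0 and variance 1.
standardized = (data - data.mean()) / data.std(ddof=0)

# Principal components via singular value decomposition.
u, s, vt = np.linalg.svd(standardized.to_numpy(), full_matrices=False)
scores = u * s  # each specimen's score on each component

# Proportion of variance explained by each component. A reader can
# check these numbers directly against the figures in the paper.
explained = s**2 / np.sum(s**2)
print("Variance explained:", np.round(explained, 3))
```

Nothing in that sketch is sophisticated, and that is the point: once the table exists in a usable form, checking a published result is an afternoon’s work instead of a negotiation.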

Reference

Morey RD, Chambers CD, Etchells PJ, Harris CR, Hoekstra R, Lakens D, Lewandowsky S, Morey CC, Newman DP, Schönbrodt FD, Vanpaemel W, Wagenmakers E-J, Zwaan RA. 2016. The Peer Reviewers’ Openness Initiative: incentivizing open research practices through peer review. Royal Society Open Science 3:150547. doi:10.1098/rsos.150547