Peer review under the microscope

The Frontiers Blog has provided a timely review of some of the new models of peer review that are being tried in different branches of scientific publishing: “The silent revolution in peer review”. The review distinguishes “classical peer review”, in which two or three anonymous experts examine work, from a variety of newer approaches. These range from impact-neutral review to fully open review.

An unanswered question is whether the recently invented forms of review give results that are quantifiably better: faster, with fewer errors, more highly cited, or with a lower rate of corrections and retractions. At the level of the whole system, scientific research outputs are so varied that it is difficult to compare them statistically in any meaningful way.

From the Frontiers post:

These are issues of policy that must necessarily remain open. But tackling them requires an evidence base, and today this is still missing. What are the effects of new systems of peer review and reviewless publishing on the timeliness of scientific publishing and the quality of the papers that are published? Our experience at Frontiers is that impact-neutral, collaborative review allows our journals to achieve high impact factors, while publishing large numbers of high-quality papers, facilitating the publication of papers that would be hard to publish in traditional journals and drastically reducing delays. But we need systemic comparative studies to substantiate these claims.

Rigorous quantitative studies of the impact of new forms of review have been rare and non-conclusive. As a result, most of the arguments for and against have taken place in the blogosphere or through editorials and “opinion pieces”. This is not good enough. If the impact of peer review is as large as we suspect, the time is ripe for it to become an object of scientific study in its own right.

It is incontrovertible that classical peer review has done a poor job of identifying high-impact papers. Some journals, such as Nature and Science, are famous for review that is not blinded to the “impact” of the papers. These journals have a measurably higher rate of retractions than other journals (Fang and Casadevall 2011).

Figure: Retraction index versus journal impact factor. Figure 1D from Brembs et al. (2013), showing the relationship between journal impact factor and a measure of the number of retractions relative to the number of papers published (the "retraction index"). Original caption: "Linear regression with confidence intervals between IF and Fang and Casadevall's Retraction Index (data provided by Fang and Casadevall, 2011)."
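For readers who want to see what this kind of analysis involves, here is a minimal sketch of a retraction-index-versus-impact-factor regression, assuming Fang and Casadevall's definition of the retraction index (retractions over the survey interval, multiplied by 1,000 and divided by the number of articles published). The journal counts below are invented placeholders, not the published data; the sketch only illustrates the mechanics of the fit shown in the figure.

```python
# Minimal sketch: retraction index vs. journal impact factor, in the spirit of
# Brembs et al. (2013), Figure 1D. All numbers below are made-up placeholders,
# NOT the published data from Fang and Casadevall (2011).
import numpy as np
from scipy import stats

def retraction_index(retractions, articles):
    """Retractions per 1,000 published articles over the survey interval."""
    return 1000.0 * retractions / articles

# Hypothetical journals: (impact factor, retractions, articles published)
journals = [
    (2.0,   1,  9000),
    (4.5,   2,  8000),
    (9.0,   4,  6000),
    (15.0,  6,  5000),
    (30.0, 12,  8000),
    (53.0, 20, 10000),
]

impact_factor = np.array([j[0] for j in journals])
ri = np.array([retraction_index(j[1], j[2]) for j in journals])

# Ordinary least-squares fit of retraction index on impact factor
fit = stats.linregress(impact_factor, ri)
print(f"slope = {fit.slope:.4f}, r^2 = {fit.rvalue**2:.3f}, p = {fit.pvalue:.3g}")

# Rough 95% confidence interval on the slope (t distribution, n - 2 df)
n = len(journals)
t_crit = stats.t.ppf(0.975, df=n - 2)
ci = (fit.slope - t_crit * fit.stderr, fit.slope + t_crit * fit.stderr)
print(f"95% CI on slope: [{ci[0]:.4f}, {ci[1]:.4f}]")
```

The figure amounts to this kind of simple bivariate fit, drawn over the real journal data rather than placeholders.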

High-impact-factor journals have been shown to be more likely to publish papers whose claims later prove to be false, in part due to the “decline effect”. Brembs and colleagues (2013) discuss some of the problems with assessing research as a function of journal rank:

As journal rank is also predictive of the incidence of fraud and misconduct in retracted publications, as opposed to other reasons for retraction (Steen, 2011a), it is not surprising that higher ranking journals are also more likely to publish fraudulent work than lower ranking journals (Fang et al., 2012). These data, however, cover only the small fraction of publications that have been retracted. More important is the large body of the literature that is not retracted and thus actively being used by the scientific community. There is evidence that unreliability is higher in high-ranking journals as well, also for non-retracted publications: A meta-analysis of genetic association studies provides evidence that the extent to which a study over-estimates the likely true effect size is positively correlated with the IF of the journal in which it is published (Figure 1C) (Munafò et al., 2009). Similar effects have been reported in the context of other research fields (Ioannidis, 2005a; Ioannidis and Panagiotou, 2011; Siontis et al., 2011).

But while many workers have demonstrated the unreliability of high-impact journals, these publications are still widely viewed as highly desirable on CVs and in grant applications. We can therefore predict that the conversation about the value and style of peer review is going to continue for a long time.

Both classical peer review and newer editorial approaches came under attack in anthropology in 2015. Some prominent paleoanthropologists have publicly questioned the quality of peer review underlying research results published in journals like Nature and Science, while others have questioned the peer review in open access journals like eLife.

In my view, there are two kinds of public comments about peer review. Informed scientists use an evidence-based approach to weigh the value of different peer review models. We can acknowledge that, while no system-wide measure exists, each model has advantages and disadvantages for paleoanthropological research. In contrast, some senior scientists who complain about peer review are merely grandstanding: they snap to a “get off my lawn” attitude when they discuss new scientific results with the press. Nobody likes to feel like they’re out of the loop.

Because scientists will continue to express concerns about peer review in public, it is important for all of us to have accurate information about the models of peer review that continue to gain support as ways to open up the scientific process. New ideas are not always bad; some of them may be necessary to overcome the serious problems of classical peer review, especially in so-called “high-impact” journals. The Frontiers blog entry is a nice start. During the next few weeks I will be looking more closely at peer review in paleoanthropology to consider how we might identify successful approaches to improve the quality and accessibility of the science.

UPDATE (2016-01-05): A few readers have asked whether the correlation between retractions and impact factor might be explained by the hypothesis that high-impact-factor publications receive greater scrutiny after publication. Fang and Casadevall (2011) considered this as one of several possible explanations for the correlation:

The correlation between a journal's retraction index and its impact factor suggests that there may be systemic aspects of the scientific publication process that can affect the likelihood of retraction. When considering various explanations, it is important to note that the economics and sociology of the current scientific enterprise dictate that publication in high-impact journals can confer a disproportionate benefit to authors relative to publication of the same material in a journal with a lower impact factor. For example, publication in journals with high impact factors can be associated with improved job opportunities, grant success, peer recognition, and honorific rewards, despite widespread acknowledgment that impact factor is a flawed measure of scientific quality and importance (8, 29, 33, 77, 80, 86). Hence, one possibility is that fraud and scientific misconduct are higher in papers submitted and accepted to higher-impact journals. In this regard, the disproportionally high payoff associated with publishing in higher-impact journals could encourage risk-taking behavior by authors in study design, data presentation, data analysis, and interpretation that subsequently leads to the retraction of the work. Another possibility is that the desire of high-impact journals for clear and definitive reports may encourage authors to manipulate their data to meet this expectation. In contradistinction to the crisp, orderly results of a typical manuscript in a high-impact journal, the reality of everyday science is often a messy affair littered with nonreproducible experiments, outlier data points, unexplained results, and observations that fail to fit into a neat story. In such situations, desperate authors may be enticed to take short cuts, withhold data from the review process, overinterpret results, manipulate images, and engage in behavior ranging from questionable practices to outright fraud (26). Alternatively, publications in high-impact journals have increased visibility and may accordingly attract greater scrutiny that results in the discovery of problems eventually leading to retraction. It is possible that each of these explanations contributes to the correlation between retraction index and impact factor. Whatever the explanation, the phenomenon appears deserving of further study. The relationship between retraction index and impact factor is yet another reason to be wary of simple bibliometric measures of scientific performance, such as impact factor.

The scrutiny hypothesis was examined by Steen and colleagues (2013), who noted that high-impact journals have a shorter time to retraction, which suggests an effect of greater scrutiny. However, they found that this effect explains only around 1% of the variation in time to retraction, consistent with the idea that “scrutiny is significantly related to the risk of retraction for misconduct but does not appear to be a major factor.”
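To put that number in perspective, "explains about 1% of the variation" corresponds to a correlation on the order of r ≈ 0.1, since the share of variance accounted for by a linear relationship is r squared. A trivial arithmetic sketch, using an assumed correlation rather than any figure from Steen et al. (2013):

```python
# Illustration only: what "explains about 1% of the variation" means.
# The correlation below is an assumed value, not a number from Steen et al. (2013).
r = 0.1                       # hypothetical correlation between scrutiny (impact factor) and time to retraction
variance_explained = r ** 2   # share of variance captured by a linear relationship
print(f"r = {r:.2f} -> variance explained = {variance_explained:.1%}")  # r = 0.10 -> variance explained = 1.0%
```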

References

Brembs, B., Button, K., & Munafò, M. (2013). Deep impact: Unintended consequences of journal rank. Frontiers in Human Neuroscience, 7, 291. doi:10.3389/fnhum.2013.00291

Fang, F. C., & Casadevall, A. (2011). Retracted science and the retraction index. Infection and Immunity, 79(10), 3855–3859. doi:10.1128/IAI.05661-11

Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences USA, 109, 17028–17033. doi:10.1073/pnas.1212247109

Munafò, M. R., Freimer, N. B., Ng, W., Ophoff, R., Veijola, J., Miettunen, J., et al. (2009). 5-HTTLPR genotype and anxiety-related personality traits: A meta-analysis and new data. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 150B, 271–281. doi:10.1002/ajmg.b.30808

Steen, R. G., Casadevall, A., & Fang, F. C. (2013). Why has the number of scientific retractions increased? PLoS ONE, 8(7), e68397. doi:10.1371/journal.pone.0068397