When scientific citations go rogue: Finding ‘sneaked references’

A classic but misleading image is that of the researcher working alone, cut off from the rest of the scientific community. Research is, in fact, built on constant exchange within that community: First you understand the work of others, and then you share your findings.

An integral part of being a researcher is reading and writing articles that are published in academic journals and presented at conferences. When researchers write a scholarly article, they must cite the work of peers to provide context, detail sources of inspiration and explain differences in approach and results. Being cited by other researchers is a key measure of the visibility of a researcher’s own work.

But what happens when this citation system is manipulated? A recent Journal of the Association for Information Science and Technology article by our team of academic sleuths – including information scientists, a computer scientist and a mathematician – revealed an insidious method of artificially increasing citation counts through metadata manipulation: sneaked references.

Hidden manipulation

People are becoming more aware of how scientific publication works, including its potential flaws. Just last year more than 10,000 scientific articles were retracted. The issues surrounding citation gaming and the damage it does to the scientific community, including damage to its credibility, are well documented.

References to scientific work follow a standardized system: Each reference explicitly mentions at least the title, the names of the authors, the year of publication, the name of the journal or conference, and the page numbers of the cited publication. This information is also stored as metadata that does not appear directly in the text of the article but is registered under the article’s digital object identifier, or DOI – a unique identifier for each scientific publication.
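
To make this concrete, here is a minimal sketch of how the reference metadata a publisher deposits for an article can be inspected through Crossref’s public REST API. The DOI below is a hypothetical placeholder, not one taken from the study.

    # Minimal sketch: inspect the reference metadata deposited with Crossref
    # for a given article. The DOI below is a hypothetical placeholder.
    import requests

    DOI = "10.1234/example.article"  # hypothetical, replace with a real DOI

    response = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
    response.raise_for_status()
    work = response.json()["message"]

    print("Title:", work.get("title", ["<none>"])[0])
    print("Citations recorded by Crossref:", work.get("is-referenced-by-count"))

    # Each deposited reference is a small metadata record; the cited work's DOI
    # may be present, or only an unstructured text string.
    for ref in work.get("reference", []):
        print(" -", ref.get("DOI") or ref.get("unstructured", "<no details>"))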

Citations in a scientific publication allow authors to justify methodological choices or present the results of previous studies, emphasizing the iterative and collaborative nature of science.

However, we discovered by chance that some unscrupulous actors had added extra references, invisible in the text but present in the articles’ metadata, when they submitted the articles to scientific databases. The result? A significant increase in the citation counts of certain researchers or journals, even though these references do not appear in the articles themselves.

A chance discovery

The investigation began when Guillaume Cabanac, a professor at the University of Toulouse, wrote a post on PubPeer, a website dedicated to post-publication peer review, where scientists discuss and analyze publications. In the post, he detailed how he had noticed a discrepancy: An article in a Hindawi journal, which he suspected was fraudulent because it contained awkward phrases, had far more citations than downloads, which is highly unusual.

The post attracted the attention of several others who are now authors of the JASIST article. We used scientific search engines to look for articles citing the original article. Google Scholar found none, but Crossref and Dimensions did find references. The difference? Google Scholar likely relies primarily on the article’s main text to extract the references appearing in the bibliography section, whereas Crossref and Dimensions use metadata provided by publishers.

A new type of fraud

To understand the extent of the manipulation, we examined three scientific journals published by the Technoscience Academy, the publisher responsible for the articles containing questionable citations.

Our investigation involved three stages, with the core comparison sketched in code after the list:

  1. We listed the references that are explicitly present in the HTML or PDF versions of an article.

  2. We compared these lists with the metadata recorded by Crossref, discovering additional references that were added in the metadata but not visible in the articles.

  3. We checked Dimensions, a bibliometric platform that uses Crossref as a metadata source, finding additional inconsistencies.
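
The comparison at the heart of stages 1 and 2 can be sketched as follows. This is a simplified illustration rather than the study’s actual pipeline: the DOIs are placeholders, and the set of visible references is assumed to have already been extracted from the article’s HTML or PDF.

    # Simplified sketch of the comparison: references visible in the article
    # versus references deposited in Crossref metadata. All DOIs are placeholders.
    import requests

    article_doi = "10.1234/example.article"            # hypothetical citing article
    visible_refs = {"10.1000/ref.a", "10.1000/ref.b"}  # DOIs read from the HTML/PDF

    message = requests.get(f"https://api.crossref.org/works/{article_doi}",
                           timeout=30).json()["message"]
    metadata_refs = {r["DOI"].lower() for r in message.get("reference", [])
                     if "DOI" in r}
    visible_refs = {doi.lower() for doi in visible_refs}

    sneaked = metadata_refs - visible_refs  # in the metadata only: "sneaked references"
    lost = visible_refs - metadata_refs     # in the text only: lost legitimate references

    print(f"{len(sneaked)} sneaked and {len(lost)} lost references out of "
          f"{len(metadata_refs | visible_refs)} distinct references checked")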

In the journals published by Technoscience Academy, at least 9% of the recorded references were “sneaked references”. These additional references were only in the metadata, distorting the citation count and giving an unfair advantage to certain authors. Some legitimate references were also lost, meaning they were not present in the metadata.

In addition, while analyzing the sneaked references, we found that they were very beneficial for some researchers. For example, one researcher associated with Technoscience Academy benefited from more than 3,000 extra illegitimate citations. Several journals from the same publisher benefited from a few hundred additional sneaked citations.

We wanted our findings to be externally validated, so we posted our study as a preprint, shared our findings with both Crossref and Dimensions, and gave them a link to the preprint. Dimensions acknowledged the illegitimate citations and confirmed that its database simply reflects Crossref’s data. Crossref also confirmed the additional references to Retraction Watch and indicated that this was the first time such a problem had been reported in its database. Based on Crossref’s investigation, the publisher has taken action to resolve the problem.

Implications and possible solutions

Why is this discovery important? Research funding, academic promotions and institutional rankings are heavily influenced by citation counts. Citation manipulation can lead to unfair decisions based on false data. Even worse, this discovery raises questions about the integrity of scientific systems for measuring impact, a concern researchers have highlighted for years. These systems can be manipulated to foster unhealthy competition among researchers, tempting them to take shortcuts to achieve faster publication or more citations.

To combat this practice, we recommend several measures:

  • Strict verification of metadata by publishers and agencies such as Crossref.

  • Independent audits to ensure data reliability.

  • Increased transparency in managing references and citations.

This study is the first, to our knowledge, to report this type of metadata manipulation. It also discusses the impact such manipulation may have on the evaluation of researchers. The study shows, once again, that relying too heavily on metrics to evaluate researchers, their work and their impact can produce flawed and unfair assessments.

Such overreliance is likely to promote questionable research practices, including hypothesizing after the results are known, or HARKing; dividing one set of data into several papers, called salami slicing; data manipulation; and plagiarism. It also hinders the transparency that is essential for stronger and more efficient research. Although the problematic citation metadata and sneaked references have now apparently been fixed, the corrections may have come too late, as is often the case with scientific corrections.

This article is published in association with Binaire, a blog for understanding digital issues.

This article is republished from The Conversation, a non-profit, independent news organization that brings you facts and analysis to help you make sense of our complex world.

Written by: Lonni Besançon, Linköping University and Guillaume Cabanac, Institut de Recherche en Informatique de Toulouse.

Lonni Besançon receives funding from the Marcus And Amalia Wallenberg foundation.

Guillaume Cabanac receives funding from the European Research Council (ERC) and the Institut Universitaire de France (IUF). He is the administrator of the Problematic Paper Screener, a public platform that uses metadata from Digital Science and PubPeer through no-cost agreements.

Thierry Viéville does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.
