July 2, 2012

Possible sources of an Open Data citation advantage

Evidence suggests that papers with available supporting data receive more citations than similar papers without publicly available data.

Assuming there is indeed a “citation boost” for making data available, what might cause these extra citations?  Data reuse attribution is the most obvious source, but there may be others.

The literature on the “Open Access Citation Benefit” articulates several possible sources of citation boost associated with Open Access to the literature, including Selection Bias and Early View [Craig 2007].

Inspired by these postulates, we suggest possible sources for an Open Data citation benefit:

  1. Data Reuse. Papers with available datasets can be used in more ways than papers without data, and therefore may receive additional attributions.
  2. Credibility Signalling. The credibility of research findings may be higher for research papers with available data. Such papers may be preferentially chosen for background citations and/or the foundation of additional research.
  3. Increased Visibility. Citing authors may be more likely to encounter a research project with available data. More artifacts associated with a research project give the project a larger footprint, increasing the likelihood that someone finds an aspect of the research. Links from data to the research paper may also increase the search ranking of the research paper.
  4. Early View. When data is made available before a paper is published, some citations may accrue earlier than otherwise because research methods and findings are encountered prior to paper publication.
  5. Selection Bias. Authors may be more likely to publish data for papers they judge to be their best-quality work, because they are most proud of, or most confident in, those results. Alternatively, author self-selection may correlate negatively with research quality in the case of Open Data: authors may be less willing to share details of their most important and visible research, in order to maintain a competitive edge and avoid the upheaval of error detection.

An Open Data citation boost likely results from a combination of these sources.
Are there others we are missing?

From a manuscript-in-progress with advisor and co-author Todd Vision.


