May 22, 2007

Nonresponse to data sharing requests

A few years ago, when I expressed frustration at not getting a reply from a corresponding author, a professor summarized his experience: one third of authors do not reply when contacted, one third reply but are unable or unwilling to supply the requested data, and one third reply and supply the data.

I’ve since run across two published reports that quantify nonresponse to data sharing requests. Does anyone know of others?

As reported in a Nature editorial:
[Nature 444, 653-654 (7 December 2006) | doi:10.1038/444653b; Published online 6 December 2006]

The need for more data sharing has just been amply demonstrated by Jelte Wicherts, a psychologist specializing in research methods at the University of Amsterdam, who tried to check out the robustness of statistical analyses in papers published in top psychology journals.

He selected the November and December 2004 issues of four journals published by the American Psychological Association (APA), which requires its authors to agree to share their data with other researchers after publication. In June 2005, Wicherts wrote to each corresponding author requesting data, in full confidence, for simple reanalysis. Six months and several hundred e-mails later, he abandoned the mission, having received only a quarter of the data sets. He reported his failure in an APA journal in October (J. M. Wicherts et al. Am. Psychol. 61, 726–728; 2006).

The abstract of the original article:
[Wicherts JM et al. The poor availability of psychological research data for reanalysis. Am. Psychol. 61, 726–728; 2006]

The origin of the present comment lies in a failed attempt to obtain, through e-mailed requests, data reported in 141 empirical articles recently published by the American Psychological Association (APA). Our original aim was to reanalyze these data sets to assess the robustness of the research findings to outliers. We never got that far. In June 2005, we contacted the corresponding author of every article that appeared in the last two 2004 issues of four major APA journals. Because their articles had been published in APA journals, we were certain that all of the authors had signed the APA Certification of Compliance With APA Ethical Principles, which includes the principle on sharing data for reanalysis. Unfortunately, 6 months later, after writing more than 400 e-mails–and sending some corresponding authors detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes–we ended up with a meager 38 positive reactions and the actual data sets from 64 studies (25.7% of the total number of 249 data sets). This means that 73% of the authors did not share their data.

The second example (also referenced in a prominent editorial) is slightly more positive, but still disappointing.
[Kyzas PA, Loizou KT, Ioannidis JPA. Selective reporting biases in cancer prognostic factor studies. J Natl Cancer Inst 2005;97:1043–55]
[L. M. McShane, D. G. Altman, and W. Sauerbrei. Identification of Clinically Useful Cancer Prognostic Factors: What Are We Missing? J Natl Cancer Inst, July 20, 2005; 97(14): 1023 – 1025.]

…when a report suggested that mortality data had been collected, but no usable data were available in the publication, we communicated with the primary investigators. When there was no response within 2 months, a second communication attempt was made.
…For 22 of 64 studies, even though we contacted their primary investigators, we could not retrieve any additional data. Seventeen of the primary investigators did not reply at all; and five responded and stated that they were not able to retrieve the raw data.

One third, one quarter, two thirds.
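For anyone who wants to check those fractions against the counts quoted above, here is a minimal sketch of the arithmetic in Python. The function name `share_rate` is my own, and the Kyzas et al. success count assumes that the 42 studies not listed as failures (64 minus 22) did yield additional data, which the excerpt implies but does not state outright.

```python
def share_rate(shared: int, total: int) -> float:
    """Return the percentage of requests that produced data."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * shared / total


# Wicherts et al.: data sets received for 64 of the 249 requested.
wicherts = share_rate(64, 249)

# Kyzas et al.: no additional data for 22 of 64 studies.
# Assumption: the remaining 42 studies did yield data.
kyzas = share_rate(64 - 22, 64)

# Breakdown of the 22 failures reported in the excerpt.
kyzas_no_reply = share_rate(17, 64)  # investigators who never replied
kyzas_unable = share_rate(5, 64)     # replied, but could not retrieve raw data

print(f"Wicherts et al. data sets received: {wicherts:.1f}%")
print(f"Kyzas et al. studies with data retrieved: {kyzas:.1f}%")
print(f"Kyzas et al. no reply: {kyzas_no_reply:.1f}%")
print(f"Kyzas et al. unable to retrieve: {kyzas_unable:.1f}%")
```

The professor's one third is an anecdotal estimate with no underlying counts, so it is left out of the calculation.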

What a sorry state of affairs.
In some ways it is understandable. Sharing data is hard. People are busy.
But isn’t sharing data part of a scientist’s job description?



  1. Hi, Heather — you may be interested in Science Commons, which “serves the advancement of science by removing unnecessary legal and technical barriers to scientific collaboration and innovation.” Among other things, they’ve created a license so that it’s easier for scholars to share biological material (DNA, cell lines, etc.)

    Comment by Monica McCormick — May 23, 2007 @ 12:35 pm

  2. Hello Monica. Thanks for the pointer! I’d heard of Science Commons, but haven’t explored it yet, so I appreciate the reminder. I’ve added its blog to my daily read and I look forward to reading up on the projects.

    Take care,

    Comment by Heather Piwowar — May 25, 2007 @ 8:26 am

  3. Sharing data is hard, but it should be easier to share published papers, right? J Hartley published a few articles in the Journal of Information Science in 2004 in which he reported the results of sending e-mail requests for reprints of journal articles and conference papers. I think he only got about half of the conference papers and maybe 80% of the journal articles. Not exactly the same thing, but somewhat related.

    Comment by Christina Pikas — May 26, 2007 @ 9:55 am


