Research Remix

August 13, 2010

Refs on supplementary material

Filed under: Uncategorized — Heather Piwowar @ 1:06 pm

Inspired by current blogosphere discussions, I’ve pulled together a list of articles and studies related to journal supplementary material. The bibliography is at the bottom, and the living collection is at Mendeley. Let me know in the comments if you have other favs?

Because a mere bibliography is only so useful without rolling up one’s sleeves, here are a few rough highlights.

Note I haven’t found many studies that investigate what is actually *in* supplementary materials, how often they are read and used, what they are used for, and other important and interesting questions.


“First, I despise the name. Supplementary implies something extra. …
it sends exactly the wrong message about our priorities. What typically gets put into S&M? The details of the experimental methods and often, especially for papers in genomics, tables and figures containing at least some of the primary data” (Wilke)

“Other journals have almost completely moved the Materials and Methods section from the main text to online supplements. These journals are conveying the message, however inadvertent, that the sine qua non of the scientific method, the Materials and Methods, is the least important part of a scientific publication” (Shriner)


“While the size of articles has grown gradually over the past decade, the supplemental material associated with a typical Journal article appears to be growing exponentially and is rapidly approaching the size of an article. The sheer volume of supplemental material is adversely affecting peer review.” (Maunsell)


“Like it or not, ranking of scientific achievement by citation-based methods is an important part of the scientific system, and journals should make all their citations accessible to those who need accurate numbers. The solution to this problem seems quite simple: the citations in the supplement have to be incorporated into the reference section of the main text by the authors.” (Seeber)

“Supplemental data can seldom be discovered except by manual examination of individual articles. A paywall often limits access. Publishers put few resources into maintaining supplemental data and may even fail to migrate data when journals change hands.” (Vision)

Monetary cost

prices for supplementary info

“Amongst three of the journals we interviewed, these author charges for supplementary data files ranged from $100 to $300+.” (Beagrie)

cost of a discipline-based repository

“Estimates of the combined online and print publication costs of a single scientific article range from $2000 to $10,000 (King 2007). On the basis of projections for Dryad, the marginal cost of data publication would be only a small fraction (< 2 percent) of this sum, provided that the repository has sufficient volume (on the order of 10^4 new submissions annually).” (Vision)

"With low to moderate curation effort, initial projections of potential costs for Dryad lead to ballpark estimates of $200,000 or $320,000, respectively, assuming receipt of data from 5,000 or 10,000 papers per annum." (Beagrie)

"Given the budget estimates for volumes of 5,000 and 10,000 papers per year, Dryad’s per paper expenses were estimated to be $40 and $32, respectively." (Beagrie)
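
The per-paper figures in that last quote follow directly from the budget projections in the one before it. Here is a quick check of the arithmetic. The last step compares Beagrie's numbers with the low end of the article cost range quoted by Vision, which assumes the two sources are talking about comparable costs; that pairing is my assumption, not something either source states.

```python
# Annual budget projections quoted from Beagrie: papers per year -> cost in USD.
projections = {5_000: 200_000, 10_000: 320_000}


def cost_per_paper(annual_cost, papers_per_year):
    """Average repository cost per paper at a given annual volume."""
    return annual_cost / papers_per_year


per_paper = {
    papers: cost_per_paper(cost, papers) for papers, cost in projections.items()
}

# Low end of the per-article publication cost range quoted by Vision (King 2007).
ARTICLE_COST_LOW = 2_000

# Assumption: compare the 10,000-paper projection against that low-end cost.
share_of_article_cost = per_paper[10_000] / ARTICLE_COST_LOW

for papers, cost in per_paper.items():
    print(f"{papers} papers/year: ${cost:.0f} per paper")
print(f"Share of a ${ARTICLE_COST_LOW} article: {share_of_article_cost:.1%}")
```

Under those assumptions the higher-volume projection works out to $32 per paper, which is below the 2 percent mark even against the cheapest article cost in the quoted range.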

"For Dryad the value proposition is as follows: [..]
For publishers, Dryad frees journals from the responsibility and costs of publishing and maintaining supplemental data in perpetuity, and allows publishers to increase the benefits of their journals to the societies and the scientists they support;" (Beagrie)

Permanence of supplementary materials and alternative data and web archives

migration to current formats

“Unfortunately, .doc is particularly ill-suited for archival and online-publishing purposes. Whether a particular .doc file can be opened and printed successfully depends on the exact version of Microsoft Word installed, the version of the operating system installed, the printer installed, and the fonts installed. Furthermore, the details of the .doc format are secret and change from version to version. As a result, some of Nature’s readers will have problems opening and printing supplementary material. Moreover, we should expect that many of these documents will fail to open properly 10 to 20 years from now.” (Wilke)

supplemental information within journals

“For Method 1 we found that since 2001, only 71–92% of supplementary data were still accessible via the links provided, with 93% of these inaccessible links occurring where supplementary data was not stored with the publishing journal. Of the manuscripts evaluated in Method 2, we found that only 83% of these links were available approximately a year after publication, with 55% of these inaccessible links were at locations outside the journal of publication” (Anderson)

supplemental information upon request

“One in four e-mail addresses becoming invalid within one year of publication is an alarming rate of decay as it has an impact on the ability of scientists to communicate and exchange material.” (Wren)

supplemental information by url

“The most common reason for citing a URL was to provide additional information about a topic (54.1%) or to link to additional data or analyses (37.7%)” (Wren Johnson)

“Most authors (55.2%) agreed that the unavailable URL content was important to the publication, but few controlled URL availability personally (5%) or with the help of others (employees, colleagues, and friends) (6.7%).” (Wren Johnson)

“Most authors (32 [51.6%] of 62) did not know why the URL they cited was unavailable. However, consistent with previous findings, about 11% of URLs were misspelled in the final publication. Three (4.5%) indicated that the URLs became unavailable because of a lack of funding or support.” (Wren Johnson)

“Thirty percent of expired pages referenced in three of the highest impact-rated scientific journals in the United States ended in ‘‘.edu’’ (Dellavalle et al., 2003).”

“Here, we see that websites published at .edu addresses are the least stable. One possible explanation for this is that corresponding authors tend to be lab mentors, whereas creators of websites would likely be students and/or post-doctoral fellows, who would be more likely to leave.” (Wren 2008)

“A study of the reasons behind URL decay suggested that it is often outside the control of the original website creators (Wren et al., 2006b), suggesting that the best place for intervention would be at the time of publication.” (Wren 2008)

“only 5% of URLs cited more than twice have decayed versus 20% of URLs cited once or twice. The most common types of lost content were computer programs (43%), followed by scholarly content (38%) and databases (19%).” (Wren)

“Plant Physiology expands on this theme: Links to web sites other than a permanent public repository are not an acceptable alternative because they are not permanent archives.” (Piwowar, ELPUB)

possible improvements

“However, the average lifespan of a Web site is far from sufficient to ensure reliable long-term availability. Because of the inconstant nature of URLs, neither publishers nor authors are able to guarantee the long-term accuracy or availability of digital information referenced in dermatology journals. Effective solutions will likely require a collaborative effort on the part of researchers, authors, and journal editors.” (Wren Johnson)

“Many high-impact journals do not provide instructions for Internet citation formats (44%), nor do they provide recommendations to archive cited digital information (99%)” (Schilling)

“The basic changes would require simply scanning for URLs in a publication, automatically checking them for availability, creating a snapshot of URL content at the time of publication, and permitting authors to update URLs on the journal website should they change.” (Wren 2008)
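
To give a feel for how small the first two steps in that quote are, here is a rough sketch in Python: scan a manuscript's text for URLs, then check whether each one still answers. This is only an illustration under my own assumptions, not anything Wren or a journal actually runs. The function names, the URL pattern, and the 10-second timeout are all made up for the example, and the snapshot and author-update steps are left out.

```python
import re
import urllib.error
import urllib.request

# Assumption: a deliberately simple pattern; real manuscripts need more care.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'()\[\]]+")


def extract_urls(text):
    """Return the distinct URLs found in text, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop sentence punctuation glued to the URL
        if url not in seen:
            seen.append(url)
    return seen


def is_available(url, timeout=10):
    """True if the URL answers a GET request without an HTTP or network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        return False


def check_manuscript(text, probe=is_available):
    """Map each cited URL to whether the probe reports it as reachable."""
    return {url: probe(url) for url in extract_urls(text)}
```

Passing the availability check in as `probe` is just a convenience so the scanning logic can be tried without touching the network, for example `check_manuscript(text, probe=lambda url: True)`.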

“Methods of preservation such as PURLs (Schafer et al., 2001) and WebCite (Eysenbach, 2006) have been developed but are apparently not in widespread use.” (Wren 2008); see also Table 2 in (Wren Johnson)

Related readings

Live collection on Mendeley.

(arg I wish allowed Mendeley embedding!)

Anderson, Nicholas R, Peter Tarczy-Hornoch, and Roger E Bumgarner. 2006. On the persistence of supplementary resources in biomedical publications. BMC Bioinformatics 7: 260. doi:10.1186/1471-2105-7-260.

Anon. Dryad Sustainability Plan: Interview survey findings.

Ball, Catherine A., Gavin Sherlock, Helen Parkinson, Philippe Rocca-Sera, Catherine Brooksbank, Helen C. Causton, Duccio Cavalieri, et al. 2002. Submission of Microarray Data to Public Repositories. PLoS Biology 18, no. 22: 1409.

Beagrie, Neil, Lorraine Eakin-Richards, and Todd Vision. 2009. Business models and cost estimation: Dryad repository case study, no. 1.

Brown, C. 2007. The role of Web-based information in the scholarly communication of chemists: Citation and content analyses of American Chemical Society Journals. Journal of the American Society for Information Science and Technology 58, no. 13.

Cozzarelli, NR. 2004. UPSIDE: Uniform principle for sharing integral data and materials expeditiously. Proc Natl Acad Sci U S A 101, no. 11: 3721-2.

Dellavalle, Robert P, Eric J Hester, Lauren F Heilig, Amanda L Drake, Jeff W Kuntzman, Marla Graber, and Lisa M Schilling. 2003. Going, going, gone: Lost Internet references. Science 302, no. 5646: 787-788. doi:10.1126/science.1088234.

Ducut, Erick, Fang Liu, and Paul Fontelo. 2008. An update on Uniform Resource Locator (URL) decay in MEDLINE abstracts and measures for its mitigation. BMC medical informatics and decision making 8: 23. doi:10.1186/1472-6947-8-23.

Evangelou, E, T Trikalinos, and J Ioannidis. 2005. Unavailability of online supplementary scientific information from articles published in major journals. FASEB J 19, no. 14: 1943-1944.

Eysenbach, Gunther. 2006. Going, going, still there: using the WebCite service to permanently archive cited Web pages. AMIA Symposium 7, no. 5: 919.

Marcus, Emilie. 2009. Taming supplemental material. Immunity 31, no. 5: 691. doi:10.1016/j.immuni.2009.10.005.

Maunsell, John. 2010. Announcement Regarding Supplemental Material. The Journal of Neuroscience 30, no. 32: 10599.

McCarthy, John. 2009. Supplementary online material: potential and precautions. Augmentative and alternative communication (Baltimore, Md. : 1985) 25, no. 1: 4-6. doi:10.1080/07434610902744041.

Murray-Rust, P, J Mitchell, and H Rzepa. 2005. Communication and re-use of chemical information in bioscience. BMC Bioinformatics 6.

Piwowar, Heather, and Wendy Chapman. 2008. A review of journal policies for sharing research data. In ELPUB. doi:10.1038/npre.2008.1701.1.

Shriner, Daniel. 2008. Putting Materials and Methods in Their Place. Science 322, no. December: 1463-1466.

Santos, C, J Blake, and D States. 2005. Supplementary data need to be kept in public repositories. Nature 438, no. 7069: 738.

Schilling, Lisa M, Desiree P Kelly, Amanda L Drake, Lauren F Heilig, Eric J Hester, and Robert P Dellavalle. 2004. Digital information archiving policies in high-impact medical and scientific periodicals. JAMA : the journal of the American Medical Association 292, no. 22: 2724-6. doi:10.1001/jama.292.22.2724.

Seeber, F. 2008. Citations in supplementary information are invisible. Nature 451, no. 7181: 887.

Vision, Todd J. 2010. Open Data and the Social Contract of Scientific Publishing. BioScience 60, no. 5: 330-331. doi:10.1525/bio.2010.60.5.2.

Wilke, Claus. 2004. Supplementary materials need the right format. Nature 430, no. 6997: 291.

Petsko, Gregory A. 2006. Let’s get our priorities straight. Genome Biology 7, no. 1: 101. doi:10.1186/gb-2006-7-1-101.

Wren, Jonathan D, Joe E Grissom, and Tyrrell Conway. 2006. E-mail decay rates among corresponding authors in MEDLINE. The ability to communicate with and request materials from authors is being eroded by the expiration of e-mail addresses. EMBO Rep 7, no. 2: 122-127. doi:10.1038/sj.embor.7400631.

Wren, Jonathan D, Kathryn R Johnson, David M Crockett, Lauren F Heilig, Lisa M Schilling, and Robert P Dellavalle. 2006. Uniform resource locator decay in dermatology journals: author attitudes and preservation practices. Archives of dermatology 142, no. 9: 1147-52. doi:10.1001/archderm.142.9.1147.

Wren, Jonathan D. 2008. URL decay in MEDLINE–a 4-year follow-up study. Bioinformatics 24, no. 11: 1381-1385. doi:10.1093/bioinformatics/btn127.

Supplementary materials is a stopgap for data archiving

Filed under: Policies — Heather Piwowar @ 11:23 am

The Journal of Neuroscience has issued a new policy on supplementary materials:

Beginning November 1, 2010, The Journal of Neuroscience will no longer allow authors to include supplemental material when they submit new manuscripts and will no longer host supplemental material on its web site for those articles.

I think this will benefit the reporting of methods and exploratory analyses. I am thrilled that citations will no longer be lost in supplementary materials, assuming the additional citations make it into the main references list rather than being omitted.

But what about data?

A journal’s supplementary material section is not a great place for data. Limitations include:

  • not good for data formatting and reporting standards
  • not good for discoverability
  • not good for truly permanent storage
  • not good for machine retrievability
  • not good for journals sticking to core competencies
  • not good for journal planning, efficiency
  • not good for free access (in subscription journals)
  • not good for open access (or at least conveying openness clearly)
  • not good for lots of other things that I don’t know about and publishers don’t know about but repository professionals do know about

Most people would agree that well-designed, well-supported data repositories are the best place for data. The problem is, such repositories are few and far between. All is well and good if an experiment is in a discipline or produces a datatype for which a best-practice repository exists: the data should go there. All may be good if the authors are in an institution with an institutional repository that is well-equipped to handle scientific data, though these are uncommon. Otherwise where can investigators put their datasets?

Supplementary information is not a perfect home, it is not even very good, but it is better than hosting data on a lab website or supplying it by email on demand. It is a useful stopgap while more discipline-based repositories and institutional repositories rise to fill the need.

By removing this stopgap, in my opinion (and with the important caveat that I know very little about the journal or its discipline), The Journal of Neuroscience has sent three messages with its new policy:

1. They don’t consider archiving data to be their responsibility

This was already clear from their lackluster policy on data archiving:

Policy Concerning Availability of Materials
It is understood that by publishing a paper in The Journal of Neuroscience the author(s) agree to make freely available to colleagues in academic research any clones of cells, nucleic acids, antibodies, etc. that were used in the research reported and that are not available from commercial suppliers.

Policy on DNA Sequences
[…] By the time a paper is sent to press, sequences must be deposited in a database generally accessible to the neuroscience community; the sequence accession number should be provided. Exceptions to this policy may be considered on an individual basis.

That’s it. Compare this to the comprehensive policies of other journals, particularly their statements of motivation. For example, in Science:

After publication, all data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.

And in Stem Cells (similar in Cell):

Stem Cells supports the efforts of the National Academy of Sciences (NAS) to encourage the open sharing of publication-related data. Stem Cells adheres to the beliefs that authors should include in their publications the data, algorithms, or other information that is central or integral to the publication, or make it freely and readily accessible; use public repositories for data whenever possible; and make patented material available under a license for research use.

The Journal of Neuroscience has said that it wants to “maintain its leading position.” For what it is worth, evidence suggests that the highest impact journals have the strongest data sharing policies.

2. They don’t consider archiving data important

Based on the policy and the wording of its announcement, I was left with the impression that the Journal doesn’t consider data archiving important. In particular, stating that “supplemental material is inherently inessential” and “We should remember that neuroscience thrived for generations without any online supplemental material” belittles data sharing, given that much data is currently shared in supplementary materials for lack of a better place to put it.

The policy has left investigators with fewer better-than-nothing places to share data. I hope the next journal that is tempted to eliminate supplementary material will consider these alternative approaches to address its problems while supporting data archiving:

  • Fix rather than eliminate supplemental material policies: clearly specify that supplemental info is not peer-reviewed, specify that suppl info is only for data (for example), remind reviewers and authors that suppl info is not for defensive material, etc.

    One example is the thoughtful response by Cell to its problems with supplemental material, a solution of defining what should and shouldn’t be included:

    “One of the first issues we confronted in thinking about structuring supplemental material was one of setting limits. Limits of course have both positives and negatives. On the plus side, it seems in the best interest of everyone in the scientific community that the concept of a ‘‘publishable story’’ be at least roughly defined. […] strict overall length limits struck us as somewhat arbitrary, and we instead focused on a more conceptual organization.”

  • Or, if you do indeed want to eliminate supplementary materials, recommend and in fact require that links to supplementary information elsewhere point either to established repositories or to resources archived through one of the many mechanisms for URL permanence.
  • Or, engage with Dryad or another discipline-based repository to find a win-win solution
  • And please commit to participating with the community to find solutions, rather than vaguely suggesting, “It is conceivable that removing supplemental material from articles might motivate more scientific communities to create repositories for specific types of structured data, which are vastly superior to supplemental material as a mechanism for disseminating data.”

3. Change is needed

I completely agree with them here. Change is needed. I also applaud the Journal for taking a bold step, even if I disagree with its particulars. I think it will motivate, inspire, and induce change. Bring on the market disruption… although it is a real shame if we lose a bunch of (expensive) (irreplaceable) data (forever) in the process.

A follow-up post with references on supplementary material.

Other blogosphere commentary:

ETA: link to followup post
