Research Remix

August 13, 2010

Supplementary materials is a stopgap for data archiving

Filed under: Policies, by Heather Piwowar @ 11:23 am

The Journal of Neuroscience has issued a new policy on supplementary materials:

Beginning November 1, 2010, The Journal of Neuroscience will no longer allow authors to include supplemental material when they submit new manuscripts and will no longer host supplemental material on its web site for those articles

I think this will benefit the reporting of methods and exploratory analyses. I am thrilled that citations will no longer be lost in supplementary materials, assuming the additional citations make it into the main references list rather than being omitted.

But what about data?

A journal’s supplementary material section is not a great place for data. Limitations include:

  • not good for data formatting and reporting standards
  • not good for discoverability
  • not good for truly permanent storage
  • not good for machine retrievability
  • not good for journals sticking to core competencies
  • not good for journal planning and efficiency
  • not good for free access (in subscription journals)
  • not good for open access (or at least conveying openness clearly)
  • not good for lots of other things that I don’t know about and publishers don’t know about but repository professionals do know about

Most people would agree that well-designed, well-supported data repositories are the best place for data. The problem is, such repositories are few and far between. All is well and good if an experiment is in a discipline or produces a datatype for which a best-practice repository exists: the data should go there. All may be good if the authors are in an institution with an institutional repository that is well-equipped to handle scientific data, though these are uncommon. Otherwise where can investigators put their datasets?

Supplementary information is not a perfect home; it is not even a very good one. But it is better than hosting data on lab websites or sharing it by email on demand. It is a useful stopgap while more discipline-based repositories and institutional repositories rise to fill the need.

By removing this stopgap, in my opinion (and with the important caveat that I know very little about the journal or its discipline), The Journal of Neuroscience has sent three messages with its new policy:

1. They don’t consider archiving data to be their responsibility

This was already clear from their lackluster policy on data archiving:

Policy on Concerning Availability of Materials
It is understood that by publishing a paper in The Journal of Neuroscience the author(s) agree to make freely available to colleagues in academic research any clones of cells, nucleic acids, antibodies, etc. that were used in the research reported and that are not available from commercial suppliers.

Policy on DNA Sequences
[…] By the time a paper is sent to press, sequences must be deposited in a database generally accessible to the neuroscience community; the sequence accession number should be provided. Exceptions to this policy may be considered on an individual basis.

That’s it. Compare this to the comprehensive policies of other journals, particularly their statements of motivation. For example, in Science:

After publication, all data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.

And in Stem Cells (Cell has a similar policy):

Stem Cells supports the efforts of the National Academy of Sciences (NAS) to encourage the open sharing of publication-related data. Stem Cells adheres to the beliefs that authors should include in their publications the data, algorithms, or other information that is central or integral to the publication, or make it freely and readily accessible; use public repositories for data whenever possible; and make patented material available under a license for research use.

The Journal of Neuroscience has said that it wants to “maintain its leading position.” For what it is worth, evidence suggests that the highest-impact journals have the strongest data sharing policies.

2. They don’t consider archiving data important

Based on the policy and the wording of its announcement, I was left with the impression that the Journal doesn’t consider data archiving important. In particular, stating that “supplemental material is inherently inessential” and “We should remember that neuroscience thrived for generations without any online supplemental material” belittles data sharing, given that much data is currently shared in supplementary materials for lack of a better place to put it.

The policy has left investigators with fewer better-than-nothing places to share data. I hope the next journal that is tempted to eliminate supplementary material will consider these alternative approaches to address its problems while supporting data archiving:

  • Fix rather than eliminate supplemental material policies: clearly specify that supplemental info is not peer-reviewed, specify that suppl info is only for data (for example), remind reviewers and authors that suppl info is not for defensive material, etc.

    One example is Cell’s thoughtful response to its problems with supplemental material: defining what should and shouldn’t be included.

    “One of the first issues we confronted in thinking about structuring supplemental material was one of setting limits. Limits of course have both positives and negatives. On the plus side, it seems in the best interest of everyone in the scientific community that the concept of a ‘‘publishable story’’ be at least roughly defined. […] strict overall length limits struck us as somewhat arbitrary, and we instead focused on a more conceptual organization.”

  • Or, if you do indeed want to eliminate supplementary materials, recommend (or better yet, require) that links to supplementary information elsewhere point either to established repositories or to resources archived through one of the many mechanisms for URL permanence.
  • Or, engage with Dryad or another discipline-based repository to find a win-win solution
  • And please commit to working with the community to find solutions, rather than vaguely suggesting, “It is conceivable that removing supplemental material from articles might motivate more scientific communities to create repositories for specific types of structured data, which are vastly superior to supplemental material as a mechanism for disseminating data.”

3. Change is needed

I completely agree with them here. Change is needed. I also applaud the Journal for taking a bold step, even if I disagree with its particulars. I think it will motivate, inspire, and induce change. Bring on the market disruption… although it is a real shame if we lose a bunch of (expensive) (irreplaceable) data (forever) in the process.


A follow-up post with references on supplementary material.

Other blogosphere commentary:

ETA: link to followup post

March 20, 2008

A review of journal policies for sharing research data

Filed under: MyResearch, by Heather Piwowar @ 1:00 pm

Inspired by the reception to this blog post, I systematically reviewed journal data sharing policies with gene expression microarray data as a use case. The brief and extended abstracts are below. Supplementary information is here. Full paper to be written prior to presentation in Toronto this June. I’m planning to finish writing the paper in the open, so I’d love to hear your comments.

ETA: Now up at Nature Precedings. (P.S. for Mom: ETA = edited to add.)

Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Accepted to ELPUB2008 (International Conference on Electronic Publishing): Open Scholarship: Authority, Community and Sustainability in the Age of Web 2.0

Background: Sharing data is a tenet of science, yet it is commonplace in only a few subdisciplines. Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers. The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals that are associated with the strength of their data sharing policies, and whether the strength of data sharing policies impacts the observed prevalence of data sharing.
Methods: We investigated these relationships with respect to gene expression microarray data in the journals that most often publish studies about this type of data. We measured data sharing prevalence as the proportion of papers with submission links from NCBI’s Gene Expression Omnibus (GEO) database.
We conducted univariate and linear multivariate regressions to understand the relationship between the strength of data sharing policy and journal impact factor, journal subdiscipline, journal publisher (academic societies vs. commercial), and publishing model (open vs. closed access).
Results: Of the 70 journal policies, 18 (26%) made no mention of sharing publication-related data within their Instructions to Authors. Of the 42 (60%) policies with data sharing provisions applicable to microarrays, we classified 18 (26% of 70) as weak and 24 (34% of 70) as strong.
Existence of a data sharing policy was associated with the type of journal publisher: half of all commercial publishers had a policy compared to 82% of journals published by an academic society. All four of the open-access journals had a data sharing policy. Policy strength was associated with impact factor: the journals with no data sharing policy, a weak policy, and a strong policy had respective median impact factors of 3.6, 4.5, and 6.0. Policy strength was positively associated with measured data sharing submission into the GEO database: the journals with no data sharing policy, a weak policy, and a strong policy had median data sharing prevalence of 11%, 19%, and 29% respectively.
Conclusion: This review and analysis begins to quantify the relationship between journal policies and data sharing outcomes and thereby contributes to assessing the incentives and initiatives designed to facilitate widespread, responsible, effective data sharing.
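The prevalence measure described in the Methods and Results can be illustrated with a minimal Python sketch. It assumes a hypothetical per-journal table with three fields: policy strength, the number of microarray papers linked from GEO, and the total number of microarray papers. The journal records and counts below are invented for illustration and are not the study's data. The sketch computes each journal's prevalence as a proportion, then takes the median within each policy group, which mirrors the group medians reported in the Results. It does not attempt the regression analysis.

```python
from statistics import median

# Hypothetical per-journal records: (policy strength, papers linked from GEO, total papers).
# These numbers are made up for illustration; they are NOT data from the study.
journals = [
    ("none", 2, 20),
    ("none", 1, 10),
    ("weak", 3, 15),
    ("weak", 4, 20),
    ("strong", 6, 20),
    ("strong", 9, 30),
    ("strong", 5, 10),
]


def prevalence(shared, total):
    """Data sharing prevalence: fraction of a journal's papers with a GEO submission link."""
    if total == 0:
        raise ValueError("a journal needs at least one paper to compute prevalence")
    return shared / total


def median_prevalence_by_policy(records):
    """Group journals by policy strength and return the median prevalence per group."""
    groups = {}
    for strength, shared, total in records:
        groups.setdefault(strength, []).append(prevalence(shared, total))
    return {strength: median(values) for strength, values in groups.items()}


if __name__ == "__main__":
    for strength, med in median_prevalence_by_policy(journals).items():
        print(f"{strength}: median prevalence {med:.0%}")
```

With these made-up records, the script prints medians of 10%, 20%, and 30% for the none, weak, and strong groups. The median is used here because the Results summarize each group by its median rather than its mean.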

Extended abstract:

(more…)

March 13, 2008

Support for data sharing in the NIH grant review process

Filed under: Uncategorized, by Heather Piwowar @ 9:00 am

Join me in voicing support for including data sharing plans and track records as criteria in NIH grant review. Currently the NIH requires a data sharing plan for large grants, but it explicitly excludes that plan from the assessment of a proposal's scientific merit. This should change.

The email below was sent to the iscb-publicaffairs-updates list. Thanks to Dr. David States for a) championing these causes and b) drafting language for us to reuse in voicing our support.

Email your comments to PeerReviewRFI@mail.nih.gov by March 17, 2008 (or better yet, do it right now!)

While you’re at it, the ideas below about web resources sound great too.

The US NIH is in the final phase of revising the peer review process by
which grants are reviewed and awarded.  See
http://enhancing-peer-review.nih.gov/ for details and a link to the
report.  The Final Draft Report identifies “the most significant
challenges facing the NIH peer review system” and proposes recommended actions.
[..]

As the new chair of the ISCB Public Affairs & Policies Committee, I would
like to bring to your attention two specific issues in the proposed
revisions to the NIH peer review process that directly affect computational biologists:

1) Access to URLs and web materials in the review process.  The current NIH guidelines discourage applicants from including URLs in proposals and discourage reviewers from accessing web sites in the review process.
The basis for this policy is twofold: most importantly, protecting the
reviewers’ confidentiality but also that a proposal needs to stand on
its own.  In my view the latter consideration needs to be tempered by
the fact that reviewers look at publications from an applicant all the
time.  Looking at a web site is no different from looking at a paper to
clarify a concern.  This policy can work to the disadvantage of
bioinformatics projects where the project web site is an important
mechanism for dissemination and data sharing.

Proposal: NIH should establish an anonymous web proxy server for use by reviewers so that they can visit and evaluate web services and web content described in a proposal.  Applicants should be encouraged to include URLs as evidence of project performance.

2) Data sharing. The sharing of data by an investigator should be
included as a review criterion, and there should be a section of the
proposal addressing data sharing plans. Data sharing is not an
all-or-nothing issue. There are questions about how the data are made
available and documented and, in the case of web servers, what
restrictions are placed on access and downloads. I think it is
appropriate for reviewers to consider data sharing behavior in
evaluating the merits of a proposal.

Proposal:  Data sharing behavior should be made an explicit review
criterion.  There should be a section in all proposals discussing data
sharing track record and plans.  For renewal applications, there should
be a section in the proposal with actual accessions and URLs for data
that has been deposited with repositories or made accessible on the web.

These are my personal opinions and not necessarily those of ISCB, but if you share my concerns and wish to use any of my text above in your own comments, please feel free to do so.

Sincerely,

David J. States, M.D., Ph.D.
Chair, ISCB Public Affairs & Policies Committee
