Research Remix

March 20, 2008

A review of journal policies for sharing research data

Filed under: MyResearch — Tags: , , , , — Heather Piwowar @ 1:00 pm

Inspired by the reception to this blog post, I systematically reviewed journal data sharing policies with gene expression microarray data as a use case. The brief and extended abstracts are below. Supplementary information is here. Full paper to be written prior to presentation in Toronto this June. I’m planning to finish writing the paper in the open, so I’d love to hear your comments.

ETA: Now up at Nature Precedings. ps mom ETA = edited to add

Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Accepted to ELPUB2008 (International Conference on Electronic Publishing): Open Scholarship: Authority, Community and Sustainability in the Age of Web 2.0

Background: Sharing data is a tenet of science, yet commonplace in only a few subdisciplines. Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers. The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals which are associated with the strength of their data sharing policies, and whether the strength of data sharing policies impact the observed prevalence of data sharing.
Methods: We investigated these relationships with respect to gene expression microarray data in the journals that most often publish studies about this type of data. We measured data sharing prevalence as the proportion of papers with submission links from NCBI’s Gene Expression Omnibus (GEO) database.
We conducted univariate and linear multivariate regressions to understand the relationship between the strength of data sharing policy and journal impact factor, journal subdiscipline, journal publisher (academic societies vs. commercial), and publishing model (open vs. closed access).
Results: Of the 70 journal policies, 18 (26%) made no mention of sharing publication-related data within their Instruction to Author statements. Of the 42 (60%) policies with a data sharing policy applicable to microarrays, we classified 18 (26% of 70) as weak and 24 (34% of 70) as strong.
Existence of a data sharing policy was associated with the type of journal publisher: half of all commercial publishers had a policy compared to 82% of journals published by an academic society. All four of the open-access journals had a data sharing policy. Policy strength was associated with impact factor: the journals with no data sharing policy, a weak policy, and a strong policy had respective median impact factors of 3.6, 4.5, and 6.0. Policy strength was positively associated with measured data sharing submission into the GEO database: the journals with no data sharing policy, a weak policy, and a strong policy had median data sharing prevalence of 11%, 19%, and 29% respectively.
Conclusion: This review and analysis begins to quantify the relationship between journal policies and data sharing outcomes and thereby contributes to assessing the incentives and initiatives designed to facilitate widespread, responsible, effective data sharing.

Extended abstract:

Background
Sharing data is a tenet of science, yet commonplace in only a few subdisciplines. Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers. The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals which are associated with the
strength of their data sharing policies, and whether the strength of data sharing policies impact the observed prevalence of data sharing.

Gene expression microarray data provides a useful environment for investigating these relationships, for several reasons. Despite being valuable for reuse and costly to collect, microarray data is often but not yet universally shared. An active Microarray and Gene Expression Data (MGED) Society has developed formatting and minimum-inclusion reporting standards and actively called for journal data sharing policies. A few centralized databases have emerged as best-practice repositories: the Gene Expression Omnibus (GEO) and ArrayExpress. Finally, the National Center for Biotechnology Information’s (NCBI) Entrez website makes it easy to identify journal articles with associated datasets within GEO, allowing us to study the association between journal policies and observed data sharing practice.

Methodology
We used Thomson’s Journal Citation Reports to identify journals with more than 15 articles published on “gene expression profiling” in 2006. We extracted the journal impact factors, subdiscipline categories, and publishing organizations. We consulted the Directory of Open Access Journals to identify which journals use an open-access publishing model. We downloaded the Instructions for Authors for each of these journals and reviewed them for any mention of data sharing. We classified the policies into three categories: no mention of sharing microarray data, a weak suggestion or requirement for sharing microarray data, and a strong and well-described requirement for sharing microarray data. We conducted univariate and linear multivariate regressions to understand the relationship between the strength of data sharing policy and journal impact factor, journal subdiscipline, journal publisher (academic societies vs. commercial), and publishing model (open vs. closed access).

Next, we developed a PubMed query for identifying journal articles which are likely to have generated gene expression microarray data in 2006-2007. Using the NCBI’s Entrez website, we collected the total number of such articles for each journal in our cohort and the percentage of articles with links to the GEO data repository. Finally, we conducted univariate and linear multivariate regressions over the journal data-sharing prevalence percentages to understand if strength of data sharing policy was associated with observed data sharing prevalence, independently of journal impact factor, subdiscipline, publisher type, and publishing model.

Data and interpretation
Seventy journals met the selection criteria, spanning a wide range of impact factors (0.9 to 30.0, median: 4.5). A minority were published by academic societies (22/70=31%). Only 4 (6%) use an open-access publishing model. Thomson’s Journal Citation Reports identified 27 subdisciplines covered by these journals; we retained the categories with more than five members: Biochemistry and Molecular Biology (n=19), Biotechnology and Applied Microbiology (11), Cell Biology (11), Genetics and Heredity (11), Oncology (19), and Plant Sciences (7). We also retained
Multidisciplinary Sciences (n=4).

Of the 70 journal policies, 18 (26%) made no mention of sharing publication-related data within their Instruction to Author statements. Another 11 policies (16%) included requests or requirements for sharing non-microarray types of data (usually DNA and protein sequences), but no statement covering data in general or microarray data in particular. Of the 42 journals (60%) with a data sharing policy applicable to microarrays, 24 (34% of 70) had a general statement about data sharing and 38 (54% of 70) covered microarrays explicitly. We classified 18 (26% of 70) of these 42 policies as weak and 24 (34% of 70) as strong.

Data sharing policy strength was associated with impact factor: the journals with no data sharing policy, a weak policy, and a strong policy had respective median impact factors of 3.6, 4.5, and 6.0. Data sharing policy was also associated with journal publisher: half of all commercial publishers had data sharing policy, whereas 82% of journals published by academic society had a data sharing policy. All four of the open-access journals had a data sharing policy. In multivariate analysis, impact factor and open access (positively) and Oncology (negatively) were statistically significantly associated with the existence of a microarray data sharing policy, with non-significant positive trends from academic society publishers and the Biochemistry&Molecular Biology subdiscipline. No multivariate associations were found to differentiate between a weak and a strong data sharing policy.

Data sharing policy strength was positively associated with data sharing prevalence in GEO: the journals with no data sharing policy, a weak policy, and a strong policy had median data sharing prevalences of 11%, 19%, and 29% respectively. Impact factor and academic society publishers were also associated with observed data sharing prevalence. We found that publisher type and policy strength interacted in multivariate analysis: policy strength increased with data sharing prevalence for journals published by commercial publishers, but not for academic society journals.

Conclusions
Our review found policies covering data sharing requirements vary widely between scientific journals, even for a specific data type with well-defined reporting standards and centralized repositories. A surprising number of journals had a policy for microarray data but not data in general, suggesting that our results may differ for datatypes for which the sharing infrastructure is not as mature. Data sharing prevalence was quite low, even for journals with very strict sharing requirements; further investigation is needed to understand this finding. Exploring the relationship between publisher type, subdiscipline, and data sharing prevalence is also worthy of additional work, as it may help to illuminate the features of research culture which affect data sharing principles independently of mandates.

This study has several limitations: it explores a limited number of journal policies for only one type of data, the measured data sharing behavior predates the policy downloads and so may not be reflective of current behavior, the method of measuring data sharing behavior captures most but not all articles that shared data, and the policy classifications were performed by only one investigator. We also note that the reported associations do not imply causation: future study is required to determine the degree to which a change in a journal’s data sharing policy will change the behavior of their authors.

Nonetheless, we believe this review and analysis is an important step in understanding the relationship between journal policies and data sharing outcomes. Policies are implemented with the hopes of affecting change, but it is often said, “You cannot manage what you do not measure.” We need to understand and quantify various incentives and initiatives if we hope to unleash the benefits of widespread data sharing.

6 Comments

  1. So unless they explicitly say that they’ll share microarray data, that means that they won’t?

    Comment by ecoli — March 20, 2008 @ 10:01 pm

  2. Hmmmm… the journal? Unlike the NIH OA requirement where some journals put the articles into PubMedCentral for the authors, journals don’t deposit data for the authors but rather request/require that the authors do it themselves (and sometimes require an accession number within the abstract as “proof”).

    Thanks for the question, it helps me understand what isn’t clear. Hope that helps. If not, please ask again!

    Comment by Heather Piwowar — March 21, 2008 @ 7:22 am

  3. oh… so it’s the journals that are to blame, not the authors. Well that’s not entirely surprising, I suppose.

    Comment by ecoli — March 21, 2008 @ 9:34 am

  4. Yeah, I don’t see it as blame so much, because “best practices” aren’t well communicated at this point so I think the behaviour often stems from ignorance/oversight.

    Also, for what it’s worth, I think the responsibility for sharing lies with authors and investigators. Journals, funders, and institutions can (and I think should, because it is also in their best interests) encourage and require behaviour which promotes quality science and access to science results, but doing it is part of the job description of a Scientist.

    Comment by Heather Piwowar — March 21, 2008 @ 9:51 am

  5. […] A review of journal policies for sharing research data Posted on March 26, 2008 by lilyheart https://researchremix.wordpress.com/2008/03/20/a-review-of-journal-policies-for-sharing-research-data… […]

    Pingback by A review of journal policies for sharing research data « University of Melbourne Library Intelligencer — March 25, 2008 @ 10:52 pm

  6. […] are also reconsidering their own policies with respect to data, becoming more and more explicit about what is expected from […]

    Pingback by thoughts on where journals are now, what to do next « Research Remix — December 3, 2011 @ 2:59 pm


RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Blog at WordPress.com.

%d bloggers like this: