Publishers make article text available under a variety of copyright terms. Data, however, are not copyrightable. So what are we allowed to do with them, these datums and datasets within and beside article text? It isn’t clear. Few publisher sites say. It matters. So let’s ask.
On behalf of the Open Knowledge Foundation and benefitting from very useful feedback from a number of colleagues, Peter Murray-Rust and I recently sent email to PLoS, BMC, and Nature, asking them to confirm the openness of their data. The email is below. A slightly different email was sent to Mendeley, asking whether their data is open. All email queries and responses can be browsed at the Is It Open Data website. Furthermore, you can feel free to initiate your own enquiry from there. (And we’d love volunteers to help tweak the code to make the enquiry site even more useful.)
Peter Murray-Rust will highlight the responses-to-date in the #solo10 Green Chain Reaction session at the Science Online London conference later this week.
While this effort won’t answer all surrounding questions, hopefully it will clarify a few policies, illuminate outstanding issues, and liberate some text and data mining efforts on the way.
Subject: Enquiry about data openness at [Publisher]
I’m a postdoc researcher with NESCent, studying scientific data sharing and reuse. I’m writing to you, with Peter Murray-Rust, on behalf of the Open Knowledge Foundation. The Open Knowledge Foundation (OKF) is a non-profit global organization dedicated to the creation, dissemination and labelling of Open Knowledge.
On behalf of the OKF, we are writing to a large number of science publishers to ask for confirmation of their policies with respect to data published within their journals.
There is now great public interest in the Open availability of scientific data for validating scientific findings, detecting fraud and exploring new hypotheses. It is generally accepted by publishers that data per se are not copyrightable: several statements by publisher associations have made this point explicitly. The Association of Learned and Professional Society Publishers (ALPSP) and International Association of Scientific, Technical, & Medical Publishers (STM) issued a joint statement in 2006 recommending that “research data should be as widely available as possible.” (http://www.alpsp.org/ForceDownload.asp?id=129) The 2007 Brussels Declaration from the STM states in part:
“Raw research data should be made freely available to all researchers.
Publishers encourage the public posting of the raw data outputs of research.
Sets or sub-sets of data that are submitted with a paper to a journal should wherever possible be made freely accessible to other scholars.”
Combined with the acceptance and increasingly widespread adoption of the Panton Principles (http://pantonprinciples.org/), it is now possible to articulate policies that are consistent with the publication and reuse of Open Data.
We would like to ask you for clarification on several points with respect to your journals. It will help everyone if your answers are clear so that users of your material can know what they may and may not do without requesting further permission.
1. May users extract raw data and metadata (contextual facts about data collection) from supplementary information published in your journal?
2. May users extract raw data and metadata from figures, tables, and text in the narrative of your published articles?
3. May users extract this information from freely available articles and supplementary information, as well as those that are available by subscription only? For the latter, users would obtain access through an existing subscription.
4. May the extracted data be used as Open Data [1,2] without discrimination against users, groups, or fields of endeavor?
5. May users expose the extracted data as Open Data [1,2], in a manner consistent with the Panton Principles (http://pantonprinciples.org/)? Specifically, may they expose the extracted data on the internet under a Public Domain, PDDL (http://www.opendatacommons.org/licenses/pddl/) or CC0 waiver (http://wiki.creativecommons.org/CC0)?
6. May users obtain articles and supplementary materials (other than audio and video) from your website via automated means for the purposes of extracting raw data, if it is done in a manner that does not place undue burden on your resources? Users would obtain access through an existing subscription where necessary.
7. Will you consider displaying the OKF’s “Open Data” button (http://opendefinition.org/button) as a means of clarifying to readers and users the Open parts of your material?
Our questions are being asked through the OKF’s IsItOpen(Data) service (http://www.isitopendata.org), which has been designed to clarify in what sense published and online datasets are actually open. IsItOpen(Data) saves everyone time by allowing a question to be asked just once and making the reply permanently visible in a high-profile site.
On behalf of the scientific community, thank you in advance for your response. The clear labelling of Openness will save scientists hundreds of years’ work per year in asking permission and speculating. Enabling open access data, both for use and reuse, will help to validate published findings, discourage fraud and misconduct, and explore new research areas. Your clear support for these principles will demonstrate the value you place on these activities and will surely benefit science.
We look forward to hearing from you. Could you let us know the timeframe in which we might expect a response?
Heather Piwowar, firstname.lastname@example.org
Peter Murray-Rust, email@example.com
on behalf of the Open Knowledge Foundation, http://okfn.org/
Sent by “Is It Open Data?” http://isitopendata.org/ A service which helps scholars (and others) to request information about the status and licensing of data and content.
Disclaimer: This message and any reply that you make will be published on the internet for anyone to access and copy. For more information see:
ETA: Removed link to responses for now.