I had a phone call on Friday with my university librarian and six (!) Elsevier employees. We discussed Elsevier’s text mining policies and whether my needs for text mining access could be better facilitated. The call was very positive, and I choose to be optimistic that my research projects — and those of others like me — will be better able to leverage the scientific literature. (See the “What about everyone else” section below for action steps if you want better text mining access for your projects too.)
All parties on the phone call agreed that I could blog the discussion, so here it is. Of course this is my interpretation: the other participants may remember it differently.
Full disclosure: it is no secret that I’m strongly against many of Elsevier’s policies and business models. That said, I do believe that Elsevier adds value to the scientific literature. That this value has been paid for, to date, in a subscription model is something we can’t change: a lot of the scientific literature is under Elsevier’s control. Elsevier states that it supports and facilitates scientific progress: perhaps Elsevier is willing to facilitate as-needed use of papers for which it holds copyright, when such access is designed to be no threat to journal subscriptions and is clearly in the best interest of scientific discovery and progress?
My goal is efficient and effective research progress.
How did this call come about?
This meeting is thanks to the wonders of twitter and the participation and proactive engagement there by Alicia Wise (aka @wisealic). I commend her for engaging with us there. I was participating in a twitter conversation about the PubMed Open Access Subset, and a) observed how few Elsevier articles are in it, and b) suggested that Elsevier make its back issues available for text mining for the progress of science.
Alicia replied to me:
— Alicia Wise (@wisealic) February 21, 2012
Phone call participants
True to her word, Alicia got back to me promptly and facilitated a phone call that included:
Alicia Wise – Director Universal Access
David Tempest – Deputy Director, Universal Access
Chris Shillum – Vice President Product Management, Platform and Content
Allan Lu – Director, Product Management, ScienceDirect
Ale de Vries – Director, Platform Integration
Kortney Boak – Account Manager, Canada
Aleteia Greenwood – Head Librarian Science & Engineering, UBC Library
Heather Piwowar – Department of Zoology, UBC
I was surprised by the attendee list! Aleteia, the UBC librarian, is great, BTW. She came up to speed on this issue in no time flat. We called in from her office.
Background on my projects
Before the call I sent the participants a summary of my text mining projects because Alicia had indicated that Elsevier facilitates text mining on a project-by-project basis. (I happen to believe this approach leads to inefficiency, an under-appreciation of demand, and less scientific progress — but that is out of scope of the current discussion.)
Here’s the email I sent (overviews of these projects deleted below but included in link):
Thanks again for reaching out to support my text-mining needs, it is much appreciated.
Before our call on Friday I thought I’d briefly summarize a few of the text-mining projects I’m working on.
My hope is threefold:
- to inform our decisions on ways I may text-mine Elsevier-controlled content
- to provide additional case studies for you to understand all the ways researchers may want to use the literature
- to highlight for you the frustration that many scholars feel about accessing and USING the scientific literature to advance science. I’m very happy to be having these conversations, but also very aware I’m only having them now because I was lucky on twitter. Many other scholars would also like to have them but don’t know how.
ok. My projects :)
My research area is studying patterns in research data sharing and use.
Project 1: Tracking datasets from public repositories into the published literature.
I’d like to programmatically query Elsevier fulltext for 1000 accession number strings. For each query string I’d like to export the search result information (dois or IDs), analyze it, and make it available as open supplementary information.
Project 2: Classifying citations to identify those made in the context of dataset reuse
I’d like programmatic access to the full text of Elsevier papers that I know to have cited my dataset cohort, so that I can automatically extract relevant citation context. I’d like to make this information publicly available to citizen scientists and run text analysis algorithms on it.
Project 3: Providing evidence of data use to data creators
I’d like ongoing programmatic access to the full text of Elsevier papers to query for Research Object identifiers, so that we can display links to the search results in total-impact, aggregate them in reports, and release them openly.
I’ll close by thanking you again for this opportunity to talk. I do believe that Elsevier adds value. I also believe additional value can be added by others, for the benefit of science, when research publications are made available for the sort of reuse I outline above.
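For the technically curious, Project 1 boils down to a loop like the one below. This is only a minimal sketch under stated assumptions: `collect_hits`, `to_csv`, `fake_search`, and the accession strings and DOIs are all made up for illustration, and `fake_search` is a toy stand-in for whatever full-text query interface ends up being available. It is not Elsevier’s API.

```python
import csv
import io
from typing import Callable, Iterable, List, Tuple


def collect_hits(
    accessions: Iterable[str],
    search: Callable[[str], List[str]],
) -> List[Tuple[str, str]]:
    """Run one full-text query per accession string; return (accession, doi) rows."""
    rows = []
    for accession in accessions:
        # De-duplicate and sort the DOIs returned for this query string.
        for doi in sorted(set(search(accession))):
            rows.append((accession, doi))
    return rows


def to_csv(rows: List[Tuple[str, str]]) -> str:
    """Serialize the rows as CSV text, ready to share as supplementary information."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["accession", "doi"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical data: a toy index standing in for a real full-text search service.
FAKE_INDEX = {
    "GSE0001": ["10.1000/example.2", "10.1000/example.1"],
    "GSE0002": [],
}


def fake_search(query: str) -> List[str]:
    """Toy search function (an assumption, not a real API call)."""
    return FAKE_INDEX.get(query, [])


if __name__ == "__main__":
    print(to_csv(collect_hits(["GSE0001", "GSE0002"], fake_search)))
```

Passing the search function in as a parameter keeps the query loop and the export step independent of any particular publisher’s interface.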
We had a respectful and productive conversation. I recapped my projects, Elsevier told me about their standard textmining contract clause, and we discussed next steps.
Alicia was very focused on learning about and working toward meeting the needs of my text mining projects, and those of other researchers at UBC. For example, there were a few moments when others tried to ask for details about which articles I needed textmining access to, in terms of years and subject areas. I tried to answer, then asked “Why is it important?” (thanks Aleteia). Alicia was quick to agree that it wasn’t relevant, and we moved on.
We decided that:
- I could get text mining access for the purpose of my first project immediately, through Elsevier’s APIs
- others on the call would work toward text mining access for UBC as a whole soon, and sooner than the next contract renewal (2014 or 2015). No money was discussed, leading me to assume that there would be no charge.
- two of my text mining use cases require reuse rights that are outside the standard Elsevier agreement. We will continue working together to see what we can do. Alicia mentioned the citizen science project as a particularly interesting use case (those weren’t her exact words, but that was the sentiment I remember). I left the call believing there was a possibility that we would be able to work something out for all of the projects.
- Ale de Vries sent me email on the weekend with API keys, and followed up on Monday with helpful tips on how to use them for my specific use cases. Very helpful.
- I asked for the text of the standard reuse agreement. It was sent to me, but I was asked not to share it publicly because “it is a legal element.”
- David Tempest is now taking the lead, in place of Alicia Wise, in moving forward with the partnership with UBC
- David will be meeting with the Elsevier lawyer, Jan Bij de Weg, on Wednesday morning to check into licensing questions
- someone (I’m not sure who, I need to check) will take the next step on adding text mining agreements into UBC’s ScienceDirect contract. (UBC does not sign its own ScienceDirect license; it is signed by the national consortium, CRKN.)
- I sent more details on my two use cases that are not clearly within the reuse terms of Elsevier’s text mining agreement:
Thanks again for the productive conversation on Friday.
As promised, here are details on two ways I’d like to reuse Elsevier content that fall out of your normal terms of reuse for textmined results:
1. Determining citation context through Citizen Science and text mining
I have a list of 792 PubMed IDs of studies that create a certain data type. I propose to find all papers that cite these studies and annotate the relevant citations to determine if the citations were in the context of data reuse.
Determining citation context is error-prone through text mining alone: I plan to ask citizen scientists to help with these annotations. This will require making either the full text of these papers (ideally) or a paragraph around the citation itself (less ideal: more technically challenging, less context for annotators) available to citizen scientists, ie the public.
After this data collection and subsequent analysis, it would benefit research if I could make this research corpus available to other researchers. My normal mechanism of doing this would be to zip up the context paragraphs and annotations and deposit them in Dryad under a CC0 license.
2. Providing evidence of research object use through full-text query
Evidence of research object use is often captured in the full text or references section of published papers. With other researchers, I’ve been working on a non-profit project to identify and reveal evidence of research use. (My motivation is researcher incentives for data publication.)
We have had a previous conversation with Elsevier to define how we may integrate Scopus results into our tool, total-impact. In addition to the Elsevier-value-added Scopus results, there is a lot of value for science in pure full-text query results on research object identifiers. We are currently including PLoS full-text-query results in our reports; Elsevier content is missing.
I propose to query Elsevier APIs for research object identifiers (dataset accession numbers, webpage urls, research paper titles, paper and dataset DOIs, etc) and reveal the number of hits as one of the metrics in a total-impact report, in raw form and in analyzed form in conjunction with other metrics. To facilitate drill-down (and with the added benefit of increased visibility to Elsevier and its journals), we’d link each aggregated count to a dynamically-generated webpage containing links to the journal-hosted full-text papers for each of the hits.
We believe that this sort of reuse information is crucial to incentives and science on how to do science better, so we make total-impact report information openly available through exports, embeds, and apis. We would like to include the Elsevier full-text-query-result metrics in these disseminations.
Please let me know if you have questions about the projects above, either in general or for the purpose of determining whether Elsevier can support these scientific initiatives through use of publications you host.
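To give a feel for the “paragraph around the citation” option in the first use case, here is a minimal sketch. It assumes the full text is already available as plain text with paragraphs separated by blank lines, and that the citation appears as a literal marker such as `[12]`. The function name `citation_contexts` and the sample text are made up for illustration. Real papers would need reference resolution and format-specific parsing, which is part of why this route is the more technically challenging one.

```python
import re
from typing import List


def citation_contexts(full_text: str, marker: str) -> List[str]:
    """Return every paragraph of full_text that contains the citation marker."""
    # Split on blank lines and drop empty fragments.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", full_text) if p.strip()]
    return [p for p in paragraphs if marker in p]


# Hypothetical sample text, not taken from any real paper.
SAMPLE = (
    "Intro paragraph.\n\n"
    "We reused data from Smith et al. [12] here.\n\n"
    "Unrelated methods.\n\n"
    "As in [12], arrays were normalized."
)

if __name__ == "__main__":
    for paragraph in citation_contexts(SAMPLE, "[12]"):
        print(paragraph)
```

Each returned paragraph is the unit an annotator would see when deciding whether the citation was made in the context of data reuse.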
What About Everyone Else
At the end of the call, I stated that I’d like to blog the call… it was quickly agreed that was fine. Alicia mentioned her only hesitation was that she might be overwhelmed by requests from others who also want text mining access. Reasonable. We decided that in my blog I’d ask others interested in Elsevier text mining access to:
- make that request to their University Librarian
- suggest that their University Librarian discuss the request with their Elsevier rep
This seems like a great idea. Alone, however, it doesn’t make the demand for text mining visible. I said I’d create an open google doc to capture this demand and committed to keeping it constructive.
If you have a research project that is suffering from lack of text mining access to Elsevier content, please go add it here (constructively). Do it soon, because this could be important and useful information for the UK Hargreaves Report response on text mining (due March 21). Don’t forget to talk to your University Librarian too: writing in the google doc won’t get you access.
So. It has been a positive conversation so far, and I choose to be optimistic that Elsevier will find ways to facilitate scientific progress with the papers on which it holds copyright. Where there is a will, there is a way.
I’ll keep you posted.
ETA: See subsequent post for the conclusion of this negotiation.
This story has been covered by blogs, The Chronicle of Higher Education, The Guardian, a SPARC interview, a Suber Open Access News feature and a Poynder summary and interview with Peter Murray-Rust. Each of these provides valuable and unique context: worth reading.