talking text mining with Elsevier

March 5, 2012

Filed under: Uncategorized — Heather Piwowar @ 6:23 pm

I had a phone call on Friday with my university librarian and six (!) Elsevier employees. We discussed Elsevier’s text mining policies and whether my needs for text mining access could be better facilitated. The call was very positive, and I choose to be optimistic that my research projects — and those of others like me — will be better able to leverage the scientific literature. (See the “What about everyone else” section below for action steps if you want better text mining access for your projects too.)

All parties on the phone call agreed that I could blog the discussion, so here it is. Of course this is my interpretation: that of other participants may be different.

Full disclosure: it is no secret that I’m strongly against many of Elsevier’s policies and business models. That said, I do believe that Elsevier adds value to the scientific literature. That this value has been paid for, to date, in a subscription model is something we can’t change: a lot of the scientific literature is under Elsevier’s control. Elsevier states that it supports and facilitates scientific progress: perhaps Elsevier is willing to facilitate as-needed use of papers for which it holds copyright, when such access is designed to be no threat to journal subscriptions and is clearly in the best interest of scientific discovery and progress?

My goal is efficient and effective research progress.

How did this call come about?

This meeting is thanks to the wonders of twitter and participation+proactive engagement there by Alicia Wise (aka @wisealic). I commend her for engaging with us there. I was participating in a twitter conversation about the PubMed Open Access Subset, and a) observed how few Elsevier articles are in it, and b) suggested that Elsevier make its back issues available for text mining for the progress of science.

Alicia replied to me:

@researchremix hi there – I am rather perplexed by this comment as all #Elsevier content – incl subscription content – can be text mined

— Alicia Wise (@wisealic) February 21, 2012

I responded with surprise because that wasn’t my understanding. We traded a few tweets about Terms of Use and how unclear the Elsevier website currently is about which Terms apply to the reuse of article content, and she generously volunteered to follow up with me (and separately, Alf Eaton) to continue our conversation in email.

Phone call participants

True to her word, Alicia got back to me promptly and facilitated a phone call that included:

Alicia Wise – Director Universal Access
David Tempest – Deputy Director, Universal Access
Chris Shillum – Vice President Product Management, Platform and Content
Allan Lu – Director, Product Management, ScienceDirect
Ale de Vries – Director, Platform Integration
Kortney Boak – Account Manager, Canada
Aleteia Greenwood – Head Librarian Science & Engineering, UBC Library
Heather Piwowar – Department of Zoology, UBC

I was surprised by the attendee list! Aleteia, the UBC librarian, is great, BTW. She came up to speed on this issue in no time flat. We called in from her office.

Background on my projects

Before the call I sent the participants a summary of my text mining projects because Alicia had indicated that Elsevier facilitates text mining on a project-by-project basis. (I happen to believe this approach leads to inefficiency, an under-appreciation of demand, and less scientific progress — but that is out of scope of the current discussion.)

Here’s the email I sent (the project overviews are omitted below but included in the link):

Dear Alicia,

Thanks again for reaching out to support my text-mining needs, it is much appreciated.

Before our call on Friday I thought I’d briefly summarize a few of the text-mining projects I’m working on.

My hope is threefold:
– to inform our decisions on ways I may text-mine Elsevier-controlled content
– to provide additional case studies for you to understand all the ways researchers may want to use the literature
– to highlight for you the frustration that many scholars feel about accessing and USING the scientific literature to advance science. I’m very happy to be having these conversations, but also very aware I’m only having them now because I was lucky on twitter. Many other scholars would also like to have them but don’t know how.

ok. My projects :)

My research area is studying patterns in research data sharing and use.

Project 1: Tracking datasets from public repositories into the published literature.
..
I’d like to programmatically query Elsevier fulltext for 1000 accession number strings. For each query string I’d like to export the search result information (dois or IDs), analyze it, and make it available as open supplementary information.

Project 2: Classifying citations to identify those made in the context of dataset reuse
..
I’d like programmatic access to the full text of Elsevier papers that I know to have cited my dataset cohort, so that I can automatically extract relevant citation context. I’d like to make this information publicly available to citizen scientists and run text analysis algorithms on it.

Project 3: Providing evidence of data use to data creators
..
I’d like ongoing programmatic access to the full text of Elsevier papers to query for Research Object identifiers, so that we can display links to the search results in total-impact, aggregate them in reports, and release them openly.

I’ll close by thanking you again for this opportunity to talk. I do believe that Elsevier adds value. I also believe additional value can be added by others, for the benefit of science, when research publications are made available for the sort of reuse I outline above.

Sincerely,
Heather
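
As a rough illustration of the query loop described under Project 1 (one full-text query per accession string, with the matching IDs exported for open release), here is a minimal Python sketch. It is not the code used in the project and does not call Elsevier's API: the `search` callable, the toy corpus, the `GSE…`-style accession strings and the `10.0000/…` DOIs are all made-up placeholders.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def collect_hits(accessions: Iterable[str],
                 search: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Run one full-text query per accession string and keep the matching IDs."""
    results: Dict[str, List[str]] = {}
    for accession in accessions:
        # De-duplicate and sort so the exported table is stable between runs.
        results[accession] = sorted(set(search(accession)))
    return results


def to_csv(results: Dict[str, List[str]]) -> str:
    """Flatten the results into (accession, doi) rows for supplementary data."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["accession", "doi"])
    for accession, dois in results.items():
        for doi in dois:
            writer.writerow([accession, doi])
    return buffer.getvalue()


# Hypothetical stand-in for a publisher full-text index (assumption, toy data).
TOY_CORPUS = {
    "10.0000/example.1": "Expression data were obtained from GSE0001.",
    "10.0000/example.2": "We reanalysed GSE0001 and GSE0002.",
    "10.0000/example.3": "No datasets were reused in this study.",
}


def toy_search(query: str) -> List[str]:
    """Return the DOIs whose text contains the query string."""
    return [doi for doi, text in TOY_CORPUS.items() if query in text]


hits = collect_hits(["GSE0001", "GSE0002", "GSE0003"], toy_search)
print(to_csv(hits))
```

Keeping the search step behind a plain callable means the same loop could be pointed at whichever full-text source grants access, and accession strings with no hits stay in the results with an empty list rather than silently disappearing.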

Conversation

We had a respectful and productive conversation. I recapped my projects, Elsevier told me about their standard textmining contract clause, and we discussed next steps.

Alicia was very focused on learning about and working toward meeting the needs of my text mining projects, and those of other researchers at UBC. For example, there were a few moments when others tried to ask for details about which articles I needed textmining access to, in terms of years and subject areas. I tried to answer, then asked “Why is it important?” (thanks Aleteia). Alicia was quick to agree that it wasn’t relevant, and we moved on.

We decided that:

– I could get text mining access for the purpose of my first project immediately, through Elsevier’s APIs.
– Others on the call would work toward text mining access for UBC as a whole soon, and sooner than the next contract renewal (2014 or 2015). No money was discussed, leading me to assume that there would be no charge.
– Two of my text mining use cases require reuse rights that are outside the standard Elsevier agreement. We will continue working together to see what we can do. Alicia mentioned the citizen science project as a particularly interesting use case (those weren’t her exact words, but that was the sentiment I remember). I left the call believing there was a possibility that we would be able to work something out for all of the projects.

Follow-up

– Ale de Vries sent me email on the weekend with API keys, and followed up on Monday with helpful tips on how to use them for my specific use cases. Very helpful.
– I asked for the text of the standard reuse agreement. It was sent to me but I was asked not to share it publicly because “it is a legal element”.
– David Tempest is now taking the lead in place of Alicia Wise in moving forward with the partnership with UBC.
– David will be meeting with the Elsevier lawyer, Jan Bij de Weg, on Wednesday morning to check into licensing questions.
– Someone (I’m not sure who, I need to check) will take the next step on adding text mining agreements into UBC’s ScienceDirect contract (UBC does not sign its own SD license; it is signed by the National Consortium, CRKN).
– I sent more details on my two use cases that are not clearly within the reuse terms of Elsevier’s text mining agreement:

Hi all,

Thanks again for the productive conversation on Friday.

As promised, here are details on two ways I’d like to reuse Elsevier content that fall out of your normal terms of reuse for textmined results:

1. Determining citation context through Citizen Science and text mining

I have a list of 792 PubMed IDs of studies that create a certain data type. I propose to find all papers that cite these studies and annotate the relevant citations to determine if the citations were in the context of data reuse.

Determining citation context is error-prone through text mining alone: I plan to ask citizen scientists to help with these annotations. This will require making either the full text of these papers (ideally) or a paragraph around the citation itself (less ideal: more technically challenging, less context for annotators) available to citizen scientists, ie the public.

I’d be happy to take steps to ensure the disclosure of these papers is kept to the purpose at hand: I could write a robots.txt file prohibiting spidering, add a Terms of Use policy to that effect, exclude the site from Google Search indexing, and not facilitate any kind of navigation or search-to-find-a-paper functionality on the site itself.

After this data collection and subsequent analysis, it would benefit research if I could make this research corpus available to other researchers. My normal mechanism of doing this would be to zip up the context paragraphs and annotations and deposit them in Dryad under a CC0 license.

2. Providing evidence of research object use through full-text query

Evidence of research object use is often captured in the full text or references section of published papers. With other researchers, I’ve been working on a non-profit project to identify and reveal evidence of research use. (My motivation is researcher incentives for data publication.)

We have had a previous conversation with Elsevier to define how we may integrate Scopus results into our tool, total-impact. In addition to the Elsevier-value-added Scopus results, there is a lot of value for science in pure full-text query results on research object identifiers. We are currently including PLoS full-text-query results in our reports; Elsevier content is missing.

I proposed to query Elsevier APIs for research object identifiers (dataset accession numbers, webpage urls, research paper titles, paper and dataset DOIs, etc) and reveal the number of hits as one of the metrics in a total-impact report, in raw form and in analyzed form in conjunction with other metrics. To facilitate drill-down (and with the added benefit of increased visibility to Elsevier and its journals), we’d link each aggregated count to a dynamically-generated webpage containing links to the journal-hosted full-text papers for each of the hits.

We believe that this sort of reuse information is crucial to incentives and science on how to do science better, so we make total-impact report information openly available through exports, embeds, and apis. We would like to include the Elsevier full-text-query-result metrics in these disseminations.

Please let me know if you have questions about the projects above, either in general or for the purpose of determining whether Elsevier can support these scientific initiatives through use of publications you host.

Thanks!
Heather
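
The crawler restriction proposed for the citizen science site in the email above (a robots.txt file prohibiting spidering) can be checked mechanically. This is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content and the `annotate.example.org` URL are made-up placeholders, not the configuration of any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the annotation site: ask every crawler
# to stay away from every path (assumption, not the real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is needed to test a draft policy.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


paper_url = "https://annotate.example.org/paper/123"
print(may_crawl("Googlebot", paper_url))
print(may_crawl("SomeOtherBot", paper_url))
```

Note that robots.txt is only a request that well-behaved crawlers honor, which is presumably why the email pairs it with a Terms of Use policy and with leaving out navigation and search on the site itself.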

What About Everyone Else

At the end of the call, I stated that I’d like to blog the call… it was quickly agreed that was fine. Alicia mentioned her only hesitation was that she might be overwhelmed by requests from others who also want text mining access. Reasonable. We decided that in my blog I’d ask others interested in Elsevier text mining access to:

– make that request to their University Librarian
– suggest that their University Librarian discuss the request with their Elsevier rep

This seems like a great idea. Alone, however, it doesn’t make the demand for text mining visible. I said I’d create an open google doc to capture this demand and committed to keeping it constructive.

If you have a research project that is suffering from lack of text mining access to Elsevier content, please go add it here (constructively). Do it soon, because this could be important and useful information for the UK Hargreaves Report response on text mining (due March 21). Don’t forget to talk to your University Librarian too…. writing in the google doc won’t get you access.

stay tuned

so. It has been a positive conversation so far, and I choose to be optimistic that Elsevier will find ways to facilitate scientific progress with the papers on which it holds copyright. Where there is a will, there is a way.

I’ll keep you posted.

ETA: See subsequent post for the conclusion of this negotiation.

This story has been covered by blogs, The Chronicle of Higher Education, The Guardian, a SPARC interview, a Suber Open Access News feature and a Poynder summary and interview with Peter Murray-Rust. Each of these provides valuable and unique context: worth reading.

24 Comments

There is also a “call for research proposals” right now to access scopus and elsevier datasets, but which this year does not include full text access. http://ebrp.elsevier.com/index.asp

Comment by schulzjan — March 5, 2012 @ 10:37 pm
Thank you for your very clear account.

It worries me greatly. By dealing with Elsevier you have implicitly agreed that Elsevier has the right to control what you do. That they will then generously allow you a subset of the rights that they currently deny us. If all Universities follow the course of UBC we shall end up in a situation where Elsevier’s walled garden philosophy controls all of us.

We have a fundamental right to text-mine the literature. This agreement has given that up. I am sure it was well intentioned but that’s the effect.

The fundamental problem was that Librarians didn’t care about our rights and signed the Elsevier restrictive clauses without even raising the issue.

Comment by Peter Murray-Rust — March 6, 2012 @ 12:16 am
I have to say this disturbs me, too. Beginning with the fact that Elsevier’s Director of Universal Access didn’t even know that text-mining was restricted; encompassing all the wasted time and effort of these negotiations, lawyer consultations and so on; and landing up with the absurd notion that the text of the standard reuse agreement is a secret. I hardly have words for the stupidity of this.

It can’t possibly be an effective use of time for Elsevier’s people, let alone yours. That tells me something even more disturbing: it’s not even about money, it’s about control. And allowing a for-profit corporation control over science is simply not acceptable.

Yet worse: that the reuse agreement is secret means that authors who give their work to Elsevier do not even know what terms they’re doing it under. Their work might be made available for text-mining, or it might not — there is no way for them to know.

Really, I would be laughing at this absurdity if it wasn’t making me cry.

Comment by Mike Taylor — March 6, 2012 @ 1:37 am
- You know what, I think the last two commenters are missing the point, big time. Heather is going for evolutionary change in an existing system. More power to her elbow. Confrontation doesn’t always win (he said, confrontationally!).
  
  Researchers collectively have assigned their rights to most of their articles to giant corporations like Elsevier. You can change that going forward by declining to assign those rights (I already do decline). You can’t change it retrospectively just by talking about “fundamental rights”. Heather is slightly loosening the grip, and helping to expose where it hasn’t loosened. Keep it up, Heather!
  
  Comment by Chris Rusbridge — March 6, 2012 @ 2:14 am
  - I’m not sure I can agree, Chris. It’s great that Heather has achieved what she has; but it’s been done at the cost of the most monumental waste of time and effort on her part, her librarian’s, and that of six (six!) Elsevier staffers. All this to make a use that can’t possibly harm Elsevier and if anything is likely to increase the visibility of their work. And all that time spent on negotiating could have been spent actually doing the work. Seems like a Pyrrhic victory to me.
    
    And as Heather points out, none of this by itself does anything to help the next author who wants to do something similar. That author will have to go through the exact same stupid process (assuming he or she can get access to the relevant people at all — not everyone will roll a six on Twitter like Heather did).
    
    You’re right, though, that by having signed away rights to Elsevier in the past we have created a big problem for ourselves. I think the only real solutions are a proper show of goodwill from Elsevier (i.e. stating that all their papers are available to all subscribers for text-mining) or a transfer of ownership to a body that will do so.
    
    Comment by Mike Taylor — March 6, 2012 @ 2:23 am
[…] I read an article that I think was meant to be encouraging, but which instead I found disturbing. Talking text mining with Elsevier recounts Heather Piwowar’s recent experiences in trying to use Elsevier articles in her […]

Pingback by Winkling licence information out of Elsevier, bit bit bit « Sauropod Vertebra Picture of the Week #AcademicSpring — March 6, 2012 @ 2:46 am
I am on the side of Peter Murray-Rust and Mike Taylor. It’s absolutely not acceptable that scholars accept confidential agreements with Elsevier.

Comment by Klaus Graf — March 6, 2012 @ 3:34 am
I’m still not getting it. What are the choices Heather had?

a) not do research based on text mining because the agreement her university has entered into with Elsevier appears to prohibit it?

b) do the research without asking because others believe she has a fundamental right to do so, or because someone else believed this is permitted by Canadian law (as P M-R has pointed out, this can get her university disconnected from the Elsevier service, NOT a popular move)?

c) take advantage of an opportunity that has come her way, have the discussion, get some limited agreement, but not proceed because the legal text is claimed as confidential (note, the university’s main legal agreement with Elsevier is also almost certainly confidential; it’s how these people work)? Or

d) as she has done, get some agreement, exploit it, tell us as much as she can, and push for more?

I was Director of the UK Electronic Libraries Programme from 1995 to 2000. We funded lots of projects that pushed publishers in directions they were reluctant to go. Sometimes we had several projects in closely related areas, so different people were calling different representatives of different publishers, getting different answers and sharing the better approaches. We got digital publishing issues discussed at publisher board level. Not everything went the way we wanted, but IMHO we made progress that would not have been made otherwise. I think Heather is making progress that might not have been made with a more confrontational approach. And she’ll get part of her research project done, too.

What’s wrong with both approaches? P M-R and others go the confrontational way, Heather and others go the evolutionary way? Feels like a win to me!

Comment by Chris Rusbridge — March 6, 2012 @ 4:09 am
- Sorry, Chris, I evidently wasn’t being clear. I didn’t mean to suggest that Heather should have done anything different from what she did, but that (A) it’s stupid that she was required to, and (B) the information that’s come out of her going through this process has left me much more unhappy with the publisher than previously. No wonder they are so financially inefficient when it takes the Director of Universal Access, five other staff members and a lawyer to say “Yes, you can harvest that information, but we haven’t decided what you can do with the results”.
  
  Comment by Mike Taylor — March 6, 2012 @ 4:17 am
  - OK, sorry Mike, I misunderstood. Yes, I do absolutely agree with you, it is bonkers. In the end this is absolutely a must-win battle.
    
    Comment by Chris Rusbridge — March 6, 2012 @ 4:46 am
[…] problem of scale has also just played out in fact. Heather Piwowar writing yesterday describes a call with six Elsevier staffers to discuss her project and needs for text mining. […]

Pingback by Science in the Open » Blog Archive » They. Just. Don’t. Get. It… — March 6, 2012 @ 7:39 am
Way to go Heather. It’s good to keep the conversation going. It’s good for Elsevier to know what information science researchers really want to do with their content and what it is that their policies make it hard / difficult / impossible to do. I firmly believe that no individual at Elsevier is ill-willed on these issues. It’s a structural thing about how money flows in the publication industry. If data / text – miners who (for example) develop Sciverse apps (a) enhance Elsevier’s collection AND (b) make access to their collection more open – everybody wins.

Comment by Andre Vellino — March 6, 2012 @ 10:33 am
Another comment here, by Ben Goldacre: http://bengoldacre.posterous.com/academic-paywalls-dont-just-cost-money-they-h

Comment by Heather Piwowar — March 7, 2012 @ 5:12 am
University members of CRKN (Canadian Research Knowledge Network, the organization which signs contracts with Elsevier on behalf of UBC): http://www.crkn.ca/membership/members-by-province

Comment by Heather Piwowar — March 7, 2012 @ 9:09 am
[…] http://t.co/zvkxbtfM 2012-03-06RT @researchremix: New post: Talking text mining with Elsevier http://t.co/kSe9iryC 2012-03-06RT @totalimpactdev: feedback wanted: updated api spec http://t.co/kyBqBQPX […]

Pingback by Altmetrics, Total-Impact, etc. - Weekly Twitter Activity 2012-03-09 | Michael Habib | Nudging Serendipity — March 13, 2012 @ 3:33 am
[…] Talking text mining with Elsevier (Research Remix) […]

Pingback by Bibliotheken en het Digitale Leven in Maart 2012 | Dee'tjes — April 1, 2012 @ 4:43 am
[…] Elsevier responded to my text mining request. David Tempest, Universal Access Team Leader, emailed a letter with proposed addendum licensing […]

Pingback by Elsevier responds to my text mining request « Research Remix — April 13, 2012 @ 1:03 am
[…] March 5, 2012: Talking Text Mining With Elsevier […]

Pingback by Elsevier agrees UBC researchers can text-mine for citizen science, research tools « Research Remix — April 17, 2012 @ 10:02 am
[…] Elsevier (like many other publishers) has a history of being obstructive about mining. Even when they set out to help — and to give credit, they are making an […]

Pingback by How Elsevier can save itself, part 2: Medium « Sauropod Vertebra Picture of the Week #AcademicSpring — April 26, 2012 @ 3:34 pm
[…] who convened a conference call with Piwowar, a UBC librarian and five Elsevier colleagues. That conversation led to permission for UBC researchers to text mine the Elsevier journals to which they already had […]

Pingback by Text mining: what do publishers have against this hi-tech research tool? | Old News — May 23, 2012 @ 9:23 am
[…] who convened a conference call with Piwowar, a UBC librarian and five Elsevier colleagues. That conversation led to permission for UBC researchers to text mine the Elsevier journals to which they already had […]

Pingback by Text mining: what do publishers have against this hi-tech research tool? | Dani News — May 23, 2012 @ 10:32 am
[…] who convened a conference call with Piwowar, a UBC librarian and five Elsevier colleagues. That conversation led to permission for UBC researchers to text mine the Elsevier journals to which they already had […]

Pingback by Text mining: what do publishers have against this hi-tech research tool? « « News in BriefsNews in Briefs — May 23, 2012 @ 11:07 am
[…] who convened a conference call with Piwowar, a UBC librarian and five Elsevier colleagues. That conversation led to permission for UBC researchers to text mine the Elsevier journals to which they already had […]

Pingback by Text mining: what do publishers have against this hi-tech research tool? | Book Reviews and Ideas — May 24, 2012 @ 8:49 am
[…] who convened a conference call with Piwowar, a UBC librarian and five Elsevier colleagues. That conversation led to permission for UBC researchers to text mine the Elsevier journals to which they already had access.Piwowar said: "It takes a lot of time and a lot of energy and […]

Pingback by Text mining: what do publishers have against this hi-tech research tool? | Richard Hartley — August 4, 2012 @ 10:40 am

Research Remix