Research Remix

April 17, 2012

Elsevier agrees UBC researchers can text-mine for citizen science, research tools

Filed under: Uncategorized — Heather Piwowar @ 10:02 am

News!  Elsevier has agreed that researchers at the University of British Columbia can text-mine Elsevier content for a wide variety of purposes, including:

  • direct analysis for research
  • selection of excerpts for citizen science, and
  • calculating statistics on the usage of research objects for open dissemination in research tools.

I believe this is an epic win.  Let me tell you why.

First, this agreement is out in the open.  Publishers have traditionally required that their contracts with libraries be kept secret, both prices and terms.  When terms are open, other libraries can determine if they are getting a fair deal, researchers can know how publishers are facilitating or inhibiting reuse of their content, and we can all assess if a publisher’s behaviour matches its rhetoric.

Second, these terms are head and shoulders above what standard contracts have allowed.  Want to know what standard contracts allow?  NO TEXT MINING AT ALL.  (See the excerpts collected in the face of secret agreements.)  In my n=1 sample of negotiating for text-mining rights, the standard text-mining-is-allowed clause suggested by publishers does not allow text-mining result data to be disseminated outside the university.  In contrast, the terms Elsevier is permitting in this agreement allow the sort of broad uses that are the future of research: combining text-mining with citizen science, using text-mining to power tools for researchers, open dissemination of aggregate results, and the like.

As such, the terms of this agreement should serve as a minimum template for what publishers offer (and subscribers insist upon) within standard subscription agreements going forward.  Libraries, you don’t know when your researchers are going to need this.  Get it for them now so they have it when they need it — negotiating when they need it is a serious delay to research.

Third, Elsevier is not charging UBC any more money for these terms.

Fourth, Elsevier has agreed that the text-mining software can reside on the computers of UBC researchers, rather than within the university library IT system, when text mining is done in ways that do not create a large corpus of full text (for example, text mining via API and on-demand processing).  This is empowering for researchers and avoids an unfunded burden for libraries.
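For the technically curious, here is a minimal sketch of what "on-demand processing" could look like in practice: each article is fetched, scanned, and then dropped, so only aggregate counts and short snippets of context survive and no corpus of full text is ever stored.  Everything named here is an assumption for illustration: `fetch_text` is a hypothetical stand-in for whatever API call a publisher provides, the article IDs and text are made up, and none of this is Elsevier's API or the actual total-impact code.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def mine_on_demand(
    article_ids: Iterable[str],
    fetch_text: Callable[[str], str],  # hypothetical: returns one article's full text
    pattern: str,
    context_chars: int = 40,
) -> Tuple[Counter, Dict[str, List[str]]]:
    """Count pattern matches per article, keeping only counts and short snippets."""
    regex = re.compile(pattern, re.IGNORECASE)
    counts: Counter = Counter()
    snippets: Dict[str, List[str]] = {}
    for article_id in article_ids:
        # One article at a time; the full text is never written to disk
        # and is replaced on the next loop iteration.
        text = fetch_text(article_id)
        for match in regex.finditer(text):
            counts[article_id] += 1
            start = max(0, match.start() - context_chars)
            end = min(len(text), match.end() + context_chars)
            snippets.setdefault(article_id, []).append(text[start:end])
    return counts, snippets


# Made-up stand-in data so the sketch runs without any network access.
fake_articles = {
    "a1": "We deposited the data in Dryad. Dryad accepted it.",
    "a2": "No repository was used.",
}
counts, snippets = mine_on_demand(
    fake_articles, fake_articles.__getitem__, r"dryad", context_chars=10
)
```

The design choice that matters is that the function returns only the counts and the snippets; how much context is acceptable to share is a question for the agreement, not the code.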

Finally: are you convinced yet that blogging and tweeting about your research is totally worth it?  :)

I hasten to add that the agreement between UBC and Elsevier has not yet been signed off officially… Elsevier and I have agreed on the phone and via a brief email that they will extend these rights, but the language of the original letter needs to be amended to reflect these terms and agreed upon by UBC proper.  I’ll post when that is complete… I’ve been warned by UBC this sort of thing often takes at least 6 weeks.

The agreement here isn’t perfect: there are lots of things researchers might want to do that aren’t covered.  And frankly, although I’m happy to have these new terms and consider it progress, I don’t think this approach is the best one for research.  There are other approaches for establishing text-mining access that shift the power rather than just extracting better terms.  Peter Murray-Rust is asserting his rights to mine subscription content directly, giving publishers notice, and then just doing it.  Moves are afoot in the UK to reform copyright to explicitly allow text-mining.  An increasing number of researchers are choosing to publish in gold CC-BY open access journals: a solution that enables reuse by anyone for any purpose.  Policies that require libre open access for all publicly-funded research (after embargo) continue to gain momentum.  All of these solutions remove the need to ask publishers for permission.  And really, doesn’t the idea of *asking publishers for permission to use research* grate?  It should.  I’m still boycotting with my research papers and reviewing hours.

That said, we are where we are, and we need to be moving the ball ahead in all ways.

That means all of us.  Copy the link to this post and email it to your university librarian.  Right now.  Ask him or her if your institution has text mining rights in its contracts with all its publishers…. and tell them that you want what UBC has  :)

History of the verbal agreement:

  • March 5, 2012: Talking Text Mining With Elsevier
  • April 13, 2012: Elsevier Responds To My Text Mining Request
  • April 13, 2012: I sent email to my Elsevier contacts alerting them to my blogpost and summarizing my disappointment that they chose to limit reuse dissemination to “scholarly communication.”  David Tempest wrote back immediately and said “I don’t know why you feel this is preventing you from doing your research – we developed this to allow you to continue.”   I responded in email:

This is the part of the letter that suggested to me you were restricting this agreement to the first of my three use cases:

UBC may not… “Make all or any portion of the Subscribed Products available to anyone other than an Authorized User and other than as publishing the text mining results via scholarly communication.”

I interpreted “via scholarly communication” to mean traditional publishing of my research results in blog posts and conferences and journals with reasonable-but-not-excessive amounts of supplementary data.

My second use case involves making some of the research articles (or, instead if you request, excerpts of the research articles) identified through text mining available in a limited way to citizen scientists so they can help with semantic markup.  These citizen scientists wouldn’t be UBC authorized users, and I wouldn’t have expected this use case to be included in “via scholarly communication.”

My third use case involves disseminating text mining results within a research tool for use by researchers.  I wouldn’t have expected this to be considered “via scholarly communication” either.

Is your intent to facilitate all of these uses?  If so, fantastic!  Do you think the terms in the letter do in fact cover them right now?

  • We planned phone calls to continue the conversation.
  • April 16, 2012:  Quick phone call with David Tempest.  He asked for more detail on my third use case.  I explained the way we hope to use text-mining results within total-impact, including plans to disseminate the counts openly with snippets of context, with hyperlinks from the aggregate counts to the articles themselves on Elsevier’s own website.  He said he needed to check with lawyers and team about my third use case but was optimistic and would get back to me ASAP.
    I also asked if it was necessary that the text mining system be housed in the university library IT system as the letter implied.  David replied that was only necessary when the use involved establishing a large corpus of articles, not if I intended to use the API or process articles on the fly.
  • 30 minutes later: David wrote back and said “I have already spoken to my colleagues and we are happy for you to proceed as we discussed with the third element of your email.  No more issues to discuss, so please proceed!”

At this point I believe the written language of the agreement needs to be updated and clarified to reflect our verbal understanding.  I’m meeting with UBC librarians in person this week.

Edited to add: This story has been covered by blogs, The Chronicle of Higher Education, The Guardian, a SPARC interview, a Suber Open Access News feature, and a Poynder summary and interview with Peter Murray-Rust.  Each of these provides valuable and unique context: worth reading.


  1. Congratulations Heather, that’s great news!

    First they ignore you, then they laugh at you, then they fight you, then you win…

    Comment by Duncan — April 17, 2012 @ 12:14 pm

  2. […]  See follow-up post for new developments. […]

    Pingback by Elsevier responds to my text mining request « Research Remix — April 18, 2012 @ 6:51 am

  3. […] the ball rolling at your institution too. […]

    Pingback by Care about data citation? Then you care about text-mining access. « Research Remix — April 19, 2012 @ 8:24 am

  4. […] who facilitate these terms now in subscription agreements with some institutions: […]

    Pingback by Do we need a text-mining manifesto? « Research Remix — April 19, 2012 @ 9:16 am

  5. […] now???) do not hold the cards.  We do.  Go out there and get thee some text-mining rights too (more).  Make all negotiations public.  Let’s do this. […]

    Pingback by text-mining is the new front, ready to escalate issues triggered by RWA debacle « Research Remix — April 20, 2012 @ 6:20 am

  6. […] Data Curation, Dans has launched an Online Data Portal, and Research Mix reports that Elsevier agrees UBC researchers can text-mine for citizen science, research tools, after she had published Elsevier's answer to her question about text-mining. Then […]

    Pingback by Bibliotheken en het Digitale Leven in de derde April week van 2012 | Dee'tjes — April 21, 2012 @ 9:51 am

  7. This agreement has been summarized in a “Hot Type” article by Jennifer Howard in the Chronicle of Higher Education:

    Comment by Heather Piwowar — May 7, 2012 @ 9:17 am

  8. […] serves to distract from the debate about their pricing strategies. Heather has written about her experiences, and there is also a report in the […]

    Pingback by Elsevier is moving to open journals for text-mining | Wisdom's Quintessence — May 7, 2012 @ 8:59 pm

  9. I found some researchers at MPOW who would use such access and then contacted the person at the main campus who would negotiate this. Interestingly, she said we’ve always had this right; we just have to let them know first!

    Comment by Christina Pikas — May 21, 2012 @ 8:52 am

  10. […] most influential living palaeoartist, Lipps has had a hugely distinguished career, and Piwowar is in the vanguard of the current efforts to mainstream the text-mining techniques that we can all see are the […]

    Pingback by Scopus is useless « Sauropod Vertebra Picture of the Week #AcademicSpring — May 29, 2012 @ 2:51 pm

  11. […] See subsequent post for the conclusion of this negotiation.  This story has been covered by The Chronicle of Higher […]

    Pingback by talking text mining with Elsevier « Research Remix — June 8, 2012 @ 9:17 am

  12. […] first perfected it on OA materials… but will they let us? Heather Piwowar’s experience earlier this year didn’t look too fun – and that was all for just one publisher. Phylogenetic research […]

    Pingback by Content mining for phylogenetic data - Ross Mounce — July 17, 2012 @ 7:59 am
