Research Remix

April 13, 2012

Elsevier responds to my text mining request

Filed under: Uncategorized — Heather Piwowar @ 1:03 am

UPDATE:  See follow-up post for new developments.  

Yesterday Elsevier responded to my text mining request.  David Tempest, Universal Access Team Leader, emailed me and my university librarian a letter proposing addendum licensing terms.  It has been clear to everyone that I am blogging these interactions, and there was no request to keep the letter confidential, so I include it in full below.

In short:


  • the agreement would permit some types of text mining of subscribed Elsevier content by authorized users at my university: a win, given that standard publisher contracts explicitly forbid all text mining.
  • the agreement places full responsibility on my university itself to install and support “the text mining system”.
  • the agreement forbids releasing “all or any portion of the Subscribed Products… to anyone other than an Authorized User and other than as publishing the text mining results via scholarly communication.”

Here is the full letter from Elsevier (PDF).

What does this mean?  [UPDATE:  See follow-up post for new developments.  Elsevier has allowed the text-mining uses described below, and lightweight solutions in some cases]

1.  After negotiation, Elsevier permits the results of text mining to be included in scholarly communication, but does not permit text mining of its literature for citizen science or research tools.  I explicitly asked Elsevier about these use cases (twice), and they have excluded them from their proposed agreement.  I did not develop these use cases as gotcha questions: they were existing plans for my real research, and they need text mining access.

2.  This took a really long time.  Not long compared to what some researchers have gone through for text mining access, but long!  And this is only for one publisher.  I guess I’d have to go through this again and again with Wiley and Springer and Nature and AAAS and every other subscription publisher if I want text mining access to all the literature already covered by subscription agreements?

3.  It isn’t clear that my university can or will agree to these terms. My university probably doesn’t have the resources to install and maintain a text mining system.  I’m just a short-term postdoc with no grant funding for this: I can’t help support this infrastructure.  A researcher-driven solution I could handle myself.  The problem is that lightweight solutions aren’t allowed when content must be treated as a protected resource.

So.  That’s where it stands.  My university is reading the letter and deciding what to do.

I thank Elsevier for engaging with me on this.  I believe we both approached this in good faith.  Although these new contract terms will hopefully be useful to me and other researchers at UBC, I’m disappointed: I was hoping Elsevier would take this opportunity to work with me to experiment with new ways to support researchers building on top of the scholarly literature.

In contrast: You want to text mine Open Access content?  No problem.  It just works.  Ross Mounce carries PLoS full text around on a USB stick.

Let’s move to that kind of a publishing model now, please?  The kind of publishing model where the interests of publishers and researchers and research progress are all aligned.

Some of the blogosphere reaction to my Part 1 post on this subject:


  1. “Ross Mounce carries PLoS full text around on a USB stick.”

    Yep, all true.

    PLoS (and other BOAI/BBB-compliant Open Access publishers) make this easy and expressly legal by making their papers available under a Creative Commons Attribution Licence (CC-BY), which means we are free to download and redistribute these works as much as we want, and that turns out to be very helpful.

    If you want all of PLoS yourself, may I recommend as one way of fairly easily getting it all (there may be other ways).

    A question:

    I’m not too familiar with how to download Open Access papers en masse for data mining / browsing / whatever… is there an easy one-click torrent for all OA science papers (if not, perhaps someone should make a legal one, split by journal or publisher perhaps)? The OA PMC subset perhaps? (But even then, PMC would be suboptimal for me, because much of the literature I’m interested in sits outside of PMC.)

    We researchers and citizen scientists can only learn/realize the benefits of text/data mining if we have relevant corpuses (corpora?) to play with. There’s plenty of OA material out there, but aside from BioTorrents, I’m unsure how to get this content easily onto my computer…

    Comment by Ross Mounce (@rmounce) — April 13, 2012 @ 6:17 am

    • And there you have it: a library and information science research project fully specified in a comment in a blog post. Now to add it to the list of things that I’m thinking about.

      Comment by djfiander — April 13, 2012 @ 6:25 am

      • David – if you want help, i’m in.

        Comment by amy — April 13, 2012 @ 10:58 am

      • Cool! I have no idea how to start. Trying to decide if it’s a tool-building project or a “how to” guide kind of thing.

        Comment by djfiander — April 16, 2012 @ 8:23 am

    • Thanks for reminding me to restart my BioTorrents seeding.

      Comment by mrgunn (@mrgunn) — April 13, 2012 @ 1:07 pm

  2. […] April 13, 2012: Elsevier Responds To My Text Mining Request […]

    Pingback by Elsevier agrees UBC researchers can text-mine for citizen science, research tools « Research Remix — April 17, 2012 @ 10:02 am

  3. […] that Elsevier agrees UBC researchers can text-mine for citizen science, research tools, after she had published Elsevier’s answer to her question about text mining. Then we have Europeana’s explanation of linked data and, according to Disruptive Library […]

    Pingback by Libraries and Digital Life in the third week of April 2012 | Dee'tjes — April 21, 2012 @ 9:51 am

  4. […] of being obstructive about mining.  Even when they set out to help — and to give credit, they are making an effort — it’s of the form “We are keen to arrange a teleconference with you all to […]

    Pingback by How Elsevier can save itself, part 2: Medium « Sauropod Vertebra Picture of the Week #AcademicSpring — April 26, 2012 @ 3:34 pm

  5. […] story has been covered by blogs, The Chronicle of Higher Education, The Guardian, a SPARC interview, a Suber Open Access News […]

    Pingback by talking text mining with Elsevier « Research Remix — June 8, 2012 @ 9:24 am
