Research Remix

December 15, 2011

Computing availability of full text for reuse

Filed under: Uncategorized — Heather Piwowar @ 7:10 am

Rough estimate:

PubMed lists 804184 publications from 2009 with links to full text. Of these, 247421 (31%) have free full text, available for public view. Only a small subset, about 67000 (8% of the 804184), are open access with full text that can be systematically downloaded and used for text mining.

I’ll show how I got these numbers for future reference:

First, get all publications from 2009 with links to full text using this query in PubMed:

“loattrfull text”[sb] AND (“2009”[PDAT] : “2009”[PDAT])
(direct url)  returns 804184 results

Next, limit these to publications with links to *free* full text using this query in PubMed:

“loattrfree full text”[sb] (“2009”[PDAT] : “2009”[PDAT])
(direct url)  returns 247421 results
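
The same two counts could probably be pulled without the web interface, through NCBI's E-utilities ESearch endpoint. Here is a minimal sketch, assuming the two filter names above are still recognised and that the JSON response carries the count under `esearchresult`; the helper names `count_url` and `fetch_count` are mine, and the numbers returned today will likely differ from the 2011 ones.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The two PubMed queries from the post, written with plain ASCII quotes.
FULL_TEXT_2009 = '"loattrfull text"[sb] AND ("2009"[PDAT] : "2009"[PDAT])'
FREE_FULL_TEXT_2009 = '"loattrfree full text"[sb] ("2009"[PDAT] : "2009"[PDAT])'


def count_url(term, db="pubmed"):
    """Build an ESearch URL that asks only for the number of hits."""
    params = {"db": db, "term": term, "rettype": "count", "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def fetch_count(term, db="pubmed"):
    """Fetch the hit count for a query (needs network access).

    Assumption: the JSON payload looks like
    {"esearchresult": {"count": "..."}}.
    """
    with urlopen(count_url(term, db)) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])


if __name__ == "__main__":
    # Only print the URLs here; call fetch_count() to actually hit the API.
    print(count_url(FULL_TEXT_2009))
    print(count_url(FREE_FULL_TEXT_2009))
```

Printing the URLs first makes it easy to paste one into a browser and check that the query is interpreted as intended before scripting anything around it.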

Finally, we want to identify which of these are open access.  This is a bit tricky because, as far as I know, this filter is not available in PubMed.  It is, however, available in PubMed Central.  So:

  1. Start with this query in PubMed as above:
    “loattrfull text”[sb] AND (“2009”[PDAT] : “2009”[PDAT])
    (direct url)
  2. On the right is a menu that says “Filter your results”.  Under that, one of the options is
    Links to PMC (158729)
    Click this.  This will show, in PubMed, all 158729 articles that have records in PMC.  (Note that there are quite a few papers with free full text that aren’t in PubMed Central: compare this number to 247421.)
  3. Now we want to see these articles within the PMC interface rather than the PubMed interface.  To do this, have a look at the “Find related data” menu a bit lower down on the right.
    For Database, select PMC.
    In Option, select Free in PMC
    then click Find Items.
    This will show the same articles but within the PMC interface.  Or rather, it shows the first 10000 of the articles.
  4. Have a look at the right menu now.  It has a link that says
    Open Access (4209)
    That is how many of the 10000 articles are available as Open Access articles, as far as PubMed Central knows.
  5. To finish, we need to extrapolate 4209 back to the full set, because it only represents the first 10000 articles.  Assuming that the 158729 articles have the same breakdown of OA/non-OA (a safe assumption?  I could definitely do a bit more digging to be sure), we estimate that (158729/10000)*4209 = 66809 of the articles are available as Open Access.
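
The arithmetic in step 5, and the percentages in the rough estimate at the top, can be reproduced in a few lines. This is only a sketch of the simple proportional scaling described above; the function name `extrapolate` is mine, and it inherits the untested assumption that the first 10000 articles are representative of all 158729.

```python
def extrapolate(sample_hits, sample_size, population_size):
    """Scale a count seen in a sample up to the full set, proportionally."""
    return round(sample_hits * population_size / sample_size)


# Counts reported in the post.
full_text = 804184             # 2009 publications with links to full text
free_full_text = 247421        # ... of which have free full text
in_pmc = 158729                # ... of which have records in PMC
sample_size = 10000            # articles shown in the PMC interface
open_access_in_sample = 4209   # Open Access articles among those 10000

open_access_estimate = extrapolate(open_access_in_sample, sample_size, in_pmc)

print(open_access_estimate)                          # 66809
print(f"{free_full_text / full_text:.0%}")           # 31%
print(f"{open_access_estimate / full_text:.0%}")     # 8%
```
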
Other filters can obviously be ANDed onto each step to see this ratio in a specific topic area.  (There were a bunch of such calculations done by others a few years ago but I can’t easily find them on the web now.  Anyone have related links?)
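
As a rough illustration of ANDing a topic filter onto one of the queries, here is a small sketch. The helper `and_filter` is hypothetical, and the MeSH term is just an example topic I picked, not one used in the post.

```python
def and_filter(base_query, *extra_terms):
    """Combine a base query with extra terms using AND.

    Each part is wrapped in parentheses so that operators inside
    one part cannot change how the other parts are grouped.
    """
    parts = [f"({base_query})"] + [f"({term})" for term in extra_terms]
    return " AND ".join(parts)


# Full-text query from the post, written with plain ASCII quotes.
FULL_TEXT_2009 = '"loattrfull text"[sb] AND ("2009"[PDAT] : "2009"[PDAT])'

# Example topic restriction (an assumption, for illustration only).
topic_query = and_filter(FULL_TEXT_2009, "neoplasms[MeSH Terms]")
print(topic_query)
```

Running the same topic-restricted query at each step would give the counts needed to recompute the free and open access ratios for that topic alone.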

3 Comments

  1. Excellent. I knew you’d come up with a good method for assessing this.

    Trouble is most of my literature (palaeontology) isn’t in PMC, but that’s fine
    because WRT funder policies and the emerging horror-show that is “open access” vs full Open Access (as per BOAI)
    relevancy to Wellcome Trust, NIH, NSF… is all about PMC deposition, so that’s fine :)
    This is an awesome contribution. Thanks!

    Now we just need to get something *done* about this. 31% free vs 8% full Open Access is not cool.

    Should taxpayers start asking why ~75% (1 – 8/31) of the ‘free’ articles in PMC are not fully Open Access, and thus not as much value for money…?
    We’re already beyond the point at which it’s humanly possible to human-read all relevant scholarly articles. Machine-based text-mining use will surely only grow in future. Thus full Open Access will become more and more valuable… if people use it. Freely available just isn’t enough.

    Comment by Ross Mounce (@rmounce) — December 15, 2011 @ 7:29 am

  2. This also correlates very nicely with the OA availability figures in: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0020961 and http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0011273

    Comment by Peter Binfield — December 15, 2011 @ 10:29 am

  3. […] Computing availability of full text for reuse […]

    Pingback by Around the Web: Some resources on the Panton Principles & open data : Confessions of a Science Librarian — April 16, 2012 @ 7:26 am

