PubMed lists 804184 publications from 2009 with links to full text. Of these, 247421 (31%) have free full text, available for public view. Only a small subset, about 67000 (8% of all publications), are open access with full text that can be systematically downloaded and used for text mining.
I’ll show how I got these numbers for future reference:
First, get all publications from 2009 with links to full text using this query in PubMed:
Next, limit these to publications with links to *free* full text using this query in PubMed:
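The same counts could probably be pulled without clicking through the web interface. Here is a minimal sketch, assuming NCBI's E-utilities `esearch` endpoint and the full-text query string quoted in the steps further down. The helper names `build_count_url` and `fetch_count` are my own, and the filter term may not behave the same way in the current PubMed as it did when these numbers were collected.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The full-text query from this post, written with straight quotes.
FULL_TEXT_2009 = '"loattrfull text"[sb] AND ("2009"[PDAT] : "2009"[PDAT])'


def build_count_url(term, db="pubmed"):
    """Return an esearch URL that asks only for the number of hits."""
    params = {"db": db, "term": term, "rettype": "count", "retmode": "json"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def fetch_count(term, db="pubmed"):
    """Run the query against NCBI (needs network access) and return the hit count."""
    with urllib.request.urlopen(build_count_url(term, db)) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])


if __name__ == "__main__":
    # Only prints the URL; call fetch_count(FULL_TEXT_2009) to query NCBI.
    print(build_count_url(FULL_TEXT_2009))
```

Building the URL is kept separate from fetching it so the query can be inspected, or pasted into a browser, before anything is sent to NCBI.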
Finally, we want to identify which of these are open access. This is a bit tricky because, as far as I know, this filter is not available in PubMed. It is, however, available in PubMed Central. So:
- Start with this query in PubMed as above:
"loattrfull text"[sb] AND ("2009"[PDAT] : "2009"[PDAT])
- On the right is a menu that says “Filter your results”. Under that, one of the options is:
Links to PMC (158729)
Click this. This will show, in PubMed, all 158729 articles that have records in PMC. (Note that quite a few papers with free full text aren’t in PubMed Central: compare this number to 247421.)
- Now we want to see these articles within the PMC interface rather than the PubMed interface. To do this, have a look at the “Find related data” menu a bit lower down on the right.
For Database, select PMC.
In Option, select Free in PMC.
Then click Find Items.
This will show the same articles but within the PMC interface. Or rather, it shows the first 10000 of the articles.
- Have a look at the right menu now. It has a link that says:
Open Access (4209)
That is how many of the 10000 articles are available as Open Access articles, as far as PubMed Central knows.
- To finish, we need to extrapolate 4209 back to the full set, because it only represents the first 10000 articles. Assuming that the 158729 articles have the same OA/non-OA breakdown as those first 10000 (a safe assumption? I could definitely do a bit more digging to be sure), we estimate that (158729/10000)*4209 = 66809 of the articles are available as Open Access.
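The arithmetic in that last step can be sketched in a few lines of Python. The function name is just for illustration, and the result relies on the same assumption as above: that the first 10000 records are representative of all 158729.

```python
def extrapolate_open_access(total_in_pmc, sample_size, oa_in_sample):
    """Scale the OA count seen in a sample up to the full set of PMC records."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return round(total_in_pmc / sample_size * oa_in_sample)


# Counts quoted in this post.
ALL_FULL_TEXT = 804184   # 2009 publications with links to full text
IN_PMC = 158729          # of those, articles with records in PMC
SAMPLE = 10000           # records shown in the PMC interface
OA_IN_SAMPLE = 4209      # Open Access articles among the sample

estimate = extrapolate_open_access(IN_PMC, SAMPLE, OA_IN_SAMPLE)
share = estimate / ALL_FULL_TEXT

print(estimate)          # 66809
print(f"{share:.1%}")    # 8.3%
```

If the first 10000 records are sorted in some way that correlates with Open Access status (by date or journal, say), the estimate would be biased, which is why the assumption deserves that extra digging.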