Research Remix

July 3, 2012

Citation11k: Method section — access to citation data #draftInProgress

Filed under: Uncategorized — Heather Piwowar @ 8:27 am

The next installment in my #draftInProgress series on Open Data citation.

I’m not sure this section will make it into the paper in its entirety, though I do think it is important to highlight the serious hurdles in getting access to data for research on research.

This step of the methods was certainly the most time-consuming part of the study!

Methods: citation data

This study required citation counts for thousands of articles identified by their PubMed IDs. At the time of data collection, neither Thomson Reuters’ Web of Science nor Google Scholar supported this type of query. It was (and is) supported by Elsevier’s Scopus citation database. Alas, none of our affiliated institutions subscribed to Scopus. Scopus does not offer individual subscriptions, and a personal email to a Scopus Product Manager went unanswered.

One author (HAP) attempted to use the British Library’s walk-in access of Scopus on its Reading Room computers during a trip overseas. Unfortunately, the British Library did not permit any method of electronic transfer of our PubMed identifier list onto the Reading Room computers, including internet document access, transferring a text file from a USB drive, or using the help desk as an intermediary (see related policies). The Library was not willing to permit an exception in this case, and we were unwilling to manually type ten thousand PubMed identifiers into the Scopus search box in the Reading Room.

HAP eventually obtained Scopus access through a Research Worker agreement with Canada’s National Science Library (NRC-CISTI), after being fingerprinted to obtain a police clearance certificate (required because she had recently lived in the USA for more than six months).

At the time of data collection, the authors were not aware of any way to retrieve Scopus data through researcher-developed computer programs, so we queried and exported Scopus citation data manually through the Scopus website. The website limited both the length of a query and the number of citations that could be exported at once. To work within these restrictions, we concatenated the PubMed IDs, up to 500 at a time, into 22 queries, each of the form “PMID(1234) OR PMID(5678) OR …”.
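For anyone repeating this step, the batching itself is easy to script even when the querying has to be done by hand. The sketch below is a hypothetical helper, not the tooling used in this study (the queries were built and run manually). It assumes the PubMed IDs are held as a list of strings and that 500 IDs per query is the working limit described above. It splits the list into consecutive batches and joins each batch into a query string of the form “PMID(1234) OR PMID(5678) OR …”, ready to paste into the Scopus search box.

```python
def build_pmid_queries(pmids, batch_size=500):
    """Split PubMed IDs into OR-joined query strings of at most batch_size IDs each.

    Hypothetical helper: the "PMID(...)" syntax mirrors the query form quoted in
    the text; batch_size=500 is the per-query limit we worked within.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    queries = []
    # Walk the ID list in consecutive slices of batch_size.
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        queries.append(" OR ".join(f"PMID({pmid})" for pmid in batch))
    return queries


if __name__ == "__main__":
    # Placeholder IDs standing in for the real list of 10,694 PubMed IDs.
    example_ids = [str(n) for n in range(1, 10695)]
    queries = build_pmid_queries(example_ids)
    print(len(queries))                # number of queries to run
    print(queries[-1].count("PMID("))  # size of the final, partial batch
```

With 10,694 IDs and batches of 500, this yields 21 full queries plus one final query of 194 IDs, which is consistent with the 22 queries reported above.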

Citation counts for 10694 papers were gathered from Scopus in November 2011.


  1. The information about the effort needed to gather citation counts is important. Depending on the overall objective of the research and this report, the details of the work effort might be better placed in a footnote or end-note. In my experience, it is easy to confuse the reader about what the most important results are.

    Comment by William L. Anderson (@band) — July 3, 2012 @ 8:50 am

    • Thanks for this comment, Bill. Will think about your advice about how to include it such that it doesn’t break the flow of the results themselves.

      Comment by Heather Piwowar — July 9, 2012 @ 4:16 am

