The next installment in my #draftInProgress series on Open Data citation.
I’m not sure this section will make it into the paper in its entirety, though I do think it is important to highlight the serious hurdles in getting access to data for research on research.
This step of the methods was certainly the most time-consuming part of the study!
Methods: citation data
This study required citation counts for thousands of articles identified through PubMed IDs. At the time of data collection, neither Thomson Reuters’ Web of Science nor Google Scholar supported this type of query. It was (and is) supported by Elsevier’s Scopus citation database. Alas, none of our affiliated institutions subscribed to Scopus. Scopus does not offer individual subscriptions, and a personal email to a Scopus Product Manager went unanswered.
One author (HAP) attempted to use the British Library’s walk-in access to Scopus on its Reading Room computers during a trip overseas. Unfortunately, the British Library did not permit any method of electronically transferring our PubMed identifier list onto the Reading Room computers, whether by internet document access, by copying a text file from a USB drive, or by using the help desk as an intermediary (see related policies). The Library was not willing to permit an exception in this case, and we were unwilling to manually type ten thousand PubMed identifiers into the Scopus search box in the Reading Room.
HAP eventually obtained Scopus access through a Research Worker agreement with Canada’s National Science Library (NRC-CISTI), after being fingerprinted to obtain a police clearance certificate (required because she’d recently lived in the USA for more than six months).
At the time of data collection the authors were not aware of any way to retrieve Scopus data through researcher-developed computer programs, so we queried and exported Scopus citation data manually through the Scopus website. The website limited both the length of a query and the number of citations that could be exported at once. To work within these restrictions we split the PubMed IDs into 22 queries of up to 500 IDs each, where each query took the form “PMID(1234) OR PMID(5678) OR …”
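For readers who want to reproduce the batching step, here is a minimal sketch of how query strings of this form could be assembled. It is an illustration only, not the procedure used in the study: the function name build_scopus_queries and the choice of Python are assumptions, and the 500-ID batch size is simply the figure given above.

```python
def build_scopus_queries(pmids, batch_size=500):
    """Split PubMed IDs into batches and join each batch into one query string.

    Hypothetical helper for illustration; batch_size=500 mirrors the
    per-query cap described in the text.
    """
    queries = []
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        # Each ID becomes PMID(<id>); terms are combined with OR.
        queries.append(" OR ".join(f"PMID({pmid})" for pmid in batch))
    return queries


queries = build_scopus_queries(["1234", "5678"])
print(queries[0])  # PMID(1234) OR PMID(5678)
```

With a list of 10694 identifiers and a batch size of 500, this yields 22 query strings, the last holding the remaining 194 IDs. Each string would still need to be pasted into the Scopus search box by hand.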
Citation counts for 10694 papers were gathered from Scopus in November 2011.