Research Remix

March 5, 2013

Why Google isn’t good enough for academic search

Filed under: Uncategorized — Heather Piwowar @ 6:49 am

People often ask: why all the fuss about Search for academic papers?  Google does a fine job, we can find everything we need, what’s the problem?

I gave an answer to this in a comment on Mike Taylor’s blog and it got a bit of twitter pickup, so I’m reposting my comment here for this audience.  Summary: no one can build on the results!

Google isn’t an acceptable answer to searching across academic papers (toll access, green OA, gold OA, whatever) because it doesn’t support a way for people to digest the search results, add value, and apply the results in new and innovative ways. Google search results can only be used manually on Google’s website, or embedded as-is in other websites.

Neither Google nor Google Scholar offers an API — for love nor money, as far as I can tell; point me to it if I am wrong — that would let us do a Google search and then sort/filter/enhance the results to add value and use them in research and in scholarly tools.

Totally unacceptable as a search solution for the scholarly literature.  Think of the opportunity cost to research and research tools, and all the things that better research tools facilitate.

It doesn’t have to be this way.  Search results can be openly available for reuse (see the search APIs and API terms of use for PLOS, PMC, etc).
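
For a concrete sense of what that reuse looks like, here is a minimal sketch (Python) of querying the PLOS Search API and re-ranking the results locally, exactly the kind of building-on that Google’s terms don’t allow. The endpoint is PLOS’s documented Solr search API; the query, field names, and re-ranking are illustrative assumptions, so check the current PLOS API docs before relying on them.

    # Minimal sketch: query the PLOS Search API (a Solr endpoint) and
    # re-rank the results locally. Query and field names are illustrative;
    # see PLOS's API documentation for current details.
    import requests

    resp = requests.get(
        "http://api.plos.org/search",
        params={
            "q": "data sharing",              # free-text query
            "fl": "id,title_display,score",   # fields to return
            "wt": "json",                     # response format
            "rows": 50,
        },
    )
    docs = resp.json()["response"]["docs"]

    # Add value on top of the raw results, something Google's terms of
    # use don't permit: re-sort, filter, or feed the DOIs into other tools.
    for doc in sorted(docs, key=lambda d: d["score"], reverse=True):
        print(doc["id"], doc["title_display"])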

February 28, 2013

Do your review instructions ask if data+software are available?

Filed under: Uncategorized — Heather Piwowar @ 5:33 pm

It looks like PLOS Biology doesn’t ask reviewers to help uphold their data availability policies… and I’m sure they aren’t the only journal missing this step.

I just sent this email to PLOS Biology.  When you review a paper, check the material you are sent to see if you are asked to assess appropriate availability of materials, and if not (or not with sufficient emphasis), please make your voice heard.  You are welcome to use my dashed-off email as a template if it helps, needless to say.

Hi PLOS Biology,

I’m reviewing a paper for you now.  I’ve just realized that your email to reviewers contains several important prompt questions, but no prompts asking us whether data+software have been made appropriately publicly available, as per PLOS guidelines or community norms, whichever are stricter.

Sections 5 and 6 in your reviewer guidelines don’t cover this either… in fact, it doesn’t seem to be covered by your reviewer guidelines at all.

Your author instructions say:  “All appropriate datasets, images, and information should be deposited in public resources”… but there does not appear to be any reviewer check?

Seems a pretty big lost opportunity: reviewers are very well placed to make recommendations about what data should be made available.  The “detailed protocols” mentioned in your reviewer guidelines are unlikely to suggest datasets or software to most people.

Sincerely,

Heather
a big fan of data

Update: a few days later, PLOS Biology responded as follows (and gave me permission to post their response), with a CC to four internal employees:

Dear Heather

Many thanks for raising this issue with us. We are actually working on two fronts that will, we hope, address your concerns in the near future (although not absolutely immediately). One is a general review of our policies, instructions and guidelines for the PLOS journals around data issues, and the other is improvements to the instructions and forms we use with reviewers. Both of these give us good opportunities to improve what we ask for and stipulate around data, which as you suggest is not yet optimal. It would be great if you let us know of other opportunities you think we’re missing, or any other suggestions you have in this area.

[..]

Theo

and I replied:

Thanks for the response, Theo!

This all sounds good, though I hope you don’t hold off on easy small improvements (adding a sentence or two to reviewer instructions to ask whether existing author instructions on data have been followed) until large changes are thoroughly designed and implemented.  [..]

Sincerely,
Heather

Summary: journals want to hear from us.  It is definitely worth the time to raise these issues. Please write to your journals too!

January 16, 2013

ResearchFish: CVs with alternative products

Filed under: Uncategorized — Heather Piwowar @ 9:14 am

I received an email today and have been given permission to post it here to help spread the word.   See below (emphasis is mine).

Looks like ResearchFish is useful for funders and universities, and free for researchers to generate CVs that include alternative products.  Cool!  I do think the generated CV line items need some ImpactStory badges; what do you think?  :)

Dear Heather,

Please find enclosed a letter from Frances Buck, Director of Researchfish, in response to your article published in Nature on 10th January. We have submitted this to the Correspondence team.

With best wishes, Rebecca

Nature’s article, “Altmetrics: Value all research products” (493, 159; 10 January 2013), by Heather Piwowar, suggests that funders are mainly interested in research papers when assessing grant applications.

While publications undoubtedly help demonstrate the impact and significance of research projects, it would be wrong to suggest that this is the only factor funders consider when reviewing a researcher’s contributions — indeed the MRC has been pivotal in promoting a broader approach.

Working in collaboration with Researchfish and five other major medical research charities in the UK, the MRC have developed a new online facility that enables researchers to comprehensively record the outcomes of their work. Funders can then review and easily evaluate this information. Researchfish’s portal is currently being used by 16 funding agencies, and over 6,500 PIs have already signed up, recording a wide range of products, from publications to intellectual property and patient outcomes.

Frances Buck

Director Researchfish St John’s Innovation Centre Cowley Road Cambridge CB4 0WS, UK frances@researchfish.com

January 11, 2013

Process behind a Nature Comment

Filed under: Uncategorized — Heather Piwowar @ 8:55 am

Publishing a Comment in Nature involved a process unlike any I’ve experienced to date, so I figured I’d document it (the Comment itself is here).  I wish more people would document the story behind their papers (and #OverlyHonestMethods :) ), and also the process behind their scientific communication, to help us all peek behind the curtains.  Or, y’know, take down the curtains.

Invitation

I received an email from a Nature editor on November 1:

[..] I’m an editor in the Comment section at Nature, which features opinions by scientists. [..] I’m writing because a few issues have popped up that we thought you might have some insights on [..]

We’re interested in exploring a piece about the NSF’s decision to change “papers” to “projects” in scientists’ list of achievements. [..]

Does this topic spark any interest? If so, let’s chat – we’d want to time something to the first of the year, when the NSF change goes into effect. [..]

(I won’t name the editor because I don’t want to catch her unawares… I’m not sure if it is appropriate to name her, so I’ll err on the side of not.  She was very skilled and pleasant to work with, fwiw!)

Needless to say, the proposed topic is of interest to me and a Comment in Nature seemed like a great way to reach a broad and “traditional” audience with my thoughts on where this is going.  We set up a phone call for later in November.

Writing and Editing

The editor and I had a 15-minute call about my thoughts on the topic, and also about how Comments work.

I mentioned that I’d recently given a brief talk about the implications of the NSF Biosketch policy change.  She suggested I send that along to her, and she’d reply with paragraph-by-paragraph suggestions on how to compose a first draft of the Comment.

The editor sent me a reply that had a surprisingly detailed outline:

Starting with the text you sent about your talk is great – it’s a good tone and level for our readership. We can just build on that. [..]

First paragraph: “hook” the reader. Like feature and news stories, or even editorials in a newspaper (which is really our model here), we need something that will “grab” the reader, make them want to [..]

Next 1-2 paragraphs: Describe the NSF change in policy, for readers who aren’t familiar with it [..]

Third paragraph: Present the crux of your argument: I think this change in NSF policy, along with other examples mentioned, indicate X [..]

Background, 2-3 paragraphs: Present examples of the changes [..]

Next 2-3 paragraphs: Explain more why these changes are so significant for science. Here is where you’ll put [..]

Final paragraphs: Here, we present “solutions.” How should things change further? What direction would [..]

Wow! OK, sure: if that is how it works, I can do that.  So I pulled together a first draft, which I’ve posted here.  That’s when it got intense.  I’ve never had anything so heavily edited.  In addition to emailing drafts back and forth, we had two (or three? I forget) quick phone calls where the editor asked me clarifying questions, then she’d send me another draft.  It took five revisions till it was time for her to pass it to her boss and the subeditors.

The subeditor was also great.  She sent me a revised version, and at this point it was laid out as a PDF.  I replied with a list of changes to maintain accuracy given the new edits.  There were about 3-4 more versions after this, with small changes.

Overall, I’d say this whole process made the resulting paper much more readable than it started.  It also changed the focus a bit, to having a stronger altmetrics focus, rather than being primarily about the alternative products.  I’m ok with that, though I do mourn some of the details in the original draft that didn’t make it into the final version.  I do kinda feel like the editor should be a coauthor, for what it is worth…. I think we’ve all had coauthors who did less than she did!  Feels a little strange that there is so much behind-the-scenes help in crafting these articles and that isn’t transparent at all.

One area where I had no clear say was the title and subheading.  It went through 2-3 titles and 4-5 subheading phrases and locations in the versions I saw.  I did object to one of the versions (“creeping changes”), but in general it wasn’t clear that the title was my decision.  I didn’t know that the title in the HTML version and CrossRef was going to be prefixed with “Altmetrics:”, because the PDF copy I saw was simply titled “Value all research products”.  I’m a little unhappy about the leading “Altmetrics:” because I think it complicates the main thrust of the piece and makes it easy for people to get tangled, for example about whether blog posts are alt-products or sources of altmetrics (answer: both).  Oh well, that’s ok: altmetrics is sexy, it makes sense to lead with it, and I’m certainly a big believer!

Timeline

Because the article was due out just after the holiday break, with a fixed publication date to coincide with the new policy implementation on Jan 14, the turn-around time I had for many of these revisions was very short (10 days for the initial draft, a few days for revisions, near the end less than a day for final revisions).  This was fine with me, I just note it so that others will know what you are getting into.

Copyright and Paywall

The other point I want to mention here is how copyright works with Comments.  I admire Nature’s copyright policy for research articles, given that they are a non-open-access journal: they do not require that authors sign away their copyright; instead, they ask that authors grant Nature an exclusive license to publish.

Nature has a different policy for Comments: you have to sign your copyright over to Nature.  As a huge proponent of Open Access, I thought long and hard about whether I was ok with this.  I decided that for this editorial content, I was.  Happy to discuss :)

Here is the form that I signed: UK Comment CA. I uploaded it because I did not sign an NDA, and I know that I would have liked to find it online when I was first contacted by them, to help me understand the details of the agreement I’d be entering.

The first editor who contacted me knew that I am a strong supporter of OA.  Though she said that it would not be possible to make this Comment OA, she said that we could nominate it to be one of the “free” articles.  I held fast to this, and in mid-December I followed up with her to request that we do indeed make this nomination if she hadn’t already done so.  She was happy to do so, and asked me for a one-sentence justification for why this paper should be freely available, because that is used by the group who makes the decisions.  Not sure I knocked this out of the park, but fwiw here’s what I sent:

People will likely circulate this article outside academia, since altmetrics is about valuing broad contributions to science, and broad interactions with science — high school viewers of wetlab YouTube videos, silicon valley dotcom contributors to science source code repositories, etc.

The good news is that they did decide to make my article free for “at least a week.”  It wasn’t free when it first went up, interestingly, but the paywall page stopped appearing within 12 hours.

One more thing, for completeness.  I’ve heard some people are paid a small amount for Comments?  I’m not sure if that is true or not.  In any event: money was never mentioned to me, and I wasn’t paid anything.

So there ya go.  Now you know everything I know about how Nature Comments work.

January 10, 2013

First draft of just-published Value all Research Products

Filed under: Uncategorized — Heather Piwowar @ 9:00 am

The copyright transfer agreement (arg) I signed for the Comment in Nature included restrictions on where I may post a copy of the article:

Although ownership of all Rights in the Contribution is transferred to NPG, NPG hereby grants to the Authors a licence […]
c) To post a copy of the Contribution as accepted for publication after peer review (in Word or Tex format) on the Authors’ own web site, or the Authors’ institutional repository, or the Authors’ funding body’s archive, six months after publication of the printed or online edition of the Journal, provided that they also link to the Journal article on NPG’s web site (eg through the DOI).

The article is available for free for a week or two on Nature’s site, and I’ll post the text here as soon as I can, six months from now.

In the meantime, as per the contract language above, I may post the first draft that I sent the Nature editors.  So here is the first draft, for the benefit of those who are looking for a free version in the first half of 2013, and for anyone who cares to compare the first draft to the final draft :)   [Hint: there were MANY rounds of editing.  More on that in the next post…]

NSF policy welcomes alt-products, increases need for altmetrics

(or perhaps NSF welcomes bragging about software, datasets in proposals)

Research datasets and software no longer have to masquerade as research papers to get respect.  Thanks to an imminent policy change at the NSF, non-traditional research products will soon be considered first-class scholarly products in their own right, and worth bragging about.  This policy change will prove a key incentive to produce and disseminate alternative products, and will have far-reaching consequences for how we assess research impact.

Starting January 14th, the NSF will begin to ask Principal Investigators to list their research Products rather than Publications in the Biosketch section of funding proposals.  Datasets and software are explicitly mentioned as acceptable products in the new policy, on par with research articles.

The policy update reflects a general increase in attention to alternative forms of scholarly communication.  Policies, repositories, tools, and best practices are emerging to support an anticipated increase in dataset publication, spurred, in part, by now-required NSF data management plans.  Tools for literate programming, reproducible research, and workflow documentation continue to improve, highlighting the need for shared software.  Open peer review, online lab notebooks, post-publication discussion — as it gets easier to “publish” a wide variety of material online it becomes easy to recognize the breadth of our intellectual contributions.

I believe in the long run this policy change from Publications to Products will do much more than just reward an investigator who has authored a popular statistics package.  It is going to change the game, because it is going to change how we assess research impact.

The change starts by welcoming alternative products.  The new policy welcomes datasets, software, and other research output types in the same breath as publications: “Acceptable products must be citable and accessible including but not limited to publications, data sets, software, patents, and copyrights. Unacceptable products are unpublished documents not yet submitted for publication, invited lectures, and additional lists of products.”  In contrast, previous versions of the Biosketch instructions allowed fewer types of acceptable products (“Patents, copyrights and software systems”) and considered their inclusion to be a “substitution” for the main task of listing research paper publications.

The next step will become apparent when we consider what peer reviewers will want to know when they see these alternative products in a Biosketch.  What is this research product?  Is it any good?  What is the size and type of its contribution?  We often assess the quality and impact of a traditional research paper based on the reputation of the journal that published it.  In fact the UK Engineering and Physical Sciences Research Council makes this clear in its fellowship application instructions: “You should include a paragraph at the beginning of your publication list to indicate … Which journals and conferences are highly rated in your field, highlighting where they occur in your own list.”

Including alternative products will change this: it necessitates a move away from assessment based on journal title and impact factor ranking.  Data and software can’t be evaluated with a journal impact factor — repositories seldom select entries based on anticipated impact, they don’t have an impact factor, and we surely don’t want to calculate one to propagate the poor practice of judging the impact of an item by the impact of its container.  For alternative products, item-level metrics are going to be key evidence for convincing grant reviewers that a product has made a difference.  The appropriate metrics will be more than just citations in research articles: because alternative products often make impact in ways that aren’t fully captured by established attribution mechanisms, alternative metrics (altmetrics) will be useful to get a full picture of how research products have influenced conversation, thought, and behaviour.

The ball will bounce further.  Once altmetrics and item-level metrics become expected evidence to help assess the impact of alternative products, the use of item-level altmetrics will bounce back to empower innovations in the publication of traditional research articles.  Starting a new or innovative journal is risky: many authors are hesitant to publish their best work somewhere unusual, somewhere without a sky-high impact factor.  When research is evaluated based on its individual post-publication reception, innovative journals become attractive, perhaps more attractive than staid, established, run-of-the-mill alternatives.  Reward for innovative journals will result in more innovation in publishing.  Heady stuff!

A few large leaps are needed to realize this future, of course.  First, this one policy change hardly represents a consistent message across the NSF.  Accomplishment-Based Renewals are still based on “six reprints of publications”, with no mention of alternative products.  Even in the Grant Proposal Guide, the same document that houses the new Products policy, the instructions for the References Cited section are written as if only research articles would be cited in a grant proposal.  What about preliminary data on figshare, or supporting software on RunMyCode, or a BioStar Q&A solution, or a patent, or a blog post, or, for that matter, an insightful tweet?  If we think these products are potentially valuable, the NSF should welcome and encourage their citation anywhere it might be relevant.

The second hurdle is that a policy welcoming the recognition of alternative products is not yet common outside the NSF.  A brief investigation suggests that many other funders — including the NIH, HHMI, Sloan, and the UK MRC — still explicitly ask for a list of research papers rather than products.  A few, like the Wellcome Trust and UK BBSRC, just seem to ask broadly for a CV, leaving the decision about its contents to the investigator.  This could be good, but because investigators are not used to considering alternative products to be first-class citizens, explicit welcoming is important to drive change.

The third challenge between us and a new future brings us to an exciting area under active development.  When products without journal-title touchpoints start appearing in Biosketches, how will reviewers know if they should be impressed?  Reviewers can (and should!) investigate each research product itself and evaluate it with their own domain expertise.  But what if an object is in an area outside their expertise?  They need a way to tap into the opinion of experts in that domain.  Furthermore, beyond the intrinsic quality of the work, how will reviewers know if the Intellectual Merit has indeed been impactful on scholarship and the world, and thus should lend credence to the proposal under consideration?

Many data and software repositories keep track of citations and download statistics.  Some repositories, like ICPSR, go a step further and provide anonymous demographic breakdowns of usage to help us move beyond “more is better” to an understanding of the flavour of the attention.  This context will become richer as more types of engagement are added: is the dataset being bookmarked for future use?  Who is cloning and building on the open software code?  Are blog posts being written about the contribution?  Who is writing them and what do they say?

Tools are available today to collect and display this evidence of impact.  Thomson Reuters’ Data Citation Index aggregates citations to datasets that have been identified by data repositories.  Altmetric.com identifies blog posts, tweets, and mainstream media attention for datasets with a DOI or handle: try it out using their bookmarklet.  The nonprofit organization ImpactStory tracks the impact of datasets, software, and other products, including blog and twitter commentary, download statistics, and attribution in the full text of articles: give it a try.  I’m a cofounder of ImpactStory: we as scientists need to go beyond writing editorials on evaluation and actually start building the next generation of scholarly communication infrastructure.  We need to create business models for infrastructure that support open dissemination of actionable, accessible, and auditable metrics for research and reuse.

Finally, the practice shift to value broad impact will be more rapid and smooth if funders and institutions explicitly welcome broad evidence of impact.  Principal investigators should be tasked with making the case that their research has been impactful.  Most funders, including the NSF, do not currently ask for evidence of impact.  This may be changing: the NIH issued an RFI earlier this year on BioSketch changes that would include documenting significance.  In the meantime, the lack of an explicit welcome hasn’t stopped cutting-edge investigators from augmenting their free-form CVs and annual reviews to mention that their work has been “highly accessed” or received a F1000 review.  This — and next generation evidence with context — should be explicitly welcomed.

Despite these hurdles, the future is not far away.  You and I can start now.  Create research products, publish them in their natural form without shoehorning everything to look like an article, make citation information clear, track impact, and highlight diverse contributions when we brag about our research.  We’re on our way to a more useful and nimble scholarly communication system.

Just published: Value all research products

Filed under: Uncategorized — Heather Piwowar @ 8:05 am

A Nature editor contacted me in November, asking if I’d like to write a Comment about the upcoming NSF policy change in Biosketch instructions.  It sounded like a great chance to talk about the value of alternative research products with a wide audience, so I agreed.  The comment was published yesterday and is now available here:

Piwowar H. (2013). Value all research products. Nature, 493(7431), 159. DOI:

Because of Nature’s policies about copyright assignment for Comments, the comment is not open access; it sits behind a paywall.  Arg.  That said, I requested that it be one of their “free” articles and they agreed, so it will be freely available at the above link for a week or two.  I will post the text on my website as soon as I am able, six months from now.

Working on a blog post about the process behind the scenes, because it was certainly unlike anything else I’ve published to date!

Questions about the piece, or thoughts or opinions?  Welcome below, or on twitter to @researchremix.

July 16, 2012

Many datasets are reused, not just an elite few

Filed under: Uncategorized — Heather Piwowar @ 8:04 am

I’ve recently collected new data on data reuse.  Using the same methods as our Nature letter-to-the-editor analysis, I’ve looked for reuse of gene expression microarray data in PubMed Central by searching for dataset ID numbers in the full text of studies.  Studies that mention a dataset accession number but share author last names with those who deposited the dataset are excluded.
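
For a flavour of the kind of search involved, here is a minimal sketch using NCBI’s E-utilities to count papers in PubMed Central whose full text mentions a given GEO accession number. The accession number below is just an illustrative example, and the real analysis did more than this, including the author-name exclusion described above.

    # Minimal sketch: count PubMed Central hits for a GEO accession number
    # via NCBI E-utilities. The accession number is just an example; the
    # real analysis also excluded papers sharing author last names with
    # the dataset depositors.
    import requests

    EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    def pmc_hit_count(accession):
        resp = requests.get(
            EUTILS,
            params={"db": "pmc", "term": accession, "retmode": "json"},
        )
        return int(resp.json()["esearchresult"]["count"])

    # Papers in PMC whose text mentions this dataset (before author filtering)
    print(pmc_hit_count("GSE2109"))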

The new results look at datasets deposited into the Gene Expression Omnibus (GEO) repository between 2001 and 2009.

Results for the middle years are particularly important, since by then GEO had a lot of datasets, and between then and now there has been enough time for reuse to accumulate.  We observed reuse of more than 20% of the datasets deposited in 2003 and 17% of datasets deposited in 2007.

Note: the method used to detect reuse here is VERY CONSERVATIVE, so these are minimum estimates.  It only finds reuses by papers that are in PubMed Central, and only those that are attributed by mentioning the accession number (it misses reuses attributed by citation to the article, for example).  Nonetheless, it does serve as a lower bound.

Analysis of the accession number mentions revealed that data reuse was driven by a broad base of datasets: about 20% of the datasets deposited between 2003 and 2007 have been reused by third parties. We note these proportions are gross underestimates since they only include reuses we observed as accession number mentions in PubMed Central; no attempt has been made to extrapolate these distribution statistics to all of PubMed, or to reflect attributions through citations. Further, many important instances of data reuse do not leave a trace in the published literature, such as those in education and training. Nonetheless, even these conservative estimates suggest that reuse finds value in a wide range of datasets, not simply a “very reusable” elite.

(manuscript-in-progress with co-author Todd Vision)

July 13, 2012

Concrete options for a society journal to go OA

Filed under: Uncategorized — Heather Piwowar @ 10:59 am

AMIA’s society journal, JAMIA (the Journal of the American Medical Informatics Association), is considering going Open Access. I’ve been invited to be part of the OA explorations task force.

All task force members agreed I could blog our process. In fact, they look forward to hearing suggestions from all of you! So here goes, first installment. Our report is due in September.

My main job on the task force is to outline the available alternatives. Below are my getting-started notes.

What options am I missing? Does anyone already have details for any of these options? Advice for JAMIA if you have been here, done this?

options

well-defined alternatives:

Three major options seem to be: publish JAMIA with an existing publisher of OA journals, run it independently through a self-hosted journal management system, or run it through a third-party hosted journal management system.

For reference, a SPARC review of scholarly OA journals in 2011 found that Springer published 9 society OA journals, Copernicus published 15, WASET published 21, BioMed Central published 33, and MedKnow published 64. I’m not sure what proportion ran on a self-hosted or externally hosted platform, but OJS lists many journal users.

Links are to the “contact us about your society journal” pages:

explorations:

We were told to think outside the box.  Excellent!  So perhaps JAMIA could publish in an OA megajournal within a JAMIA Collection or tag, or ask for modification of the terms of one of the well-defined options?

perhaps out of scope:

related issues

license

  • license that facilitates most reuse is CC-BY.
  • compromises would decrease use but potentially facilitate reprint revenue (ie BMJ)

embargo

  • immediate access facilitates most reuse.
  • compromises would decrease use but potentially facilitate subscription revenue (ie RUP)

editorial content

  • editorials available as OA facilitates most reuse
  • compromises would decrease visibility but potentially facilitate subscription revenue (ie BMJ)

advertisements

  • could host advertisements and get partial revenue (ie BMC)
  • could charge readers to view without advertisements

waivers

  • many OA journals have an automatic waiver or subsidy for authors from low-income countries (ie BMC, BMJ Open). Some also offer a subsidy or waiver upon request (ie BMC). At least one offers a guaranteed waiver for those who cannot afford to pay (PLoS).
  • the majority of society publishers do not charge any author-side fees

print

  • many OA journals are online-only.  JAMIA is currently available online and in print.  Is print needed? Are there options available for print-on-demand?

publishing-charge subsidy

  • as an AMIA membership benefit, could offset article processing charges (ie BMC)

other related revenue possibilities

  • could release openly, have HTML available for free, but charge per-article or membership fee for PDF access (ie JMIR)
  • could charge for expedited peer-review (ie JMIR)
  • could charge submission fees in general
  • could charge for iphone apps, etc

info so far

costs

staffing

  • OJS, includes hours/week survey results

open questions

Many. A few:

  • How much of the back content could become OA? Is the copyright currently AMIA’s or BMJ’s? Answer: AMIA’s.

thoughts and observations

  • The OASPA resources section is a little light, and the blog was last updated in 2011. I’d say there have been a few OA events of note since then :) Upcoming conference in Hungary in September.
  • This is a less well-trodden path than I thought… I’ve made initial contact with most of the organizations above, and none of them immediately zoomed me a how-to sales package (one or two were quick, but for most of the publishers I’ve contacted it has been 4 days with no response yet).
  • AMIA could join SPARC as an affiliate society, with a $5,710 contribution per calendar year. SPARC is active in advocating for funder mandates for OA, which would likely bring about greater funder support for processing charges.
  • this is timely: two recent blog posts about OA and societies.  One by Mike Taylor, one by the Scholarly Kitchen.  There are other white papers etc also.  I’ll hopefully get a chance to recap them in a future post.

Edited July 16 to add a few things

July 9, 2012

makingdatacount: Outline #draftInProgress

Filed under: Uncategorized — Heather Piwowar @ 11:25 am

I’ve got a few manuscripts on the go this month.  One of them is on the state of Data Citation Tracking, making the same points as my IDCC talk last year and the recent DataCite presentation by Scott Edmunds of GigaScience (tracking stuff starts at slide 45).

Here’s the draft outline.  Obvious things missing?

Making Data Citation Count

  • 1. Why it matters
    • Encouraging more data archiving
    • Rewarding production and dissemination of useful data
    • Enabling fine-grained reward for all contributors
    • Discovering associated datasets and researcher communities
    • Filtering for frequently used — or neglected! — datasets
    • Correcting analyses based on erroneous data
    • Avoiding harmful shoehorning
    • Driving policy, funding, and tool requirements based on evidence
  • 2. Obstacles
    • Awareness
    • Encouragement and expectation
    • Agreement on best practices
    • Existing problematic policies
    • Tracking tools
    • Access to the literature to build tracking tools
  • 3. What we want to Count
    • Dataset-level metrics
    • Project-level metrics
    • Repository impact story rather than Repository impact factor
    • Reuses from outside the literature
    • Reuses from outside academia
    • Reuses of the reuses
    • Impact flavour
  • 4. Conclusion

July 3, 2012

Citation11k: Method section — access to citation data #draftInProgress

Filed under: Uncategorized — Heather Piwowar @ 8:27 am

The next installment in my #draftInProgress series on Open Data citation.

I’m not sure this section will make it into the paper in its entirety, though I do think it is important to highlight the serious hurdles in getting access to data for research on research.

This step of the methods was certainly the most time-consuming part of the study!

Methods: citation data

This study required citation counts for thousands of articles identified through PubMed IDs. At the time of data collection, neither Thomson Reuters’ Web of Science nor Google Scholar supported this type of query. It was (and is) supported by Elsevier’s Scopus citation database. Alas, none of our affiliated institutions subscribed to Scopus. Scopus does not offer individual subscriptions, and a personal email to a Scopus Product Manager went unanswered.

One author (HAP) attempted to use the British Library’s walk-in access to Scopus on its Reading Room computers during a trip overseas. Unfortunately, the British Library did not permit any method of electronic transfer of our PubMed identifier list onto the Reading Room computers, including internet document access, transferring a text file from a USB drive, or using the help desk as an intermediary (see related policies). The Library was not willing to permit an exception in this case, and we were unwilling to manually type ten thousand PubMed identifiers into the Scopus search box in the Reading Room.

HAP eventually obtained Scopus access through a Research Worker agreement with Canada’s National Science Library (NRC-CISTI), after being fingerprinted to obtain a police clearance certificate (required because she’d recently lived in the USA for more than six months).

At the time of data collection the authors were not aware of any way to retrieve Scopus data through researcher-developed computer programs, so we queried and exported Scopus citation data manually through interaction with the Scopus website. The Scopus website had a limit to the length of query and the number of citations that could be exported at once. To work within these restrictions we concatenated up to 500 PubMed IDs at a time into 22 queries, where each query took the form “PMID(1234) OR PMID(5678) OR …”
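
For illustration, here is a small sketch of that batching step. It only builds the query strings; nothing here talks to Scopus programmatically, since in our case the strings were pasted into the Scopus website by hand.

    # Sketch of batching PubMed IDs into Scopus advanced-search queries
    # of up to 500 IDs each, in the form "PMID(1234) OR PMID(5678) OR ...".
    def scopus_queries(pmids, batch_size=500):
        for start in range(0, len(pmids), batch_size):
            batch = pmids[start:start + batch_size]
            yield " OR ".join("PMID(%s)" % pmid for pmid in batch)

    pmids = ["1234", "5678"]  # ~10,694 IDs in the actual study
    for query in scopus_queries(pmids):
        print(query)  # e.g. PMID(1234) OR PMID(5678)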

Citation counts for 10,694 papers were gathered from Scopus in November 2011.
