Research Remix

July 9, 2012

makingdatacount: Outline #draftInProgress

Filed under: Uncategorized — Heather Piwowar @ 11:25 am

I’ve got a few manuscripts on the go this month.  One of them is on the state of Data Citation Tracking, making the same points as my IDCC talk last year and the recent DataCite presentation by Scott Edmunds of GigaScience (tracking stuff starts at slide 45).

Here’s the draft outline.  Obvious things missing?

Making data citation Count

  • 1. Why it matters
    • Encouraging more data archiving
    • Rewarding production and dissemination of useful data
    • Enabling fine-grained reward for all contributors
    • Discovering associated datasets and researcher communities
    • Filtering for frequently used — or neglected! — datasets
    • Correcting analyses based on erroneous data
    • Avoiding harmful shoehorning
    • Driving policy, funding, and tool requirements based on evidence
  • 2. Obstacles
    • Awareness
    • Encouragement and expectation
    • Agreement on best practices
    • Existing problematic policies
    • Tracking tools
    • Access to the literature to build tracking tools
  • 3. What we want to Count
    • Dataset-level metrics
    • Project-level metrics
    • Repository impact story rather than Repository impact factor
    • Reuses from outside the literature
    • Reuses from outside academia
    • Reuses of the reuses
    • Impact flavour
  • 4. Conclusion


  1. Hi Heather,

    I reckon this could be a very useful paper. As you could probably tell from my slides (and the video DataCite posted, if you’ve seen it), I leaned heavily on some of your comments and work at the end.

    If you are after feedback, in part 1 you could expand a little on the point about rewarding dissemination. It’s not the most altruistic example, but when BGI was racing other groups to get the E. coli data out, sticking a DOI on it had the handy side effect of acting as a date stamp that proved they released it first. Because everything was happening so quickly, the dataset could also start picking up citations before the main genome paper came out, so the authors benefited twice. Even though it was an example of self-interest, it was very useful in persuading the co-authors to release the data early, especially when rival groups were happy to try to take credit but neither as willing nor as quick to release their own data.

    Good luck, and look forward to seeing this in print!


    Comment by Scott Edmunds — July 9, 2012 @ 7:23 pm

    • Scott, I like this point that it not only rewards dissemination, it rewards *early* dissemination. Nice, thanks.

      It was great to see you making the tracking point in your slides; too few of us are talking about that side of the equation. Hopefully a paper to point to will help :)

      Comment by Heather Piwowar — July 10, 2012 @ 3:27 am

  2. Heather, how about adding grant-level metrics under “What we want to Count”? I’m thinking this would be a way to demonstrate reuse of data funded by specific grants.

    Comment by Donna Kafel — July 10, 2012 @ 8:39 am
