Research Remix

April 6, 2011

Interested in data sharing? No summer plans yet?

Filed under: Uncategorized — Heather Piwowar @ 6:25 am

DataONE has an awesome internship program… and you only have until April 8th to apply for Summer 2011! The internships are open to undergrads, grad students, and those who have received a master's or PhD in the last five years, and who live in and can work in the USA (see here for all eligibility details).

Modelled after the Google Summer of Code, these full-time internships are carried out through remote collaboration and focus on concrete deliverables.

I was lucky enough to be the primary mentor for three DataONE interns last summer. It was a fun experience for all of us, and each of the interns did truly useful and impressive research into the policies and practices of data citation, data sharing, and data reuse. See their blog, open notebooks, and IDCC poster, and keep your eyes open for the soon-to-be-published paper.

The array of DataONE projects seeking interns in 2011 has something for everybody:

  1. DATA MANAGEMENT: Best practices of data management for public participation in science and research
  2. DATA MANAGEMENT: Online learning modules related to best practices throughout the data lifecycle
  3. EDUCATION: Accessing and analyzing environmental data in the classroom
  4. SOCIOLOGY OF SCIENCE: Understanding how scientists analyze data
  5. DATA SCIENCE: How much ecological data is out there?
  6. DATA SCIENCE: Tracking the reuse of 1000 datasets
  7. PROGRAMMING: Subsetting and publishing “dynamic” scientific datasets
  8. PROGRAMMING: Scientific workflow provenance repository and publishing toolkit
  9. PROGRAMMING: Integrating loosely structured data into the Linked Open Data cloud
  10. SCIENCE COMMUNICATION: Developing video animations for DataONE community engagement

In particular, I’ll call out project #6, Tracking the reuse of 1000 datasets, since I have a vested interest in recruiting a particularly stellar intern to that one :) Here are the details:

Tracking the reuse of 1000 datasets
Description: We believe that openly archiving raw data facilitates valuable reuse. Can we measure this? What contribution does data reuse make to the published literature? Who reanalyzes data? For what? Does this vary across disciplines and repositories? These questions are the focus of an exploratory study, “Tracking data reuse: Following one thousand datasets from public repositories into the published literature.” In this internship you’ll work directly with Heather to collect, extract, annotate, and analyze data to explore these important questions. See for more info on the project.
Qualifications needed: Self-starter, determined, enthusiastic, willing to keep a research notebook up-to-date openly online. Experience with statistics, the academic literature, PubMed, ISI Web of Science, Python, R, and blogging would be helpful.
Skills to be learned: Research methods, research data collection, text extraction from the scientific literature, keeping an open science research notebook, communicating research results
Primary mentor: Heather Piwowar (National Evolutionary Synthesis Center)
Secondary mentor: Todd Vision (University of North Carolina Chapel Hill/National Evolutionary Synthesis Center)

Sound interesting?  Apply!  Have questions?  Shoot me an email (hpiwowar gmail).  But do it fast, you’ve got to get your application in by April 8th.  Already have plans?  Get in touch, and mark your calendar for next summer….

