Research Remix

November 18, 2011

Doing data archiving well

Filed under: Uncategorized — Heather Piwowar @ 6:30 am

It is easy to think that archiving data is easy: just put the data files up on a website.  Doing it well, though, is harder.  The Dryad digital repository has been thinking hard about these issues for years, working toward a practical, simple, and rewarding solution.  For Dryad’s website and promotional material we’ve articulated some of the issues we feel are important; see Why Should I Choose Dryad for the up-to-date version.

I copy the current text here to inspire a conversation about “selling points” for a data archive and, even more importantly, to illustrate how involved it is to make a data archive great.


Dryad aims to make data archiving as simple and as rewarding as possible:


  • Dryad welcomes data files associated with any published article in the biosciences, as well as software scripts and other files important to the article.
  • There is no restriction regarding data formats.
  • Dryad works with journals to integrate article and data submission, streamlining the deposit process. Once the files are prepared, deposition typically takes less than 15 minutes (2-minute video here).
  • Data destined for more specialized repositories can, in some cases, be submitted through Dryad, reducing the time and complexity of data submission yet further.
  • Dryad provides a single clear and best-practice option for terms of reuse.
  • A curator will check your files for technical problems before they are released.
  • By default, data are embargoed until journal article publication. Dryad makes sure this happens so you do not need to.
  • If it is supported by the policy of the journal, you may, during the submission process, select a ‘no-questions-asked’ embargo on data downloads for one year post-publication. Dryad will support a longer embargo if directed by a journal editor.
  • You are free to provide additional keywords that make the data easier to discover and additional documentation (in the form of ReadMe files) to help ensure proper data reuse.
  • You have the ability to add new versions of data files in order to make updates or corrections.
  • Dryad can make data securely available for peer review at the request of the journal.


  • Dryad works to ensure that you get credit for reuse of your data by promoting adoption of best-practice data citation policy and the trackability of data citations.
  • Data files receive persistent, resolvable Digital Object Identifiers (DOIs) that can be used in a citation as well as listed on your CV.
  • Dryad’s terms of reuse for data facilitate the maximum impact for your work.
  • Data in Dryad are independently discoverable, providing a new route by which others may learn about your work.
  • Discovery is supported through the indexing of Dryad’s contents by services such as Google Scholar, Web of Science, and others.
  • Usage statistics are available for you to highlight when your datasets are frequently downloaded.
  • Since Dryad does not reject data for being of the wrong type or in the wrong format, all the data files associated with an article can be archived together.
  • Dryad can host files that are larger than those accepted by most journal websites (up to 1 GB per file and 10 GB per package).
  • Your data are preserved and made available for the long term, even beyond the lifespan of Dryad, through continuous backup and replication services.
  • Dryad is community-led, with priorities and policies shaped by the members of the Dryad Consortium, including scientific societies, publishers, and other stakeholder organizations.
  • Dryad is a nonprofit, but takes sustainability seriously, ensuring that funds are available for long-term preservation.
  • Dryad is an active participant in organizations developing best practices for data management, such as Biosharing, DataCite, and DataONE. You as a researcher benefit from, and contribute to, the work of these organizations by depositing and using Dryad.

Have we left out any characteristics that matter to you?  Or do you have a wishlist of things you’d like to see in a data archive like Dryad? Let us know in the comments, or send us an email.  Thanks!


  1. I’m a big fan of Dryad and it is clear that a lot of thought has gone into the system.

    One characteristic that I think got missed a little in efforts to make things simple for contributors is an effort to make the data in the repository simple to use. The problem is that the repository has gone just a little too far with respect to the goal of “no restriction regarding data formats.” Having data in numerous formats makes it more difficult to remix the data, especially when it allows data to include serious structural issues like having special character footnotes inside data fields. I have repeatedly tried to utilize Dryad data for assignments in my programming and database management courses, but inevitably give up because far too much of the data requires advanced data manipulation before it can actually be used. This is certainly going to reduce the use of the data more broadly. Allowing data in proprietary formats is also of concern since accessing data in an Excel format will gradually get more difficult over time.

    It seems to me that for most standard tabular data, which is most of what is housed in Dryad, requiring that the data be provided as csv files would be a relatively minor inconvenience for the authors and result in a major improvement in the usability and long-term value of the data being deposited.

    Comment by Ethan — November 27, 2011 @ 1:00 pm

  2. […] Data repositories are gaining traction as a best practice solution. […]

    Pingback by thoughts on where journals are now, what to do next « Research Remix — December 3, 2011 @ 2:59 pm
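
To make the first comment's point about footnote markers concrete, here is a rough Python sketch of the kind of cleanup a reuser might have to do before tabular data is usable as plain CSV. Everything in it is an assumption for illustration: the marker characters, the function names, and the sample rows are invented and do not come from any Dryad dataset or tool.

```python
import csv
import io
import re

# Assumed set of footnote symbols; real datasets may use different ones.
FOOTNOTE_MARKERS = re.compile(r"[*†‡§¶]+$")


def strip_footnote_markers(value: str) -> str:
    """Remove footnote symbols from the end of a field, e.g. '43.1*' -> '43.1'."""
    return FOOTNOTE_MARKERS.sub("", value.strip()).rstrip()


def clean_table(rows):
    """Apply the footnote cleanup to every cell of a table (a list of rows)."""
    return [[strip_footnote_markers(cell) for cell in row] for row in rows]


def to_csv(rows) -> str:
    """Serialize rows as CSV text using the standard library writer."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample rows, purely for illustration.
    raw = [
        ["sample", "mass_g"],
        ["site_a", "43.1*"],
        ["site_b†", "31.8"],
    ]
    print(to_csv(clean_table(raw)), end="")
```

Note that this only handles markers at the end of a field, and it throws away the footnote information itself. A more careful cleanup would move the footnotes into a separate column or a ReadMe file, which is the kind of extra work the comment is describing.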
