July 5, 2010

Evolution 2010 and iEvoBio recap

As my first exposure to the field of evolution, attending Evolution 2010 and iEvoBio was drinking from a firehose. That said, it was a productive and enjoyable dousing. Highlights:

Slides from presentations that highlighted open science, data sharing and archiving, and reward structures:

  • Mike Whitlock: Data Archiving in Evolution info session, discussed motivation and details on the joint data archiving policy that will require data archiving across six journals starting next year (policy, slides)
  • Carl Boettiger: My experiment with open science: Why the benefits of sharing go beyond source code, a dynamic practical case study of how and why to do open science (slides)
  • Todd Vision: The Dryad Digital Repository: Published evolutionary data as part of the greater data ecosystem, motivation and overview for the new data repository for post-publication datasets in evolution and ecology (abstract and demo)
  • Rob Guralnick: Biodiversity Discovery and Documentation in the Information and Attention Age, keynote talk highlighting, in part, the value in sharing pre-publication data and the need to change our reward structures to value that contribution (slides)
  • Anne Thessen: New Biology: The Data Conservancy and Data Driven Discovery, an overview of the ambitious data conservancy project (website)
  • Jonathan Eisen: Phylogenomics of microbes: the dark matter of biology, keynote with some plugs for PLoS and project openness (slides)
  • Rutger Vos: TreeBASE2: Rise of the Machines, background on new machine-friendly interfaces to TreeBASE (slides and demo)

A few other presentations related to how we develop or communicate science:

  • Rod Page: Phyloinformatics in the age of Wikipedia, a talk on the value of realizing how people find science (slides)
  • Vincent Smith: Top-down and bottom-up informatics: who has the high ground?, powerful case studies of successful and unsuccessful projects (abstract)
  • Cynthia Parr: Community content building for evolutionary biology: Lessons learned from LepTree and Encyclopedia of Life, case studies on the relative strengths of different design approaches (abstract)
  • N. Dean Pentcheff: Copyrights and digitizing the systematic literature: the horror… the horror…, about why it is important and completely legal to assemble an open digital archive of phylogeny papers under fair use

We also had a very interesting iEvoBio Birds-of-a-Feather session on open science, data sharing and reuse, and data citations. There were about 10 of us in a wide-ranging and interesting discussion with diverse perspectives.

Overall the iEvoBio meeting was fun and useful: a very successful first year kickoff bringing together people with similar interests. Thanks to Hilmar Lapp and the other organizers for all of their work. Can’t wait to go next year and contribute to the theme of research openness.

Of course the meetings were also a very useful intro to the field of evolution itself. Sean Carroll’s Gould prize lecture told the historical story of evolutionary theory: entertaining and informative, it was a fantastic start. I also enjoyed the two award research lectures (though I wish they hadn’t overlapped with 5pm info sessions on Data Archiving and NESCent). The presentations and posters gave me a good high-level overview of what questions people are looking at, what kinds of data are produced and reused, what tools are developed, and what kinds of creativity and hard work are required in designing effective experiments.

Finally, I made a number of contacts and spent some time with my NESCent and Dryad community, my local UBC community, and others interested in open data and open science within this domain… crucial given the remote nature of my postdoc.

Left now with oodles of disjoint notes, ideas, and enthusiasm for my next steps. Here we go!

ETA:  Want more info on iEvoBio?  Summary of online artifacts.

September 15, 2008

Pedersen: software results shouldn’t be a matter of faith

Great article by Ted Pedersen in the Sept 2008 issue of Computational Linguistics (“Empiricism is not a matter of faith”) about the importance of sharing research software, not just the results of running the software.

Sadly, the article isn’t freely available online (see below).  Here’s a link to the first page, a few quotes, and the article outline.

  • “While his work achieved publication, it must gnaw at his scientific conscience that he can’t reproduce his own results.”
  • “We publish page after page of experimental results where apparently small differences determine the perceived value of the work.  In this climate, convenient reproduction of the results establishes a vital connection between authors and readers.”
  • “We do this routinely, to the point where we seem to have given up on the idea of being able to reproduce results.”
  • “often unintentional fallout from how we manage projects and set priorities”
  • “Imagine meeting with a new project member and being able to say: ‘Go download this software, read the documentation, install it, run the script that reproduces our ACL experiments, and then we can start talking tomorrow about how you are going to extend that work…'”
  • “Finally, although this viewpoint may seem quaint or naive, a great deal of our research is funded by public tax dollars, by people who make ten dollars an hour waiting tables […] Although most taxpayers won’t have much interest in reading our papers and running our code, they ought to have the opportunity.  And who knows […]”
  • Concludes by suggesting either we decide to approach things with a focus on bigger ideas, or instead insist that “highly detailed empirical studies must be reproducible to be credible, and that it is unreasonable to expect that reproducibility be possible based on the description provided in a publication.”

Article overview:

1.  The Sad Tale of the Zigglebottom Tagger

2.  The paradox of Faith-Based Empiricism

3.  A Heretic’s Guide to Reproducibility

3.1  Release Early, Release Often

3.2  Measure your Career in Downloads and Users

3.3  Ensure Project Survivability by Releasing Software

3.4  Make the World A Better Place

4.  What should Computational Linguistics Do?

[Make the article freely available online ASAP :)]

[[ETA:  An author-archive of the full-text is available here: ]]

September 11, 2008

PSB Open Science workshop talk abstract

The program for the Open Science workshop at PSB 2009 has been posted.  Great diversity of topics… I’m really looking forward to it.

My talk abstract is below… comments and suggestions are welcome!

Measuring the adoption of Open Science

Why measure the adoption of Open Science?

As we seek to embrace and encourage participation in open science, understanding patterns of adoption will allow us to make informed decisions about tools, policies, and best practices. Measuring adoption over time will allow us to note progress and identify opportunities to learn and improve. It is also just plain interesting to see where we are, where we aren’t, and where we might go!

What can we measure?

Many attributes of open science can be studied, including open access publications, open source code, open protocols, open proposals, open peer-review, open notebook science, open preprints, open licenses, open data, and the publishing of negative results. This presentation will focus on measuring the prevalence with which investigators share their research datasets.

What measurements have been done? How? What have we learned?

Various methods have been used to assess adoption of open science: reviews of policies and mandates, case studies of experiences, surveys of investigators, and analyses of demonstrated data sharing behavior. We’ll briefly summarize key results.

Future research?

The presentation will conclude by highlighting future research areas for enhancing and applying our understanding of open data adoption.

April 2, 2008

A Centralized Proposal Repository

I actively support Nature Precedings as a place to archive my early research findings for visibility, feedback, and attribution. I recently submitted a research proposal. Precedings does a spot-check of all submissions to verify appropriateness. It usually takes a day or two, and results in an automated response stating that your submission has been posted. In this case, I received a personal, thoughtful email explaining that although Nature Precedings had published proposals in the past, they are moving away from this practice to concentrate on their core goal of “a repository for manuscripts, posters, and presentations describing completed research.”

I see the issue. On one hand, if the goal of a preprint is to get feedback and attribution for research ideas, what better time to do it than at the proposal stage. On the other hand, ideas are a dime a dozen and so it might not be scalable for Nature Precedings.

Sounds like we need another solution. There was a letter to Nature just recently (highlighted by Maxine Clarke in Nautilus), calling for a Centralized Proposal Repository. This idea is grander than simply a wiki for feedback and attribution. Dr Harel is suggesting it could also be searched by funders, to identify projects which match their interests.

Dr Harel expands on this idea on his website and links to the beginning of a Centralized Proposal Repository wiki.

Great idea, and I think right up the alley of open science. The wiki seems to be Protected (to limit spammers?). I’ll go ask to join, and keep you updated on what I learn.

ETA: Looks like Jean-Claude Bradley was involved in the set-up of the wiki. Great stuff!

March 25, 2008

PSB Open Science workshop: call for participation

As reported by the organizers at One Big Lab and Science in the Open, PSB 2009 is going to have a 3 hour workshop devoted to Open Science. Neat, eh? What a great chance to meet and learn from others who are working this way and/or thinking about this topic! And discuss it with others who just happen to be in the PSB neighbourhood! And go to Hawaii in January! :)

I’ll definitely be submitting a talk proposal. Still brainstorming the topic. The winner in my head over the last 24 hours is “Open Science: Measuring the Costs and Benefits.” What do you think? Other ideas? Off to email Shirley and Cameron…..

February 8, 2008

Letter of support for PSB session on Open Science

Posted here under a Creative Commons license, so please reuse, remix, re-send!

Dear PSB organizers,

I fully support the proposal for a session on Open Science at PSB 2009, and commit to submitting a research paper on data sharing and reuse.

The specific research topic will be derived from my doctoral dissertation, related to measuring the prevalence, patterns, causes, benefits, and motivations for biomedical data sharing and reuse.  I have a previous publication in this area (“Sharing Detailed Research Data Is Associated with Increased Citation Rate” at PLoS ONE), a few posters (including one at PSB 2008), and several papers in draft.  The paper will be co-authored with Dr Wendy Chapman.

I believe that Open Science definitely constitutes a “hot topic” within biocomputing, and has the potential to fundamentally change the way we think about our work.  The topic is relevant to data producers and data consumers, biologists and computer scientists, all with varied perspectives.

Discussion and measurement of benefits, hurdles, progress, and best practices could take (and already is taking) place in blogs, the popular press, Birds-of-a-Feather sessions, and scattered research papers.  A session at PSB would be a unique opportunity to give this emerging meta-approach the serious examination it deserves.

Thank you for considering this proposal.


Heather Piwowar
Doctoral Student
Department of Biomedical Informatics
University of Pittsburgh

February 7, 2008

Plug for PSB Session on Open Science

Hi Blogosphere!  I’ve missed you.  Hope to re-engage in the next month or two.   Briefly in the meantime….

I want to offer public support and a last-minute plug: Shirley Wu and Cameron Neylon are proposing a session on Open Science at next year’s PSB meeting.  Fantastic idea.

You can help… if you’d like to attend and/or submit a presentation, please send them a letter stating such by tomorrow noon PST.  Details here.

I’m sold.  PSB is a first-rate conference, it would be a unique opportunity to bring together perspectives on Open Science developments, I’d love to meet others in the community, and yup… it is in Hawaii in January.

Off to write my “I’ll definitely submit a research paper on data sharing and reuse” email,


