We discuss many hurdles that investigators face when sharing their research datasets: cost, credit, and fear of misinterpretation and scooping, to name a few. I think there is a stealthy hurdle a bit further out. I’ve run smack-dab into it, and ouch it hurts my shins.
See, I have a bunch of projects that are almost done. Papers accepted, proofs being proofed, now the publishers want the final data archiving URLs. Yay, right? Right! Except that now I actually have to archive the data. I’m totally on board with this, in theory. I think it is a good idea: in general, for me personally, for these projects specifically. I’m not worried about cost, credit, or fear of misinterpretation or scooping.
Then why am I hesitating, why have I put it off till now? Why, if I weren’t so committed to the cause, might I not do it at all?
My spreadsheets and my scripts, they just aren’t as elegant in real life as they are in my head. My scripts need more commenting, my README needs more detail, my column names need more consistency. I did try to follow best practices when I set them up, but that was many months ago and now I know better and I want to do better.
But after spending so much time getting the article text just right, caring about every silly detail in the bibliography, and doing somersaults to get the figures in the right format… the idea of upgrading one more set of research artifacts into a “published, ready to be archived forever, I’m proud of this” snapshot state feels daunting. I want more time, I want more inspiration, I’m not ready. Many researchers are perfectionists by nature, so I doubt I’m alone.
This issue is different from a lack of time, a lack of resources, or a fear of errors. It is difficult to quantify its impact on the prevalence of data withholding. I have no doubt that it contributes to the relative willingness to share details with other investigators in a limited way, upon request, rather than in public for all to see. Sharing on request feels less final.
So how to lower the height of this hurdle? Examples and templates and guidelines and mentoring. Mandates and standards will help. Releasing early, often, and widely. Recognizing that creative output falls short of what we aspire to all the time, and especially when people are new to something (check out the message in this video by Ira Glass). Repeating that the perfect is the enemy of the good. All of that. All of the ways academics learn to deal with harmful perfectionism in other aspects of what they consider part of their job.
Nonetheless, as people who think about the challenges to data archiving, we ought to remember that perfectionism is unlikely to be volunteered as a reason for data withholding and yet probably has a substantial impact, particularly before data archiving becomes standard practice.
I’m off to submit my datasets now.