Do you want to know the impact of journal policies that mandate data sharing? Me too. I’ve been working on a study design to correlate policy adoption by journals with changes in scientists’ attitudes, experiences, and practices around publicly archiving research datasets.
I’d love some community input because writing a good survey and design is hard. Summary and a link to the draft questionnaire are below. I’m still finishing up the document to be submitted to my research ethics board ASAP. Thoughts or suggestions? I want it to be a useful study, one that focuses on just a few questions and studies them well.
Survey preview link (note: this link allows perusal only… survey responses are not being collected at this time)
- How do authors’ attitudes, experiences, and practices around public data archiving change when the journals they publish in adopt mandatory data archiving policies?
- Are changes specific to the journals that implement the policies, or do they extend to other journals in the same subfield?
- Do authors believe that adoption of a mandatory data archiving policy would dissuade them from submitting to a journal? Does that opinion change after they publish in a journal with such a policy?
Participants: Participants will be recruited from corresponding authors who publish in specific journals between November 2010 and November 2013.
Method: An invitation to the survey will be issued once a month to all corresponding authors who published in specific biology journals the prior month. Participants will be asked to answer a brief online questionnaire designed to elicit their practices, experiences, and opinions around public archiving of research datasets. The answers will be compared over time, to understand whether the answers of authors in journals that adopt the “Joint Data Archiving Policy” (JDAP) begin to differ from those of authors in comparable within-community and outside-community journals.
Update Oct 15 2010: I’ve updated the proposal details with several changes, including a longer data collection timeline and an estimate of study size.
Background and related work
Note: Relevant references are collected in a public Mendeley group.
Summary: there have been previous studies of data sharing patterns, but none that focused on attitudes before and after policy implementation. Estimates of these impacts would help inform future policy decisions. Research into the impact of journal policies often lacks rigor. The upcoming adoption of JDAP by several journals in evolution is a useful opportunity.
Participants will be recruited from corresponding authors who publish in specific journals between November 2010 and November 2013. Specifically, email addresses will be collected for papers published the previous month (or 3 previous months, for the initial month of recruitment) in each of the target journals. No other identifying information (author names, article title, etc) will be retained.
Recruitment email lists will be compared across months, and emails only sent to those who have not previously received a recruitment email to reduce survey fatigue and ill-will.
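The month-over-month deduplication described above could be sketched as follows (a minimal illustration; the function and variable names are my own, not from the study materials):

```python
def new_recipients(this_month, previously_emailed):
    """Return the addresses that have not yet received a recruitment
    email, plus the updated set of everyone emailed so far."""
    fresh = set(this_month) - set(previously_emailed)
    return fresh, set(previously_emailed) | fresh

# Hypothetical example: one author from an earlier month reappears
# as a corresponding author and is skipped, reducing survey fatigue.
emailed_so_far = {"a@uni.edu", "b@uni.edu"}
fresh, emailed_so_far = new_recipients(
    ["b@uni.edu", "c@uni.edu", "d@uni.edu"], emailed_so_far
)
```

Here `fresh` contains only the two new addresses, and `emailed_so_far` grows to cover all four, ready for the next month's comparison.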
The email invitation will include a URL where subjects can learn more about the study and then read a fact sheet, consent, and participate if they choose. This email will also contain an “opt out” option, which will remove their address from the reminder list. Those who have not “opted out” will be contacted by email again, a week later, politely asking them to take the survey if they haven’t already.
Individual participants can expect to spend 7-10 minutes maximum answering the online questionnaire.
Before the initial round of invitations is sent, I will inform journal editors of this ongoing survey and provide contact information in case they have questions, suggestions, or concerns.
The survey will be sent to authors in JDAP-adopting journals and control journals. There are 7 JDAP-adopting journals, and I will select about twice as many journals for each of the in-community control and outside-community control arms. The average number of papers published per month in the JDAP journals is 168. Assuming a 20% response rate, I estimate about 33 treatment responses, 66 in-community control responses, and 66 outside-community control responses per month, for a total of 165 responses per month. Over the three years of data collection, this will yield almost 6000 responses. The actual number will be lower, because I will only invite each corresponding author once. On the other hand, the number of articles published by these journals may grow over time, leading to a larger-than-anticipated sample.
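The back-of-envelope arithmetic behind those figures can be laid out explicitly (the truncation to whole responses is my assumption about how the estimate was rounded):

```python
# Sample-size estimate from the figures above.
jdap_papers_per_month = 168   # mean monthly papers across the 7 JDAP journals
response_rate = 0.20

treatment = int(jdap_papers_per_month * response_rate)  # ~33 responses/month
# Each control arm draws on roughly twice as many journals,
# so each is expected to yield roughly twice the responses.
in_community = 2 * treatment       # 66
outside_community = 2 * treatment  # 66
per_month = treatment + in_community + outside_community  # 165

total = per_month * 36  # three years of monthly invitations
```

This gives 5940 responses over 36 months, i.e. the "almost 6000" quoted above, before accounting for the one-invitation-per-author rule.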
Finally, I note that to the best of my knowledge, a similar study has not been conducted or published. If this sample size proves inadequate, the results can be viewed as pilot data to help inform future studies.
Data confidentiality and release
The questionnaire was designed to avoid collection of unique identifiers and reduce the likelihood of reidentification. Neither recruitment nor prize-submission email addresses will be associated with the survey responses at any stage.
I plan on increasing the usefulness of this research by making the anonymous questionnaire responses openly available in a data repository for reuse, after aggregating or obfuscating any responses that could otherwise facilitate identification.
The responses of authors who published in JDAP-adopting journals will be compared, over time, to a) the responses of authors who publish in comparable evolution and ecology journals, and b) the responses of authors who publish in comparable biology journals outside evolution and ecology.
The primary endpoint is the mean of a 7-point Likert-scale response, asking authors how strongly they agree with this statement:
It is the community norm in my field to publicly share datasets upon study publication by archiving all datasets online, for use by anyone for any purpose.
About twelve other survey questions will allow secondary and exploratory analyses. A preview of the survey can be seen here.
The responses will be analyzed using a difference-in-differences approach, wherein the changes in survey responses before and after policy implementation for authors in JDAP-adopting journals are compared to the corresponding changes for authors who publish in control journals. If no journals prove to be appropriate controls in isolation, I will form a synthetic control, designed to match the responses of the intervention group BEFORE policy implementation. A seminal example of this approach is Abadie et al.’s use of a pseudo-California, formed from a weighted combination of other states, to study the impact of California’s cigarette tax relative to the pattern California would likely have exhibited in the absence of the policy:
Figure from Abadie A, Diamond A, Hainmueller J. Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program. Journal of the American Statistical Association. June 1, 2010, 105(490): 493-505. doi:10.1198/jasa.2009.ap08746. full text.
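The core difference-in-differences comparison on the primary endpoint can be sketched as follows. This is a minimal illustration with invented Likert responses, not the planned analysis itself, which may use regression and the synthetic-control weighting described above:

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences on group means of the Likert endpoint:
    (change in treatment group) minus (change in control group)."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))

# Invented 7-point Likert responses, for illustration only:
jdap_pre, jdap_post = [3, 4, 3, 4], [5, 6, 5, 6]   # JDAP-adopting journals
ctrl_pre, ctrl_post = [3, 4, 3, 4], [4, 4, 3, 5]   # control journals

effect = did_estimate(jdap_pre, jdap_post, ctrl_pre, ctrl_post)
```

In this toy example the treatment group's mean rises by 2.0 points and the control group's by 0.5, so the estimated policy effect is 1.5 points; the subtraction of the control trend is what distinguishes this from a simple before/after comparison.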
The 7-point Likert question yields a full distribution of responses, not just a mean. I plan to examine trends in these distributions in exploratory analysis, since policies may well induce interesting divergences of opinion. Exploratory analysis will also be done per journal, since the magnitude of policy change, and probably the degree of policy enforcement, will vary.
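The per-journal look at full response distributions might start with simple tallies like the following (journal names and responses here are hypothetical placeholders):

```python
from collections import Counter

# Hypothetical (journal, likert_response) pairs from a survey export.
responses = [
    ("Journal A", 5), ("Journal A", 6), ("Journal A", 2),
    ("Journal B", 7), ("Journal B", 6),
]

# Tally the 1-7 Likert distribution separately for each journal, so that
# divergences of opinion stay visible rather than being averaged away.
by_journal = {}
for journal, score in responses:
    by_journal.setdefault(journal, Counter())[score] += 1
```

A bimodal tally for one journal (here, "Journal A" with responses clustered at both ends) would be exactly the kind of divergence a mean would hide.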
TBD: List of journals to use as controls.
The limitations of this study design include:
- There is limited time before JDAP is implemented (~January 2011), so I will only have two pre-implementation control time points. This limitation is mitigated by the lag between when a policy becomes applicable to submitting authors and when their papers are published.
- There are many variables also relevant to views of community norms and decisions about sharing that I am not collecting, in order to keep the survey short and maintain anonymity. These include university characteristics, industry affiliations, subfield, publication rate, and age.
- This study makes no attempt to understand the rigor with which data sharing is encouraged in journals beyond their written policies: many journals without policies nonetheless expect data sharing and many with written policies fail to enforce them.
- The opinions of corresponding authors with respect to data sharing may not be typical of all authors: a future study could be done to quantify this.
- By only inviting authors to participate the first time they are listed as corresponding authors within our study scope, the polled population will shift gradually over time away from highly prolific authors.
- Respondents may be reluctant to self-report when they do not share their datasets, particularly as their sense of sharing as a community norm increases, since survey respondents are often reluctant to report engaging in socially undesirable behavior.
- It will be difficult or impossible to derive causality direction for an association between reported data sharing behaviour and reported perception of data sharing as a community norm.
- Finally, this study only facilitates a relatively short-term look at policy effect: it will likely take many more years for the effect of these policies to be fully felt.