I’ve been having an email conversation with someone who is starting up a small new discipline-specific data repository. They hadn’t considered data licenses, so I gave them an overview and the CC0 spiel (why CC0? see here and here).
A few days later they — quite reasonably! — followed up with me, essentially saying “that all sounds complicated. Can’t I just say ‘this data is available for academic, research, and non-profit use’? I am not sure how the commercial access would fly with a lot of the SUBJECTAREA folks.”
Here’s my response. It isn’t the most carefully crafted or researched response in the world, but it is more useful on my blog than sitting in one Sent box and one Inbox. If you see ways to improve it, please add a comment! (I had a paragraph about how CC-BY and CC-BY-NC might not even be appropriate for data, because data usually isn’t copyrightable… but it muddied the waters more than it helped. Clear intent through standardized terms beats free-text sentences!)
Explicit is better: it means people don’t have to guess. Clear is also better. To be as clear as possible, it helps to use language that someone else has already worked out (with lawyers, etc.) rather than what looks like a simple sentence but may actually hide a lot of ambiguity. (What do you mean by educational? Nonprofit? What about commercial educational use? And so on.)
If you want to prevent commercial use (or rather, require a separate conversation for each commercial use), you could use CC-BY-NC. More and more people are becoming familiar with Creative Commons licenses (from open access publications, Flickr photos, Wikipedia, etc.). Creative Commons has worked out the legal language and has a clear, explicit, human-readable description page that you can link to. I’d strongly recommend this over crafting your own sentence.
Re commercial restrictions in general: many academics are hesitant to allow unrestricted commercial use of the data they collect. I think it is a discussion worth having, however, because openness is a key part of the value your data repository will bring. It isn’t truly “Open” data if it can’t be used commercially (as per all Open Access consensus statements). See this discussion (about the literature, but substitute “data mining” for “text mining” and the argument is the same): http://lists.okfn.org/pipermail/open-science/2012-March/001466.html
Especially when the data was collected with taxpayer money, a strong case can be made that it should support economic growth, and commercial use is a key part of that.
The link points to a great recent post by John Wilbanks on the OKFN open science mailing list about the importance of permitting commercial text mining.