Text-mining access sounds like a pretty niche need. It isn’t. One of the use cases is near and dear to readers of this blog.
You know all of our excitement about research data? Making it available, making it citable, rewarding the data-producing investigators? It only works if we can indeed identify the research that is built on a dataset and then include the reuse stats in CVs and reports and webpages. What tools can we use for this today?
Google Scholar: nope. They’ve said that if they support tracking datasets today it is by accident, and they plan to remove such support in the future. Furthermore, Google Scholar offers no API access to its data (and has said it will not offer such access for years, if ever), so any numbers you calculate with it today can only be viewed on the Google website or copied and pasted by hand into reports.
total-impact and altmetric.com: a few altmetrics tools are actively building support for tracking dataset reuse.
You know what these altmetrics tools, or any new tool that we hope will solve this problem, need to be successful? Programmatic access to the literature. They need to be able to search across all papers for dataset identifiers and find them in full text and reference lists. Ideally, they also need to be able to do more advanced text analysis to distinguish whether an ID is mentioned because the paper is talking about having *gotten* the data from somewhere, as opposed to having *put* it somewhere.
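To make that concrete, here is a rough sketch of what such a tool might do once it has a paper's full text in hand. Everything in it is an assumption for illustration: the two identifier patterns, the cue phrases, and the function name `find_dataset_mentions` are made up for this example, and a real tool would need far more patterns and much smarter language analysis than keyword matching.

```python
import re

# Illustrative identifier patterns only (assumed for this sketch);
# real repositories use many more accession formats than these two.
DATASET_ID_PATTERNS = {
    "doi": re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+"),
    "geo": re.compile(r"\bGSE\d+\b"),
}

# Crude cue phrases, also assumed: a stand-in for real text analysis.
REUSE_CUES = ("obtained from", "downloaded from", "retrieved from")
DEPOSIT_CUES = ("deposited in", "submitted to", "available at")


def find_dataset_mentions(text, window=80):
    """Return (kind, identifier, label) for each dataset ID found in text.

    The label is "reuse", "deposit", or "unknown", guessed from cue
    phrases within `window` characters on either side of the identifier.
    """
    mentions = []
    for kind, pattern in DATASET_ID_PATTERNS.items():
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            context = text[start:match.end() + window].lower()
            if any(cue in context for cue in REUSE_CUES):
                label = "reuse"
            elif any(cue in context for cue in DEPOSIT_CUES):
                label = "deposit"
            else:
                label = "unknown"
            # Drop sentence punctuation that the DOI pattern may swallow.
            identifier = match.group().rstrip(".,;)")
            mentions.append((kind, identifier, label))
    return mentions
```

Calling `find_dataset_mentions` on a sentence like "Expression data were downloaded from GEO (accession GSE12345)." would flag GSE12345 as likely reuse. The catch is the input: none of this is possible unless the tool can get the full text of every paper in the first place.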
Anyway. This? This is text mining. This is just one of the reasons we all need text-mining access. We need it to everything, and we need it now.
Get the ball rolling at your institution too.
This article has been translated into Serbo-Croatian by Jovana Milutinovich from Webhostinggeeks.com.