Research Remix

November 27, 2019

Video of Open Repositories Keynote, and Thankful.

Filed under: conferences, openaccess — Heather Piwowar @ 12:07 pm

I just posted on the Our Research blog (I’m not blogging much there either, but a bit more than here) and wanted to cross-post the content here.  Partly because it includes a link to the video of my Open Repositories keynote, which I’m proud of :), and partly because I really do feel thankful and want to share the thanks in all the venues I have.


It’s American Thanksgiving this week, and we sure are thankful. We’re thankful for so many people and what they do — those who fight for open data, those who release their software and photos openly, the folks who ask and answer Stack Overflow questions, the amazing people behind the Crossref API…. the list is long and rich.

But today I want to shout out a special big thank you to OA advocates and the people behind repositories. Without your early and continued work, it wouldn’t be true that half of all views of scholarly articles go to articles with an OA copy somewhere; even better, that number is on track to reach 70% of articles five years from now. That changes the game: for researchers and the public looking for papers, and for the whole scholarly communication system as we rethink how we pay for publishing in the years ahead, in ways that make it more efficient and equitable.

I gave the closing keynote at Open Repositories 2019 this year, and my talk highlighted how the success of Unpaywall is really the success of all of you — and how we are set for institutional repositories to be even more impactful in the years ahead. It’s online here if you want to see it. We mean it.

Thank you.

August 3, 2010

Why I want everything OA, right now

Filed under: openaccess, tools — Tags: , , — Heather Piwowar @ 2:29 pm

I’ve started using Mendeley. I like it a lot, so far. Papers, but with a networking aspect. CiteULike, but with a quick PDF full-text search aspect. Free. Cross platform. Good stuff.


There is a But, though. The But isn’t Mendeley’s fault. It is a result of the evolution of our methods of scientific communication. I’m usually a fan of iteration, of evolution. Not this time. I want instant, sweeping change.

I want to share all the PDFs in my Mendeley library with everyone. Right now.

I can’t, I know. Maybe some I can: some are OA articles or author preprints with redistribution rights. But most I can’t, because publisher licenses say I can’t. Because that is where we are right now. I get it, but it is SUCH A PAIN. Because my bibliography would be so much more useful to people if they could use it like I use it, by searching full text. And clustering. And browsing. And doing other things that you can only do with full text.

It isn’t well tagged, it isn’t well meta-dataed, because I don’t use it that way.

For others to go find PDFs for all the articles will be inconvenient, inefficient, and probably unsuccessful, given shrinking subscription budgets and the uber-interdisciplinary nature of my field.

I normally like to talk about possible solutions when I talk about problems. In this case, though, I’m just taking a moment to imagine what the world would be like if we could just freely share research outputs with each other, completely, without a second thought.

Wow, eh?

Anyway, in this real world, if you are interested in data sharing, or just want to see what a Mendeley bibliography looks like, here’s my Data Sharing and Withholding Collection:

I’ll be creating other public collections too, for data reuse, data citations, etc. Here’s my Mendeley profile to watch if those topics interest you. Send your collections my way if you have related ones?

And if anyone really wants to see the PDFs, I can invite you into a Shared Collection. Better than in the old days of photocopying, right?

April 7, 2008

Non-OA Full-text for text mining

Filed under: openaccess — Tags: , — Heather Piwowar @ 9:28 am

Interesting discussion on Peter Murray-Rust’s blog about whether PubMed Central articles can be crawled and used for text mining. The answer is no, not now, not unless they are open access (as opposed to traditionally closed-access articles that are merely deposited in PMC).  Really unfortunate.  Incremental progress, we’ll get there.

Anticipating my thesis work, I’ve been wondering about similar text mining questions. I think my needs are a bit different from those of PMR: I’m interested in papers that meet a targeted search, rather than all articles or all articles in a relevant journal (what I gather he’d be interested in?). I’m willing to limit myself to the articles that I have access to through my University’s subscriptions. I don’t need figures. I think once I have the papers I’m allowed to text mine them as fair use, since I have them under permission. So the question is: what can I automatically download?

I learned I can’t spider PMC, but what about normal PubMed? Try as I might, I couldn’t find verbiage on the PubMed website allowing or disallowing spidering through to full-text links on publisher websites (the links that are populated and visible when I’m logged in through the University’s connection). Is this allowed? Still seems like it might not be. And then you end up at the publisher sites anyway, with all of their differing rules. Unfortunately, the publishers’ rules are often hard to find, confusing, and vague (as often noted by PMR and others). Aaaaah.

So last month I asked our librarians….

As you know, PMC has OA and non-OA full-text. They make their OA text available via FTP etc, and they stipulate that those mechanisms are the only way that people are allowed to access the full text “because of copyright restrictions” []. I’d also like to access non-OA text for which Pitt has subscriptions, but it sounds like I can’t do this by “crawling” PMC based on their rules [explicitly stated in the link above]. I guess I’m wondering if I can do it by “crawling” the normal, full PubMed. Basically write a script to find the “HSLS” links on the article citation pages, follow them (usually into the publisher’s websites), and automatically save the html or pdf articles that are returned from a PubMed query.

There is no difference in the end result from me manually clicking through and saving the papers… but there is sure a difference in the manual time requirement! I wouldn’t have thought this sort of automated downloading would be a problem… but the Restrictions on Systematic Downloading of articles in the PMC copyright notice referenced above makes me want to double-check. I can’t find any reference to “crawling” or “systematic downloads” for PubMed itself.

I do understand there are user requirements when using the Entrez programming utilities (run automated queries during off hours, 3 seconds between queries, etc) and I would be sure to honor those both with the elements of my scripts which use the E-tools and those which are crawling the web pages directly.
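For what it’s worth, those E-utilities courtesies are straightforward to honor in a script. Here is a minimal sketch in Python; the helper names (`esearch_url`, `RateLimiter`) are my own invention, and only the E-utilities base URL and the three-seconds-between-queries spacing come from NCBI’s stated requirements. It builds the query URL and paces requests; no actual request is made here:

```python
import time
import urllib.parse

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
MIN_INTERVAL = 3.0  # NCBI asks for at least 3 seconds between queries

def esearch_url(term, db="pubmed", retmax=100):
    """Build an ESearch URL for a PubMed query (no request is made here)."""
    params = urllib.parse.urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{params}"

class RateLimiter:
    """Enforce a minimum interval between successive requests."""
    def __init__(self, min_interval=MIN_INTERVAL):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        # Sleep just long enough that calls are at least min_interval apart.
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

A crawl loop would then call `limiter.wait()` before each fetch, whether it is hitting the E-utilities or following links out to publisher pages.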

Does that make sense? Are you aware of any restrictions on crawling PubMed to automatically access and save content for which I do indeed have access through Pitt? I guess since I’m going into the publishers’ websites, they might also have restrictions? Is there another way to consolidate a large set of electronic full-text articles (ideally a few thousand)?

Thanks very much for any pointers you may have.

The librarian responded that automatically following PubMed links should be fine, and that there shouldn’t be problems from publisher sites because we have subscriptions and my text mining falls under fair use. I’ll add that I think it helps that I’m not aiming to download full editions, because I do know that some publisher websites disallow that.

Maybe I shouldn’t be bringing it up again here, since it feels like I’ve been given an institutional “All Clear.” But no sense burying my head in the sand in case there really are issues: I want to know. Web downloading policies and full-text reuse policies are so complicated. I’ve spent time looking into them, but it sure seems like unless it is your full-time job it is impossible to understand and keep on top of how they work. I don’t think our librarians deal with these issues every day. Who else would I go to for clarification?

Does anyone have differing interpretations, warnings, reassurances, alternatives, and general paths through this crazy mess? How do other people do this???

June 5, 2007

Sharing reviewer’s comments

Filed under: openaccess, openreviewing — Heather Piwowar @ 10:59 am

Nice post on Open Reading Frame, spurred by a thought-provoking letter in Nature about making reviewer’s comments publicly accessible after an embargo period. Excerpted:

Via Peer-to-Peer, Ariberto Fassati in this week’s Nature correspondence (sorry, toll access only):

Reviewers [of scientific publications] often make significant contributions in shaping discoveries. They suggest new experiments, propose novel interpretations and reject some papers outright. […] It is well worth keeping a record of such work, for no history of science will be complete and accurate without it.

I therefore propose that journals’ records should be made publicly available after an adequate lapse of time, including the names of reviewers and the confidential comments exchanged between editors and reviewers. The Nobel Foundation makes all its records available after 50 years, as do many governmental and other institutions. This delay may be reduced for scientific journals to, perhaps, 15 or 20 years.

Now that’s a damn good idea: it’s long past time that reviewing got its due as an essential part of a scientist’s job, and opening the records should help to generate such recognition (to say nothing of the invaluable contribution to historiography of science). My only quibble: why 15 years? If six months is long enough for an embargo on a closed-access paper, why is it not also long enough to keep the reviews secret? I presume the idea is to prevent retaliation for harsh reviews, but if all the information is public it would take a truly dedicated holder of a truly heinous grudge to follow up (in such a way as not to get caught doing it!) after six or twelve months. More to the point, we can dramatically reduce the risk of such retaliation by changing the community attitude towards reviewing. If peer review becomes a fully acknowledged part of the job, excellence in which is respected and rewarded — and if everyone knows their reviews will be made public! — then low quality (gratuitously mean, ill-informed, lazy, self-serving, etc) reviews should be a thing of the past.

Great idea.

Worth noting: PLoS ONE now offers reviewers the option of making their reviews public shortly after publication (anonymously or not, see email here). Naive, brave, or both… I agreed [posted as a discussion on this article]. If anyone has constructive comments for improving my reviewing, I’m all ears.

Be the change you want to see!

Edited to add:

PLoS ONE plug for all readers: if you have any thoughts about the above article and/or review, please take a minute to add to the discussion at the PLoS ONE page. It is there exactly for this purpose!
