Research Remix

March 5, 2013

Why Google isn’t good enough for academic search

Filed under: Uncategorized — Heather Piwowar @ 6:49 am

People often ask: why all the fuss about Search for academic papers?  Google does a fine job, we can find everything we need, what’s the problem?

I gave an answer to this in a comment on Mike Taylor’s blog and it got a bit of Twitter pickup, so I am reposting my comment here for this audience.  Summary:  no one can build on the results!

Google isn’t an acceptable answer to Searching across academic papers (toll access, green OA, gold OA, whatever) because it doesn’t support a way for people to digest the search results, add value, and apply the results in new and innovative ways. Google search results can only be used on Google’s website manually, or embedded as-is in other websites.

Neither Google nor Google Scholar offers an API (for love nor money, as far as I can tell; point me to it if I am wrong) that would let us run a Google search and then sort, filter, or enhance the results to add value and use them in research and in scholarly tools.

Totally unacceptable as a search solution for the scholarly literature.  Think of the opportunity cost to research and research tools, and all the things that better research tools facilitate.

It doesn’t have to be this way.  Search results can be openly available for reuse (see the search APIs and API terms of use for PLOS, PMC, etc).
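To make "build on the results" concrete, here is a minimal sketch in Python of what an open search API allows. It rests on assumptions that are not in the post: the base URL and the Solr-style parameter names (q, fl, rows, wt) should be checked against the current PLOS API documentation, and the records below are made-up stand-ins for a parsed JSON response, not real papers.

```python
from urllib.parse import urlencode

# Assumed base URL for the PLOS Search API; verify against the current docs.
PLOS_SEARCH_URL = "https://api.plos.org/search"


def build_query_url(query, fields, rows=10):
    """Build a search URL using Solr-style parameter names (an assumption)."""
    params = {"q": query, "fl": ",".join(fields), "rows": rows, "wt": "json"}
    return PLOS_SEARCH_URL + "?" + urlencode(params)


def filter_and_sort(docs, min_year):
    """Keep records published in or after min_year, newest first."""
    kept = [d for d in docs if int(d["publication_date"][:4]) >= min_year]
    return sorted(kept, key=lambda d: d["publication_date"], reverse=True)


# Hypothetical records standing in for the documents in a JSON response.
sample_docs = [
    {"id": "10.1371/example.0001", "title": "Paper A",
     "publication_date": "2009-05-01T00:00:00Z"},
    {"id": "10.1371/example.0002", "title": "Paper B",
     "publication_date": "2012-03-15T00:00:00Z"},
    {"id": "10.1371/example.0003", "title": "Paper C",
     "publication_date": "2011-11-30T00:00:00Z"},
]

url = build_query_url('title:"data sharing"', ["id", "title", "publication_date"])
recent = filter_and_sort(sample_docs, 2011)
for doc in recent:
    print(doc["publication_date"][:10], doc["title"])
```

The point is the second function: once results arrive as structured records under terms that permit reuse, any tool can re-rank, filter, or enrich them. That is exactly the step that is not possible with Google's results.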

15 Comments

  1. Does this mean ImpactStory’s next project will be an academic search engine? (crossing fingers for a yes) -Scott

    Comment by Scott Chamberlain — March 5, 2013 @ 7:05 am

  2. hahaha. NO. :)

    Comment by Heather Piwowar — March 5, 2013 @ 7:07 am

  3. Dead right!

    Also: Google (and other text-indexing search engines) are just not that good at non-trivial searches. If I already know what paper I want, Google is good at finding it by title. But if all I know is that I want late 19th-century monographs on Morrison-formation sauropods, it doesn’t know where to start. All the metadata needed to handle such searches is generally available for academic papers, but general-purpose search engines don’t know what to do with it.

    Comment by Mike Taylor — March 5, 2013 @ 7:29 am

  4. (Ignore this. As usual, I forgot to check the “Notify me of follow-up comments via email” box, and for some reason the only way to do that is by leaving a comment.)

    Comment by Mike Taylor — March 5, 2013 @ 7:29 am

  5. And another reason, via Tim McCormick on Twitter: Google’s coverage of academic papers is incomplete & opaque.

    Comment by Heather Piwowar — March 5, 2013 @ 7:31 am

    • Right. “Opaque” is a much worse problem than “incomplete”. It isn’t just that not everything is there, it’s that you can’t tell what’s there and what isn’t in any more systematic way than manually probing with searches.

      None of this is to criticise Google, of course: it does an amazing job, and it’s even more amazing that its general-purpose approach works as well as it does on academic papers. But it’s not nearly enough, and it would be awful if people’s Google-acclimatisation led them to accept the level of its functionality as defining what’s possible.

      Comment by Mike Taylor — March 5, 2013 @ 7:37 am

  6. It may not be sufficient, but it is the only place where one can search for a paper and come up with a small, independent, low-visibility journal next to the publishing giants, with no differentiation between the two. It has levelled the playing field for small journals, especially those from developing countries. I am so grateful for the service and what it has done for the democratization of knowledge that I am inclined to forgive the transgression. That said, I do agree with your overall sentiment and, in a perfect world, the inclusiveness and the openness would coexist.

    Comment by Juan Pablo Alperin — March 5, 2013 @ 9:12 am

  7. Could something be built on CommonCrawl?

    Comment by Scott Chamberlain — March 5, 2013 @ 9:23 am

    • Yup, in theory! In practice, right now the Common Crawl learning curve is still very steep. Also, I’m guessing few publishers permit their websites to be indexed by Common Crawl. See forthcoming post…

      Comment by Heather Piwowar — March 5, 2013 @ 7:20 pm

  8. I asked Google Scholar again recently if they’d changed their policies regarding the tracking of datasets/data DOIs, in light of Thomson-Reuters now doing it and the recent letter in Nature, and they replied: “There has been no change at our end regarding indexing datasets”. Another wasted opportunity from them really, and whilst Thomson-Reuters get a lot of flak, they at least have put their money where their mouth is and bothered to make a data citation index.

    Comment by Scott Edmunds — March 6, 2013 @ 12:26 am

  9. Hello everyone – This is a very relevant and well-stated position. I wanted to know if anyone was planning on attending #btPDF2 (http://www.force11.org/beyondthepdf2), as these issues will be front and center. Also, if you’re unable to make it, the event will be live streamed as well.

    Comment by Jonathan Cachat — March 6, 2013 @ 7:04 am

  10. […] interesting little comment on “Why Google isn’t good enough for academic search”. Google scholar tends to be my first port of call these days, but the points made in this […]

    Pingback by Another miscellaneous grab-bag of goodies, links ‘n’ stuff | Computing for Psychologists — March 12, 2013 @ 4:15 am

  11. […] recently talked about why Google is not a good enough solution for searching the academic literature (read the comments on that post for […]

    Pingback by Why may Google textmine but Scientists may not? | Research Remix — March 13, 2013 @ 1:50 pm

  12. Dear Heather, this is exactly the point I have been trying to make for the last 2 years. A recent paper, in which I discuss the issues with Google Scholar and MS Academic Search, has been published in D-Lib: “CORE: Three Access Levels to Underpin Open Access.” http://www.dlib.org/dlib/november12/knoth/11knoth.html .

    Citing a section from the paper: “So, what is it that Google Scholar, Microsoft Academic Search and the mentioned cross-repository search systems are missing? What makes them insufficient for becoming the backbone of OA technical infrastructure? To answer this question, one should consider the services they provide on top of the aggregated content at the three access levels, identified in the Introduction, and think about how these services can contribute to the implementation of the infrastructure for connected repositories. Table 1 below shows the support provided by academic search engines at these access levels. As we can see, these systems provide only very limited support for those wanting to build new tools on top of them, for those who need flexible access to the indexed content and consequently also for those who need to use the content for analytical purposes. In addition, they do not distinguish between Open Access and subscription based content, which makes them unsuitable for realising the above mentioned vision of connected OARs.”

    Comment by Petr Knoth — March 28, 2013 @ 8:24 am

    • Thanks for the reference, Petr! I’ve been surprised there have been so few other people talking about this…. I guess we are all too dispersed to know of each other? Anyway, very glad to hear it and make the connections.

      Comment by Heather Piwowar — March 28, 2013 @ 8:30 am

