Research Remix

January 31, 2012

31 Flavors of Research Impact through #altmetrics

Filed under: Uncategorized — Heather Piwowar @ 8:54 am

The impact of a research paper has a flavour.  It might be champagne: a titillating discussion piece of the week.  Or maybe it is a dark chocolate mainstay of the field.  Strawberry: a great methods contribution.  Licorice: controversial.  Bubblegum: a hit in the classrooms.  Low-fat vanilla: not very creamy, but it fills a need.

CC-BY-NC by maniacyak on flickr

There probably aren’t 31 clear flavours of research impact.  How many are there?  Maybe 5 or 7 or 12?  We don’t know.  But it would be a safe bet that, just like ice cream flavours, our society needs them all.  Which one we want depends on whether we’re filling a cone or topping a piece of apple pie.  The goal isn’t to compare flavours: one flavour isn’t objectively better than another.  Each has to be appreciated on its own merits, for the needs it meets.

To do this we have to be able to tell the flavours apart.  Imagine that for ice cream all you had to go by was a sweetness metric.  Not happening, right?  So too, citations alone can’t fully capture what kind of difference a research paper has made in the world.  Important, but not enough.

We need more dimensions to distinguish the flavour clusters from each other.  This is where #altmetrics comes in.  By analyzing patterns in what people are reading, bookmarking, sharing, discussing, AND citing online we can figure out what kind — what flavour — of impact a research output is making.

Unfortunately we can’t accurately derive the meaning of these activities just by thinking about them.  What kind of impact *is* it if a paper gets tweeted about a lot?  Is it a titillating champagne giggle because the title was amusing, or a strawberry signal that readers were thrilled because the paper just solved their methods struggle?  We need to do research to figure this out.

Flavours matter for research outputs other than papers, too.  Some publicly available research datasets are used all the time in education but rarely in research; others are used once or twice by really impactful projects; others are used across a field for calibration; and so on.  Understanding and recognizing these usage scenarios will be key to recognizing and rewarding the contributions of dataset creators.

Below is a concrete example of impact flavour, based on analysis that Jason Priem (@jasonpriem), Brad Hemminger, and I are in the midst of writing up for the soon-to-be-launched altmetrics Collection at PLoS ONE.  (Edited April 3 to add: manuscript is now up on arXiv)  We have clustered all PLoS ONE papers published before 2010 using five metrics that are fairly distinct from one another: HTML article page views, number of Mendeley reader bookmarks, Faculty of 1000 score, Web of Science citation counts as of 2011, and a combo count of twitter, Facebook, delicious, and blog discussion.

We normalized the metrics to account for differences due to publication date and service popularity, transformed them, and standardized to a common scale.  We tried lots of cluster possibilities; it seems that five clusters fit this particular sample the best.
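For readers who like to see methods spelled out, here is a minimal sketch (in Python, and emphatically not our actual pipeline) of that normalize, transform, standardize, and cluster workflow.  The input file, the column names, and the median-by-publication-year normalization scheme are all illustrative assumptions, not what we did in the paper.

```python
# Illustrative sketch only: normalize -> transform -> standardize -> cluster.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

metrics = ["html_views", "mendeley_readers", "f1000_score", "wos_citations", "shares"]
papers = pd.read_csv("plos_one_altmetrics.csv")  # hypothetical input file

# Normalize for age and service popularity: divide each metric by the median
# value among papers published in the same year (one plausible scheme).
for m in metrics:
    year_median = papers.groupby("pub_year")[m].transform("median")
    papers[m] = papers[m] / year_median.replace(0, np.nan)

# Log-transform the skewed counts, then standardize to a common z-score scale.
X = np.log1p(papers[metrics].fillna(0))
X = (X - X.mean()) / X.std()

# Try a range of cluster counts and keep the one with the best silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k} (silhouette = {best_score:.2f})")
```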

Here is a taste of the clusters we found.  Bright blue in the figure below means that the metric has high values in that cluster, dark grey means the metric doesn’t have much activity.  For example, papers in “flavour E” in the first column have fairly low scores on all five metrics, whereas papers in “flavour C” on the far right have a lot of HTML page views and Sharing (blog posts, tweeting, facebook clicking, etc) activity.
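If you want to make a figure like this yourself, a heatmap of the cluster centres is one way to do it.  The sketch below continues from the hypothetical code above (reusing X, metrics, and best_k); the flavour labels on the axis are purely illustrative.

```python
# Hypothetical sketch: rows are metrics, columns are flavour clusters,
# brighter cells mean higher average standardized values in that cluster.
import matplotlib.pyplot as plt

km = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
centers = km.cluster_centers_  # shape: (n_clusters, n_metrics)

fig, ax = plt.subplots()
im = ax.imshow(centers.T, cmap="Blues", aspect="auto")
ax.set_xticks(range(best_k))
ax.set_xticklabels([f"flavour {c}" for c in "EBADC"[:best_k]])  # illustrative labels
ax.set_yticks(range(len(metrics)))
ax.set_yticklabels(metrics)
fig.colorbar(im, label="standardized metric level")
plt.tight_layout()
plt.show()
```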

Since this is a blog post I’ll take the liberty of indulging in a bit of unsupported extrapolation and speculation and give these flavours some names.  I also include the titles of three exemplar papers from each cluster:

flavour E: Not much attention using these metrics  (53% of the papers in this sample)

Remember these papers may be impactful in ways we aren’t measuring yet!

[1] “Synaptic Vesicle Docking: Sphingosine Regulates Syntaxin1 Interaction with Munc18”
[2] “Sperm from Hyh Mice Carrying a Point Mutation in αSNAP Have a Defect in Acrosome Reaction”
[3] “Role of CCL3L1-CCR5 Genotypes in the Epidemic Spread of HIV-1 and Evaluation of Vaccine Efficacy”

flavour B: Read, bookmarked, and shared (21%)

[1] “Vision and Foraging in Cormorants: More like Herons than Hawks?”
[2] “Tissue Compartment Analysis for Biomarker Discovery by Gene Expression Profiling”
[3] “Protein Solubility and Folding Enhancement by Interaction with RNA”

flavour A: Read and cited (20%)

[1] “Roles of ES Cell-Derived Gliogenic Neural Stem/Progenitor Cells in Functional Recovery after Spinal Cord Injury”
[2] “Bone Marrow Stem Cells Expressing Keratinocyte Growth Factor via an Inducible Lentivirus Protects against Bleomycin-Induced Pulmonary Fibrosis”
[3] “Immune Regulatory Neural Stem/Precursor Cells Protect from Central Nervous System Autoimmunity by Restraining Dendritic Cell Function”

flavour D: Expert pick (3%)

[1] “Hemispheric Specialization in Dogs for Processing Different Acoustic Stimuli”
[2] “The Oncogenic EWS-FLI1 Protein Binds In Vivo GGAA Microsatellite Sequences with Potential Transcriptional Activation Function”
[3] “Retinal Pathology of Pediatric Cerebral Malaria in Malawi”

flavour C: Popular hit (3%)

[1] “Genetic Evidence of Geographical Groups among Neanderthals”
[2] “Perceptual Other-Race Training Reduces Implicit Racial Bias”
[3] “Symmetry Is Related to Sexual Dimorphism in Faces: Data Across Culture and Species”

What do you think: do these look like meaningful clusters to you?  They are certainly interesting, uncovering impact from papers we keep in our personal libraries but never cite, and showing that the papers we share aren’t just “popular hits”, for example.

It is worth noting that Flavours E, D, and C are quite stable in this dataset, whereas the centers of clusters B and A shift a bit depending on the clustering algorithm and other choices.  The cluster analysis needs more altmetric components to tease out the more subtle patterns.  We also haven’t touched the crucial step of correlating the clusters with observed behaviour to validate whether they do in fact have real-life meaning.
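One simple way to probe that kind of stability (again, an illustrative sketch rather than the analysis from the paper, continuing from the hypothetical code above) is to compare the k-means partition against an alternative algorithm and against a bootstrap resample using the adjusted Rand index; values near 1 mean the cluster assignments barely change.

```python
# Illustrative stability check for a 5-cluster solution.
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

kmeans_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
ward_labels = AgglomerativeClustering(n_clusters=5).fit_predict(X)
print("k-means vs. Ward agreement:", adjusted_rand_score(kmeans_labels, ward_labels))

# Bootstrap stability: recluster a resample and score agreement on the
# resampled papers against their original assignments.
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=len(X), replace=True)
boot_labels = KMeans(n_clusters=5, n_init=10, random_state=1).fit_predict(X.iloc[idx])
print("bootstrap agreement:", adjusted_rand_score(kmeans_labels[idx], boot_labels))
```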

The goal of our analysis here is not to report the quintessential impact clusters — a lot more research is needed!  Instead, we hope it serves as an illustration of what it might look like to begin describing research impact with a full flavour palette… and one of the reasons we are so excited about altmetrics.

Want to learn more about altmetrics?  Yesterday’s article in The Chronicle of Higher Education is a great place to start! :)

Edited to add links to Jason Priem (@jasonpriem) and Brad Hemminger, so you can easily follow them if you aren’t already!

Also updated the graphic to a nicer version, April 3, 2012.

17 Comments

  1. Heather et al. –

    This looks like a very cool and informative study, and one that I would very much like to see the results of published soon since it could play an important role in the upcoming UK REF exercise.

    You may be aware of a related study by Allen et al “Looking for Landmarks: The Role of Expert Review and Bibliometric Analysis in Evaluating Scientific Publication Outputs” (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0005910), which seems to show a similar trend that “expert review” does not correlate strongly with other quantitative article-level metrics. It would be interesting to merge your data with that in the Allen paper, but alas they do not make their data available as a supplemental file for re-analysis. The Allen article and an analysis by Wardle “Do ‘Faculty of 1000’ (F1000) ratings of ecological publications serve as reasonable predictors of their future impact?” (http://library.queensu.ca/ojs/index.php/IEE/article/view/2379/2478) also show that “expert review” misses many highly cited papers.

    One suggestion: what about citeulike posts? It would be interesting to see if Mendeley & citeulike give the same flavour or not.

    Can’t wait to see the final article and find out what “flavour” it will be.

    Best regards,
    Casey

    Comment by caseybergman — January 31, 2012 @ 9:39 am

    • Just to correct Casey’s summarization of the Allen et al. paper cited above. It actually says the following “At an aggregate level, after 3 years, there was a strong positive association between expert assessment and impact as measured by number of citations and F1000 rating.” And their Conclusion says: “When attempting to assess the quality and importance of research papers, we found that sole reliance on bibliometric indicators would have led us to miss papers containing important results as judged by expert review.”

      Comment by KW — January 31, 2012 @ 10:00 am

      • The actual results are as follows:

        “There was a positive correlation (rs = 0.45, significant at 0.01) between our reviewers’ assessments of the ‘importance’ of the research papers (as reviewed in 2005) and the papers’ use in the wider community as indicated by citation totals three years later (Figure 3). By the beginning of October 2008, 48 (7%) of the 687 original research papers assessed by our reviewers also featured on the two F1000 databases. Our expert review scores were positively correlated (rs = 0.445, significant at 0.01) with the assessments of these same papers on F1000 (Figure 4)”

        So in the results the authors do not claim a “strong” correlation for either result, since a Spearman rank correlation of 0.45 with a P-value of 0.01 is indeed not “strong”. Also, I urge readers to actually look at the data in Figures 3 & 4 to see whether these correlations are meaningful over the full range of impact, or driven by a few truly high-impact outliers. My view is that the quote from the abstract that you excerpt somewhat over-emphasizes the true trend, and is not justified (or even made by the authors) when presenting their data.

        Comment by caseybergman — January 31, 2012 @ 10:58 am

    • Regarding the analysis of F1000 by Wardle, cited by Casey above, it’s important to be aware that the F1000 Ecology Faculty was launched in March 2005. Wardle analyzed only a few papers in a subset of sections from the Ecology Faculty from 2005 (i.e. a year that was not fully covered by this new faculty); thus, his analysis comprised only a subset of papers from a newly established faculty from an incomplete year of coverage.

      Comment by KW — January 31, 2012 @ 10:36 am

      • Good points. I agree the Wardle analysis is not as strong as the Allen et al. paper, but given that it was one of the few papers I’m aware of on the role of expert review as an “altmetric” I thought it was worth pointing out. KW, are you aware of others?

        Comment by caseybergman — January 31, 2012 @ 11:00 am

    • Thanks for these comments, Casey! We are working to submit to PLoS ONE in the next few days…. will also post results etc here as I finish writing them up….

      We are aware of these papers but I appreciate the heads up because it is so easy to miss things in this area.

      We do indeed have citeUlike data in our analysis, and it is correlated with Mendeley as you suggest. For the cluster analysis, therefore, we included just one of them to reduce collinearity worries. I’ll post more info soon.

      Comment by Heather Piwowar — February 1, 2012 @ 6:51 am

      • Interesting, glad to see the Mendeley & citeUlike data are saying similar things. Were you able to dig out any other papers evaluating the role of expert opinion? Best of luck with the submission!

        Comment by caseybergman — February 1, 2012 @ 7:59 am

  2. Similar to caseybergman, I wonder how any potential skew in the Mendeley user base affects which papers fall in “flavour B”? I know Mendeley gained early traction with the genetics / computational biology community, and 2 of the 3 example articles seem rather ‘omic’ to me.

    More broadly, I expect metrics based on social networks with sporadic coverage to ultimately fall flat due to biases inherent in the data source. Of course, one could attempt to correct for disparities between fields, but if the data simply isn’t there, then papers that should be falling into flavour B won’t show up.

    I guess one way to remedy this would be to collect a wider sample from all the ‘reader’ tools being used. It would be great to see Mendeley / CiteULike / Endnote etc. agree on a standard set of data to collect, anonymize and release publicly in order to allow better development of metrics like these. It would also ensure that these tools stay relevant in a world where more real-time metrics are being used.

    Comment by Greg — January 31, 2012 @ 12:25 pm

    • Greg, great comments. We do have some exemplars that are all within a given community (or at least paper topic keyword). Inspired by your comment I’ll try to highlight these in the paper + post here.

      We do see differences in service usage across the different PLoS journals, for sure, so your point is well taken. It means that coming up with appropriate normalization is always going to be tricky. But that’s OK; it’s not a show stopper, and the same is true for lots of other things (citation counts, etc.) too.

      Comment by Heather Piwowar — February 1, 2012 @ 6:57 am

  3. […] and rewarding the contributions of dataset creators.   source: Research Remix, Heather Piwowar. Via researchremix.wordpress.com […]

    Pingback by 31 Flavors of Research Impact through #altmetrics « Research-Management In Management-Research [RMIMR] — January 31, 2012 @ 2:20 pm

  4. I think the way you are thinking is absolutely spot on! Awesome!
    However, I’d suggest using Cluster/Tree Analysis, Principal Components Analysis, or some such multivariate method to look at which metrics group together and which don’t. These should give you better ‘flavors’ and suggest new metrics complementing those you already have.

    Suggestions for new metrics: *derivative* metrics such as the initial or current *rate* of citation, downloading, bookmarking, etc. For instance, the average or median time between publication and citation, or between citations, either in the last year, in the first year after publication, or overall. A good understanding of which metrics are predictive of which would be crucial for, e.g., using altmetrics as a predictive tool to alert researchers to important recent findings.

    Comment by brembs — February 1, 2012 @ 6:11 am

    • Great feedback. We do indeed have factor analysis and rule derivation in the full analysis; it just didn’t make it into this blog post :) We went with clustering on the variables rather than the factor scores to make the clusters more grounded, though admittedly that has drawbacks.

      Yeah, we aren’t including any derivative metrics yet. I agree: much potential there. So exciting, so much work to be done!!!!

      Comment by Heather Piwowar — February 1, 2012 @ 7:00 am

  5. Just want to thank everybody for these comments and suggestions. Keep them coming!

    It is a great loop, isn’t it? Putting work out there -> comments+constructive feedback -> wanting to put more work out there asap :)

    Comment by Heather Piwowar — February 1, 2012 @ 7:02 am

  6. […] Piwowar, who is also working on alternative metrics, states in her blog post on ResearchMix, research impact now has flavours (and there could be as many as the 31 – the same as Baskin […]

    Pingback by What’s all the huha about? ‘Altmetrics’: uncovering the invisible in research — February 3, 2012 @ 2:46 am

  7. Great piece – but in the text above it says “Bright red in the figure below means that the metric has high values in that cluster, darker red means the metric doesn’t have much activity”. Yet the (rather unfamiliar-looking) figure colour is blue. Could you please sub-edit to remove this potential confusion?

    Comment by Patrick Dunleavy — April 4, 2012 @ 5:32 am

    • Good catch, thank you! I swapped the figures but forgot to update the text. Done now, and hopefully on the LSE blog soon. Thanks again.

      Comment by Heather Piwowar — April 4, 2012 @ 5:47 am

  8. […] is an interesting new blog post by Heather Piwowar about the different ways research can impact the world and the importance of […]

    Pingback by Describing the Difference Research Has Made to the World « BioMed 2.0 — April 6, 2012 @ 12:55 pm

