Friday, 30 March 2012

Short-cited IV: bibliometric comparisons

As part of my on-going attempt to get to grips with citation data, I attended a short lunchtime session on bibliometrics at the library today. Unfortunately, I did not really learn much more than I had from exploring my own ResearcherID and Google Scholar pages, which I think reflects the murkiness of bibliometrics more than any thoroughness on my part.

I had hoped to learn a bit more about Scopus, as this is apparently the bibliometrics provider that will be used as part of REF but, due to its expense, we do not have access at Southampton. (This raises the question of how useful Scopus really is - if the average person cannot access it, what's the point?)

The introduction reiterated some key points that are worth repeating. The first is that no one source is (currently) ideal and so it can be sensible to use more than one. Moreover, the biases in Web of Science and Google Scholar are very discipline-specific. The Computer Science REF panel, for example, apparently uses Google Scholar instead of Scopus, as this better reflects their important citations. (Conference proceedings are much more important in Computer Science than in biology, which is actually a bit of a barrier for computational biology, a field that brings together computer scientists and bioinformaticians.)

Beyond the citation data sources, there are also differences in citation profiles for different disciplines, depending on how active a field is (crudely, more researchers = more citations) and how rapidly ideas are taken up and/or can be applied (Adler R et al (2008) Citation Statistics).

This is clearly true for both the age of papers cited (the citation curve, top) and the number of publications cited. There is an issue within an issue here, as exemplified by the bottom chart: how do you define a field? When does biological science become life science? This is a particular issue for me, as bioinformatics often falls down the cracks: neither computer science nor experimental biology is an appropriate comparator.

The upshot of this is that you should only ever compare like-with-like, i.e. the same metric using the same dataset and the same subject area. With this in mind, it does not really matter too much what you use; although the absolute numbers differ, the relative rankings of authors have been shown to be fairly consistent between resources. (I need to dig out the source for this!) This is, of course, only true if it is the citation stats that you are after. Personally, I link to my citation pages predominantly as a way to look at the citations, so I want completeness. I'm still not convinced that bibliometric scores are really useful for any kind of comparison beyond a bit of ego massaging, which brings me onto the final point of the introduction that is worth repeating: don't use metrics alone to make decisions. Any decisions. Ever!

The session did not go into the different citation metrics themselves but did highlight a few resources for calculating different statistics, although only using Google Scholar (with all its pitfalls) as a source. In addition to Publish or Perish, which I mentioned before, they drew attention to QuadSearch, the Mozilla Firefox Google Scholar add-on, and Scholarometer. As with the fancy Microsoft Academic Research stats and comparisons, however, "garbage in, garbage out". In particular, I am not sure that any of these tools give you the option to edit your Google Scholar data by adding or subtracting citations. (The cited work included, yes, but not the citations.) Publish or Perish certainly doesn't seem to and this is the one that the presenter thought was the best.
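For readers unfamiliar with what these tools actually compute, here is a minimal illustrative sketch (mine, not from the session) of one of the commonest statistics they derive from a citation list, the h-index: the largest h such that at least h of your papers have at least h citations each. The citation counts below are made up for illustration.

```python
def h_index(citations):
    """Return the largest h such that at least h papers
    have >= h citations each (0 for an empty list)."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # this paper still supports a larger h
        else:
            break  # remaining papers are cited even less
    return h

# Hypothetical per-paper citation counts:
print(h_index([25, 8, 5, 3, 3, 1, 0]))  # prints 3
```

Of course, this only demonstrates the arithmetic; the hard part, as noted above, is the quality of the citation data going in.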

My take-home message: if there is a genuine desire by governments, league tables and employers to make use of citation data, they need to invest in making robust and accurate tools for collecting these data and policing them. (If I were disingenuous, I could easily "claim" some other R Edwards publications in my ResearcherID, Google or Microsoft profile and I suspect that not many people would notice.) Until that time, I am not sure that there is much we can do apart from learn how best to present our own citations on the one hand, while resisting their use on the other. (Never use metrics alone to make decisions!)
