Monday, 5 March 2012

A short-cited look at bibliometrics with Web of Knowledge

A few things have got me thinking more about citations recently. First, the dreaded "Research Excellence Framework" (REF) is looming, for which each academic in the UK has to submit their four "best" papers. "Best", of course, is a rather subjective matter. Although journal Impact Factors and citation data are not meant to be taken into account for the REF, the number of times a paper has been cited is one indicator of how it has been received. Second, I was recently looking through a bunch of CVs as part of a recruitment process, and the importance of maximising your perceived impact was clear.

I've had one eye on these issues for a while, so I have had both my ResearcherID and Google Scholar publication metrics linked from my website (and this blog), but I never really thought too hard about either. My assumption was that the ResearcherID metrics, provided through Thomson Reuters Web of Knowledge, would be the more reliable, as Web of Knowledge is an "official" supplier of citation data and is manually curated. Google Scholar, on the other hand, is more automated and has a tendency to inflate citation counts by including stuff that more careful citation monitors might weed out.

Looking into things a bit more, though, the ResearcherID metrics do not appear to be as trustworthy as I had assumed. The problem with Web of Knowledge is that (a) it only includes stuff that's indexed in ISI, and (b) for a manually curated citation index, there seem to be a lot of mistakes that cause citations to be lost. As a result, although libraries seem to prefer Web of Knowledge, the citation metric calculator "Publish or Perish" uses Google Scholar. The publications of mine that I've looked at certainly back this up. One, for example, has ZERO citations on Web of Knowledge, but Google Scholar lists three perfectly acceptable (in my mind) peer-reviewed citations.

It's not all positive for Google Scholar, though, as the criticisms levelled at it are also valid. Although Google are the kings of automated searching and returning relevant results, they are not flawless, and I have noticed the odd duplication here and there. I have not checked properly yet (as the numbers I've checked are bigger), but I would not be surprised if some citations were also missing; I know this has been an issue for some colleagues. The other problem is that, unlike ISI, there is little or no filtering of the types of citation returned. At the other end of the spectrum, for my most highly cited paper, Google Scholar had added 15 citations, including the PDF manual of one of my software packages. (There's a lesson in citation inflation there, I think!)

This issue of over- and under-reporting of citations is not new and has been reported before, but I had never realised the extent of the under-reporting. Furthermore, a couple of the other extras returned by Google are less cut-and-dried with respect to their "inflation" status. One was a doctoral thesis, which is genuinely peer-reviewed and which I would consider a real citation. Indeed, including theses could be one of the biggest assets of Google Scholar, for it is normally hard (or impossible) to find out about theses that cite your work unless they are followed up by a paper. As a result, not only the citation but potentially the thesis itself is a useful discovery. Another two were non-English publications, which again might well be perfectly valid. (Not speaking Polish, I cannot tell in this instance.)

Then there is Microsoft Academic Research, although this seems to inflate things even more, as it includes extra publications belonging to other people (at the moment, anyway). I've created a LiveID account and done a bit of cleaning up of my publication list, so it will be interesting to see what it says after that. (I now have a couple of publications missing, but I am not sure which ones.)

So, which to use?! Currently, my feeling is that none of them is perfect. For me, ResearcherID is a definite underestimate but, at the same time, the extra 96(!) citations from Google Scholar are not all valid. It makes a difference (my h-index goes up by 2 with Google versus ISI), but it would be just as bad to be perceived as inflating my citation metrics as it would be to undersell myself. The only real solution at the moment is to provide both metrics (and maybe Microsoft too, if that is different again) and keep an eye out for a service that allows editing of both publications and citations. (I don't think any of them currently has this function.) In the long run, though, I have a horrible feeling that I'll have to compile the genuine citations from the different sources myself. If nothing else, it will settle the question of which is best, for me at least.
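If I do end up compiling citations by hand, the bookkeeping itself is simple enough. Below is a minimal Python sketch, assuming each source's citations have been exported (by hand, say) into a dictionary mapping each of my papers to a list of titles of the works that cite it. The function names, the data layout and the title-based de-duplication are all my own illustrative assumptions, not anything provided by Web of Knowledge, Google Scholar or Microsoft. It merges the lists, treating titles that differ only in case and punctuation as the same work, and then computes the h-index: the largest h such that h papers each have at least h citations. It does not decide which citations are "genuine"; that part still needs a human.

```python
"""Hypothetical sketch: merge citation lists from several sources and compute an h-index."""


def normalise(title):
    # Crude duplicate check (an assumption): lower-case the title and keep only
    # letters and digits, so "A Study of X." and "a study of x" count as one work.
    return "".join(ch for ch in title.lower() if ch.isalnum())


def merge_citations(*sources):
    """Union of citing works per paper across all sources.

    Each source is a dict mapping a paper to a list of citing-work titles.
    The first spelling seen for each normalised title is kept.
    """
    merged = {}
    for source in sources:
        for paper, citing_titles in source.items():
            seen = merged.setdefault(paper, {})
            for title in citing_titles:
                seen.setdefault(normalise(title), title)
    return {paper: sorted(titles.values()) for paper, titles in merged.items()}


def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; h is the last rank whose paper
    # still has at least that many citations.
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Made-up example data, not real citation records.
    isi = {"Paper A": ["A Study of X"], "Paper B": []}
    scholar = {
        "Paper A": ["A study of X.", "A PhD thesis on Y"],
        "Paper B": ["Some later paper"],
    }
    combined = merge_citations(isi, scholar)
    for paper, titles in combined.items():
        print(paper, len(titles))
    print("h-index:", h_index(len(t) for t in combined.values()))
```

Swapping in real data only means replacing the two dictionaries. The title matching is the weak point: it will miss duplicates whose titles differ by more than punctuation (a translated title, say), and it would wrongly merge two different works that happen to share a title, so the merged lists would still need eyeballing.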
