Contributed by David Barner
Which currently active scientist has had the biggest impact on your life? This is a difficult question, and one that hinges upon what we mean by “impact” in the first place. You might think that the most impactful researcher works on cancer. Or perhaps physics. And if you guessed physics – and you’re a scientist – you might just be right. In fact, by one measure, which I’ll get into below, it’s a physicist at my own institution – UCSD – who has touched the lives of the most scientists. His work has affected the life of more or less every researcher in the tenure track, whether in physics, biology, psychology, or beyond. And, as it turns out, it’s not because of his very respected work on physics that he’s had this impact. Also, although he’s touched the lives of many researchers, most don’t know his name – well, not his entire name anyway. Some people just refer to him as Dr. H. As in, Dr. Jorge Hirsch.
Jorge Hirsch is known among physicists for his work on topics including “superconductivity”, but is known by the rest of us for creating a metric that he coined the “h-index” (h as in Hirsch). In a PNAS paper that’s now been cited over 5000 times, Hirsch tried to solve the problem of how to measure the impact of a researcher’s work – a problem we confront when hiring new faculty, or deciding who should get promoted, accelerated, decorated with awards, or envied for the sheer glory of their scientific stardom. Previous efforts to do this were unsatisfying. Just counting the number of papers someone publishes is meaningless, since they might amount to a bucket of quickly churned duds. For a similar reason, counting a researcher’s total number of citations is inadequate. A hundred papers each cited once are surely different in importance to the field than 10 papers each cited 10 times, or one cited 100 times. But how to decide? Hirsch (2005) proposed a simple and highly conservative formula, the h-index, which he defined as “the number of papers coauthored by the researcher with at least h citations each.” To make this concrete, an author with 20 papers, 15 of which have been cited exactly 4 times, but 5 of which have been cited 5 or more times, would have an h-index of 5. To increase this number to 6, six of their papers would each need to have been cited at least 6 times.
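For readers who like to see the arithmetic, here is a minimal sketch of the calculation in Python. The function name `h_index` is my own label, not anything from Hirsch's paper, and the example assumes the five better-cited papers were each cited exactly 5 times.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # every later paper has even fewer citations
    return h


# The author described above: 15 papers cited 4 times, and 5 papers
# cited 5 times (assumed to be exactly 5 for this illustration).
print(h_index([4] * 15 + [5] * 5))   # 5

# The three cases that total citation counts cannot tell apart
# (100 citations in each case):
print(h_index([1] * 100))   # 1
print(h_index([10] * 10))   # 10
print(h_index([100]))       # 1
```

Note how the index rewards neither sheer volume nor a single hit: only the ten well-cited papers earn a high score.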
Hirsch’s index took hold almost immediately as the gold standard of productivity and impact in the sciences, and was quickly adopted by mainstream bibliography databases like Scopus and Web of Knowledge to measure impact across disciplines. The measure really took off in 2011, when it was adopted by Google Scholar, where scientists can view their own index and those of willing colleagues who have created a Scholar profile. Now the h-index is found on CVs, cited in promotion files together with plots of citations over time, and anxiously monitored by junior faculty approaching tenure as they ponder their place in the academy.
For the moment, the field has reached a point of stasis in its measurement of impact. We have a tool that accounts for the number of papers a researcher publishes, but that rewards both this and the impact of each of those individual contributions. It’s a system that is difficult to game by publishing in volume (unless, of course, the database used to compute the h-index includes self cites, as does, for example, Google Scholar). And it doesn’t disproportionately reward individuals who started off with a bang in grad school – publishing a Science paper with a team of famous advisors – but then fizzled in their own lab. In fact, one might say that this metric is somewhat unfavorable to Hirsch himself, since nearly a quarter of his 21,000 citations are to a single paper on the h-index (though he’s still quite the star with an h of 62).
Still, it is reasonable to ask whether we should be happy with this measure. Does it reflect our impact as scientists? Should it be used to compare individuals? Is it worth our own individual energies to worry about the citation counts of our papers, and the papers of our colleagues?
How good is the h-index at measuring “impact”?
A first worry that we might have is whether the h-index actually predicts scientific achievement. As Hirsch pointed out, this question is itself fraught. We might think that citation rate is an imperfect indicator of impact. A finding that does not generate substantial follow-ups might nevertheless yield important technological innovations or cultural changes outside the academy (see below). And papers that are known to report false, fraudulent, or otherwise notorious results may get cited without signaling any particular merit in their authors. For example, the now retracted paper by Andrew Wakefield (h=49) that falsely linked the MMR vaccine to autism has been cited over 2100 times. Hardly grounds for promotion.
But speaking more broadly, the impact of individual papers may not reflect the value added to the research community as a whole, or, for that matter, the conditions created by a research community that made the work possible in the first place. Big ideas and innovations rarely come out of the blue, but culminate from the contribution of many individuals working collectively on a problem – like the crest of a wave, pushed up into the light by the movement of the swelling waters below. Despite this, ideas that are “in the air” nevertheless get credited to whoever publishes first, or in the most visible outlet.
This question – What is impact, and why should we care? – is important. But let’s just play along for a moment, and see how far the h-index can get us. Having acknowledged that the notion of “scientific achievement” is a slippery one, Hirsch decided that a metric of citations and productivity is probably not a bad place to start, so long as it is a valid measure that makes predictions about the future – who will make a good hire, and who will thrive after tenure. In another paper published in PNAS, Hirsch asked whether the h-index was predictive of future scientific achievement defined in this way, and how it compared to other measures. He showed that, relative to other metrics, like average citation number, for example, the h-index was much more likely to predict later citation counts (r=.60), and more likely to predict its own later value (r=.61). In other words, it was more reliable and less subject to random outside forces than other metrics available at the time. That’s not to say that other, possibly better metrics haven’t arisen since. For those who enjoy a headache, there are now many others on offer (around 40 as of 2009), including the h(2) index, the m quotient, the g index, the a index, the r index and ar index, the hw index, and so on. And there are many, many, many papers that have taken the time to meticulously compare these different metrics. But overall, it looks for now like the h-index is here to stay.
Who should we compare with the h-index?
One serious problem that arises in the use of the h-index is that it fails to account for the fact that in some fields, researchers work in groups, making it hard to identify who merits credit for which work. To solve this, Hirsch proposed another tool, the hbar index: “I propose the index ℏ (“hbar”), defined as the number of papers of an individual that have citation count larger than or equal to the ℏ of all coauthors of each paper, as a useful index to characterize the scientific output of a researcher that takes into account the effect of multiple co-authorship.”
But another more serious issue is how to compare people who do different kinds of work, and whether the h-index can be used to compare individuals across disciplines. In an interview with Nature, Hirsch noted that, “Different disciplines have different citation patterns… so each field would need different thresholds. Biologists can have h values of up to 190. But with that proviso, the method should work across disciplines.” What Hirsch seemed to be saying was that we shouldn’t blindly compare h scores across different disciplines of science. But why not? One might argue that this is just as things should be: Biologists study important things, and so if biologists have higher h’s than, say, mathematicians, this is maybe just a reflection of their true impact and value to society.
As it turns out, citation practices are a little weirder than one might expect – at least weirder than I expected. In a paper published in a journal called “Scientometrics”, Juan Iglesias and Carlos Pecharromán showed that disciplines differ in fairly substantial and surprising ways. Here’s a figure I recreated from their data plotting the average number of cites papers got across 21 disciplines between 1995 and 2005.
What I found surprising was that there was relatively little relationship between my intuitive sense of the “usefulness” of a discipline (e.g., to people outside it) and the average citation count generated by said discipline. Fields like computer science, economics, business, agricultural science, and material science were all on the south side of physics, while one of the most citation-frenzied fields was neuroscience & behavior (what have you done for me lately?). My point here is not to dismiss the value of pure research, but instead to point out that the number of citations a field generates may not be the product of a natural law, whereby work that is inherently important attracts the most cites. If computer scientists got together and started publishing (and citing) more papers instead of creating code for the masses, we’d hardly conclude that the impact of their work – at least broadly construed – had increased. I think there is good reason to conclude, as Hirsch did, that a metric like the h-index should be used to compare individuals to others within his or her discipline, and not against data from all disciplines taken together.
A first pass solution to this issue is to take the data presented by Iglesias and Pecharromán and to create a normalization factor for comparing h-indexes across disciplines. And in fact this is exactly what they did (see Table 2, p. 313). Specifically, they computed a factor, f(i), for each discipline by “calculating the ratio of the number of citations/paper for each field… and normalizing to the corresponding values for the field Physics” (which was assigned a value of 1). Given this set of values, it becomes possible to place h-indexes from different disciplines onto a common scale: multiply the h-index by f(i). For example, to compare Hirsch to a hypothetical neuroscience doppelganger, we’d multiply his already impressive h-index by 1.41, to get an h of about 87. Alternatively, if we found a neuroscientist with an h of 62 and wanted to know how they compare to researchers in physics, we’d multiply their 62 by .71, to get an h of about 44.
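The rescaling is a single multiplication, as this small Python sketch shows. It uses only the two factors quoted in the paragraph above (1.41 and .71); the function and constant names are hypothetical, and the full set of f(i) values lives in Iglesias and Pecharromán's Table 2, which I have not reproduced here.

```python
def rescale_h(h, factor):
    """Place an h-index on another field's scale by multiplying by a factor."""
    return h * factor


# Factors as quoted in the text above; names are illustrative only.
PHYSICS_TO_NEUROSCIENCE = 1.41
NEUROSCIENCE_TO_PHYSICS = 0.71

hirsch_h = 62

# Hirsch's hypothetical neuroscience doppelganger.
print(round(rescale_h(hirsch_h, PHYSICS_TO_NEUROSCIENCE)))   # 87

# A neuroscientist with an h of 62, expressed on the physics scale.
print(round(rescale_h(62, NEUROSCIENCE_TO_PHYSICS)))         # 44
```

The two factors are roughly reciprocals of one another (1/1.41 is about .71), which is what you would expect when converting in opposite directions between the same pair of fields.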
Frankly, I find this game a little bit silly. How can we really compare the scientific careers of physicists, psychologists, and mathematicians? These are apples, oranges, and, well, some really abstract apples. At best these functions might be used to orient campus-wide committees to differences in citation norms across disciplines. Perhaps more interesting, the same type of math allows us to compare individuals who work within a discipline but on different topics. After all, the problem of comparing individuals across areas arises not only in campus-wide assessments of tenure files, but also within departments with diverse faculty. In a psychology department, for example, researchers take a variety of different approaches to studying diverse phenomena, ranging from neuroscience, animal behavior, social psychology, language processing, memory and attention, to child development, inter alia. Each of these sub-areas forms a type of clique, with its own conferences, its own journals, and its own unique citation practices. How to compare individuals across these groups? A particularly ingenious tool for doing this was developed by a group of scientists at Indiana University, under the name of Scholarometer. Scholarometer, described in detail in a 2012 paper in PLoS One, is a crowdsourced database that allows users to search for authors, tag them according to their sub-area of research, and then compare each author with others who fall under that tag. For example, this would allow comparing a developmental psychologist who studies language acquisition – like me – to other psychologists, other developmental psychologists, other language acquisition researchers, or, for fun, to physicists.
The final question I want to address is whether we should really be worrying about citation counts in the first place – whether it’s worth the energy. What I’ve learned from scanning this literature is that citation counts, taken alone, don’t seem to be all that predictive of how important an area of study is. Disciplinary cultures play as strong a role as anything. What the h-index does best is differentiate individuals when they study roughly the same thing. And this, in turn, is really only useful for purposes of hiring and promotion (since hopefully we don’t seek out scientific papers on the basis of the h-indexes of those who wrote them).
There are three considerations that lead me to think this is not worth the energy.
First, in almost all cases, the h-index will be useless for the purpose of hiring, due to the small sample sizes involved. Even star candidates will have tiny hs, and will differ by maybe 1 or 2 points at best.
Second, the h-index is most useful in absence of other more direct measures of a researcher’s contributions. It may be interesting to note that a tenure candidate has an above average or below average h, but it would seem nothing short of nuts to conclude anything more than this from the number, given the availability of the candidate’s actual papers, their referee letters, their funding track record, teaching evaluations, etc. The information that an h is trying to give a proxy for is all there in the file. At best the h can be quantitative ammo to buttress the case against a weak candidate. But it’s hard to imagine a case hinging on an h. If a committee isn’t able to evaluate a file without this number, there’s reason to question whether they should be evaluating it at all.
A third and final consideration is one that I stumbled upon accidentally, and which led me to write this piece. Everyone who has submitted a paper for publication has experienced the reviewer (I believe it is Reviewer #2), who kindly points the author to a long list of papers almost certainly written by said reviewer. In one recent case of this, I was dragged through several pages of ugliness for not having cited a paper which, to that point, had never been cited by anyone (including the paper’s own authors). And honestly, that’s not the worst, but I’ll spare the details. These cases got me to wondering why people cared so much about their work being cited.
First, mea culpa. I’ve asked people to cite my work in the past, in the role of Evil Reviewer #2. And now I think this was a mistake. But still, I wonder what motivated me – or, more generally, *us* – to make these requests.
Perhaps it is true that as reviewers we are likely to be leading experts and thus have papers that merit inclusion in every paper on a particular topic. To this, there is a very simple fact to consider: If your work really is important and merits inclusion, the author has two other reviewers that are able to chime in and make such suggestions. If they do not, it’s quite possible that the work isn’t all that central after all.
Or perhaps we feel slighted at the idea of our work being neglected, of being bullied by more callous souls, or of being left out of the literature – perhaps a literature we helped to create. In other words, we seek the impact that being cited confers upon us.
Here is my last piece of data to chew on, as you consider this last question. Together with a lifetime h-index, each researcher’s Google Scholar profile also presents an h for the last 10 years. One complaint that many have made of the h-index is that it can only go up, such that even very unproductive senior professors can have h-indexes much higher than roaring junior colleagues. Google’s 10 year h is nice because it gives a glimpse of how things have been going more recently. And this is where one’s ego gets the medicine it needs.
At the beginning of the piece I asked you to think of some influential scientists. So, let’s think of a psychologist – how about BF Skinner, of behaviorist fame? Skinner remains one of the most famous psychologists of all time. He helped create a new science of memory and learning whose observations live on today in the form of new science and technology, infants’ sleep regimes, and slot machine psychology. His h-index as of today is 93 – an amazing number for a researcher who published at a time when citations were much less frequent than today. Still, what’s sobering is that Skinner’s 10 year h-index is quite unremarkable – just over 50 – a number that does not differentiate him from most late career psychologists currently active – and which is lower than that of Hirsch – the most famous scientist you didn’t know you knew.
This said, there’s reason to believe that Skinner has hung on much longer than most of us will. In a paper called “Citations, Age, Fame, and the Web”, William Landes, a Professor at the University of Chicago, and Richard Posner, Chief Judge of the US Court of Appeals for the Seventh Circuit, notice something quite sobering. As a junior researcher (call them “i”) publishes papers, they add to their “scholarly stock” at a particular time “t”. Landes and Posner labeled this the researcher’s K(i)(t). They then noted that, “It follows that i’s citations will tend to increase yearly over his lifetime provided his capital stock continues to grow (that is, provided new investment exceeds depreciation) and will fall off once depreciation exceeds new investment, the decline accelerating once he ceases (because of retirement or death). Death may be an inflection point if citations depend not only on the perceived quality or relevance of the work cited but also on personal friendship, a desire for personal advancement by flattering an established scholar, or other factors that cease with death.” They then continue by asking us to imagine “that a scholar began his scholarly career in 1900, retired in 1940, and died in 1955, that his work was cited 100 times every year until his retirement, and that thereafter his scholarly capital depreciated at an annual rate of 10 percent…. depreciation will have a powerful effect on his total citations in the 1956-98 period. By 1956 he would be receiving only 20 citations per year and by 1998 less than one-half of one citation per year.” May his h-index rest in peace. And may ours too, should we be so lucky as to decline by only 10% a year.
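The decay in that thought experiment is easy to check with a few lines of Python. This is a rough sketch under one assumption of mine: that the 10 percent depreciation compounds once per year starting from retirement in 1940. The function and parameter names are hypothetical and do not come from Landes and Posner.

```python
def citations_per_year(year, retired=1940, rate_at_retirement=100.0, depreciation=0.10):
    """Yearly citations under constant compound depreciation after retirement."""
    if year <= retired:
        return rate_at_retirement
    years_since_retirement = year - retired
    return rate_at_retirement * (1 - depreciation) ** years_since_retirement


for year in (1940, 1956, 1998):
    print(year, round(citations_per_year(year), 2))
```

Under that assumption, the 1956 figure comes out a little under the "only 20" in the quotation (about 18.5), and the 1998 figure is well below one-half of a citation per year, so the quoted numbers hold up as round approximations.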
My point here is not an exercise in nihilism. Instead, I hope to convince you that impact is something a little bigger than citations – that a researcher’s impact does live on through their science, but not in the form of cites, or even memory of old classic papers. The behaviorists had a huge impact on psychology even if their citation counts are down, just as their replacements – in the form of cognitive psychologists of the 1960’s – are currently seeing their paradigm-shifting papers eclipsed by the latest Psych Science papers on how color perception is affected by grumpiness (it’s not), or the social psychology of how opinions can be changed (they can’t). The cognitive psychology of the 1960’s lives on, however garbled, in these papers, and throughout psychology, even if the papers that spawned our current methods and ways of thinking about the mind and brain do not. And if your ideas are good, they’ll probably live on too, even if you, your name, and your citations don’t. Counting citations – and squeezing them out of peers to boost your h – just isn’t worth the energy.