Sunday, 21 June 2015

The importance of knowing how your data are scaled

A few weeks ago, there was a post on WEIT (Why Evolution Is True), “The correlation between rejection of evolution and rejection of environmental regulation: what does it mean?” It was triggered by a Washington Post tweet about a graph comparing attitudes to the environment and attitudes to evolution, broken down by religious affiliation:

We’ll get to the tweet later. First, the graph. It came from a US National Center for Science Education (NCSE) blog post by Robineau, based on 2007 data from the Pew Religious Landscape Study, and examines responses to two binary-choice questions:

y-axis: “Stricter environmental laws and regulations cost too many jobs and hurt the economy” or “Stricter environmental laws and regulations are worth the cost”.

x-axis: “Evolution is the best explanation for the origins of human life on earth.” (Agree/disagree)

The data were normalised onto a percentile scale, with each circle representing (1) by its position, the normalised percentile of that group’s responses and (2) by its area, the size of that group. (36,000 people were surveyed in total.)

The percentile normalisation method was based on a previous analysis of different Pew questions by Toby Grant, who explains it thus:

Geek note on measurement

The range of each dimension ranges from zero to 100. These scores were calculated by calculating the percentage of each religion giving each answer. The percentages were then subtracted (e.g., percent saying “smaller government” minus percent saying “bigger government”). The scores were then standardized using the mean and standard deviation for all of the scores. Finally, I converted the standardized scores into percentiles by mapping the standardized scores onto the standard Gaussian/normal distribution. The result is a score that represents the group’s average graded on the curve, literally.
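As I read it, Grant’s procedure boils down to three steps: a net score per group, standardisation across groups, then the normal CDF. Here is a minimal Python sketch of those steps. The function name `grant_percentiles` and the example percentages are my own inventions, and I have assumed the population standard deviation, since Grant does not say which one he used.

```python
from statistics import NormalDist, mean, pstdev

def grant_percentiles(pct_a, pct_b):
    """Sketch of the percentile scaling as described by Grant (assumptions noted).

    pct_a, pct_b: per-group percentages giving answer A and answer B.
    Assumes the population standard deviation; fails if every group is
    identical, because sigma would then be zero.
    """
    # Step 1: net score per group = percent saying A minus percent saying B
    net = [a - b for a, b in zip(pct_a, pct_b)]
    # Step 2: standardise using the mean and SD of all the group scores
    mu, sigma = mean(net), pstdev(net)
    z = [(x - mu) / sigma for x in net]
    # Step 3: map each z-score through the standard normal CDF, scaled to 0-100
    return [100 * NormalDist().cdf(v) for v in z]

# Three hypothetical groups: 70/30, 50/50 and 30/70 splits
print([round(p, 1) for p in grant_percentiles([70, 50, 30], [30, 50, 70])])
```

Note that the middle group lands exactly on 50 simply because it sits at the mean of the groups, not because of anything about its members’ absolute views.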

A few things annoy me about this:

  1. This is not simply a “Geek note”. Knowing what was done to data is vital for understanding what a plot means. To be fair to Grant, he does mention that he is plotting percentiles in the graph legend. (As far as I can see, Robineau does not mention it anywhere!)
  2. By first standardising the scores (subtracting the mean and dividing by the standard deviation) and then converting everything to percentiles, the data lose quantitative information twice. Following the first normalisation, all you can do is compare groups: there is no absolute information about responses. Following the second, you cannot even compare the degree of difference. What this plot is basically doing is pulling in the outliers to make them look more similar to the mean, and spreading out those close to the mean to make them look more different.
  3. When converting to percentiles, the intermediate normalisation seems pointless. Unless I've misunderstood, if the data are truly normally distributed, then the percentile of the fitted data should be the same as the percentile of the raw data. If they are not, you shouldn’t do the normalisation in the first place. Either way, I think you are just adding error and confusion. (No data are presented to support the assumption that these opinions are normally distributed.)
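To see the stretching and squashing described in point 2, here is a small sketch using hypothetical, evenly spaced net scores and the same standardise-then-CDF steps (population standard deviation assumed). The ranks survive, but equal raw gaps do not.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical, evenly spaced net scores (percent A minus percent B)
net = [-60, -40, -20, 0, 20, 40, 60]

mu, sigma = mean(net), pstdev(net)  # 0 and 40.0 for these numbers
pct = [100 * NormalDist().cdf((x - mu) / sigma) for x in net]

# Every raw gap is 20 points; compare with the gaps after scaling
raw_gaps = [b - a for a, b in zip(net, net[1:])]
pct_gaps = [round(b - a, 1) for a, b in zip(pct, pct[1:])]
print(raw_gaps)  # [20, 20, 20, 20, 20, 20]
print(pct_gaps)
```

The 20-point gaps near the mean become roughly 19 percentile points, while the same 20-point gaps in the tails shrink to roughly 9: the outliers are pulled in and the middle is spread out.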

It is also worth noting that, to the unwary, the circle sizes could be misleading. The bigger the circle, the more data and the more accurate the estimate of the value. The positions of the small circles may be subject to much more random sampling error. (Under a null model where all groups are the same, you would expect the large circles to gravitate towards the mean, while the smaller circles should be the outliers.) Most importantly, circles that overlap are not necessarily more similar than circles that do not: overlap depends on circle size as well as position.

It would be more useful to plot estimated standard errors for each group. Again, because the quantitative information has been lost, we cannot tell whether a small difference in responses (possibly within measurement error) has produced a big difference in percentiles. There are 36,000 people in total, but some of the groups make up less than 0.5% of the sample and therefore contain fewer than 200 people.
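For a rough sense of scale, here is a sketch of the textbook standard error of a proportion, sqrt(p(1 - p)/n). It assumes simple random sampling and ignores Pew’s actual survey design and weighting, uses the worst case p = 0.5, and compares a group of 180 people (0.5% of 36,000) with a hypothetical group of 10,000.

```python
from math import sqrt

def se_proportion(p, n):
    """Standard error of a sample proportion, assuming simple random sampling."""
    return sqrt(p * (1 - p) / n)

# 180 people is 0.5% of 36,000; 10,000 is a hypothetical large group
for n in (180, 10_000):
    se = se_proportion(0.5, n)
    # Approximate 95% interval half-width, in percentage points (1.96 SEs)
    print(n, round(100 * 1.96 * se, 1))
```

That is about ±7 percentage points for the small group against about ±1 for the large one. If the two answers sum to 100%, the net score (A minus B) equals 2A - 100, so its uncertainty is roughly double this.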

Robineau’s plot uses the same method although he:

“didn’t rescale to the 0-100 scale, since I didn’t want this to seem like a percentage when it isn’t.”

It's not a percentage, but it is a percentile, so 0-100 is entirely appropriate. Leaving it as -1.0 to +1.0 is in fact very misleading, as it implies that people are positive or negative with respect to the questions. In reality, positive just means “above average” and negative means “below average”. I have an above-average number of arms: two. This does not mean that I have lots of arms, it just means that some people have fewer arms than me.

These things aside, Robineau asks:

“So what does this tell us?”

Thanks to the scaling, the only thing this graph tells us is that (a) there is a rank correlation between the answers to the two questions, and (b) some religious groups (particularly evangelical Christians) appear to agree with these statements less than average, while other groups (notably non-Christians) tend to agree with these statements more than average.

These observations could still be of interest. The real problem comes when people start interpreting this graph as if the normalisations and rescaling had not been applied. First, Robineau:

“First, look at all those groups whose members support evolution. There are way more of them than there are of the creationist groups, and those circles are bigger. We need to get more of the pro-evolution religious out of the closet.

Second, look at all those religious groups whose members support climate change action. Catholics fall a bit below the zero line on average, but I have to suspect that the forthcoming papal encyclical on the environment will shake that up.”

This in turn was apparently interpreted by the Washington Post to mean this:

The fact is, the normalisation has removed all hope of actually knowing whether there is conflict or not. The percentile scaling removes almost all of the quantitative information on the axes, so proximity on the scale says nothing about how close the underlying answers are. All the groups inside the small top-right cluster could have >90% support for the scientific evidence and all of the groups outside it <10% support, and you could still get that plot. (It’s hard to tell, but the top-right cluster looks closer to 1.0 than the bottom-left groups are to -1.0, indicating that they deviate much further from the mean, even after the mapping onto a normal distribution has pulled the extremes in. This suggests that the data were not normally distributed in the first place and probably follow a heavy-tailed or bimodal distribution instead.)
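Here is a sketch of that asymmetry with entirely hypothetical numbers: twelve groups sharing one net score and a small cluster of four groups far out on the other side, scaled in the same standardise-then-CDF way (population standard deviation assumed).

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical bimodal net scores: twelve groups at -40, four groups at +80
net = [-40] * 12 + [80] * 4

mu, sigma = mean(net), pstdev(net)  # -10 and about 52
pct = sorted({round(100 * NormalDist().cdf((x - mu) / sigma), 1) for x in net})
print(pct)
```

The large cluster lands at roughly the 28th percentile and the small one at roughly the 96th, so the small cluster ends up much further from the midline even though both clusters are 60 raw points from the midpoint between them.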

Critically, it is impossible to conclude that any groups “support evolution” or “support climate change action”. As the graph is scaled by percentiles, 0.0 is essentially the point where 50% of groups are above and 50% below. Because the vast majority of groups are religious, of course there are many religious groups above the line. There essentially have to be, unless all religious groups were identical (in which case they would cluster very slightly below 0.0).
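A quick way to see why: standardised scores always average exactly zero, so some groups must fall above the midline whatever their absolute views. In this sketch, every hypothetical group has a negative net score (a majority disagreeing), yet half of them still plot above 50. The helper `percentiles` is my own, using the same assumed steps as Grant’s description.

```python
from statistics import NormalDist, mean, pstdev

def percentiles(net):
    # Standardise across groups, then map through the normal CDF (0-100)
    mu, sigma = mean(net), pstdev(net)
    return [100 * NormalDist().cdf((x - mu) / sigma) for x in net]

# Hypothetical: every group has a negative net score (majority disagree)
net = [-60, -45, -40, -35, -30, -10]
pct = percentiles(net)
above = sum(p > 50 for p in pct)
print(above, "of", len(net), "groups plot above the midline")  # 3 of 6 ...
```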

To many, the stand-out feature is that atheists and agnostics are all in the top right. This graph could easily have been branded “the conflict between science and religion in one chart”! But it cannot even really say that: every group could reject both the evolution statement and the view that stricter regulation is worth the cost, and thus be in conflict with the scientific evidence. You would still get the same plot after the rescaling.
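The reason is that standardisation throws away both the location and the scale of the raw scores. In this sketch, one hypothetical set of groups all lean towards the science and another all lean against it, but because the second is just a positive rescaling and shift of the first, the plotted percentiles come out identical.

```python
from statistics import NormalDist, mean, pstdev

def percentiles(net):
    # Standardise across groups, then map through the normal CDF (0-100)
    mu, sigma = mean(net), pstdev(net)
    return [100 * NormalDist().cdf((x - mu) / sigma) for x in net]

# Hypothetical net scores in two imagined worlds
supportive = [20, 40, 50, 60, 90]            # every group leans towards the science
hostile = [x / 2 - 80 for x in supportive]   # every group leans against it

same = all(abs(a - b) < 1e-9
           for a, b in zip(percentiles(supportive), percentiles(hostile)))
print(same)  # True
```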

My big question from all of this is: why not make the plot using the raw percentage responses? What do the normalisations actually achieve?

And my big take-home message: if you are going to infer things from plots, make sure that you understand how the data were scaled.
