Another day, another misguided graph on happiness research. This time Fast Company (tech populations are ground zero for happiness research for some reason) puts up the graph above. Which seems interesting, right?

Except that in the article we find this:

Some countries are significantly happier than others (happiness is, of course, subjective). Indonesia, India, Mexico, and Brazil lead the pack in happiness, while Russia, South Korea, and Hungary are all pretty miserable (see the chart). There are other factors as well: People who are under 25 are most likely to say they’re “very happy”; Latin American countries as a whole have the most “very happy” people; and people with high income and extensive education are also most likely to report being “very happy.”

I got interested in how much the age question figures in, because just glancing at the big graphic I could see it looked almost identical to what these nations would look like if ranked by median age.

Turns out it probably figures in a lot. Here are the top five “happy” nations and their median age (from WolframAlpha):

Indonesia: 27.6 yr
India: 25.3 yr
Mexico: 26.3 yr
Brazil: 28.6 yr
Turkey: 27.7 yr

Here are the bottom five:

Italy: 43.3 yr
Spain: 41.1 yr
Russia: 38.4 yr
South Korea: 37.3 yr
Hungary: 39.4 yr

So, in other words, most of what we’re seeing in the graph above may be attributable to age — not country at all. Young people say they’re happier, so countries with a lot of young people will have higher reported happiness, which tells us… well, nothing except that those countries are demographically young.

Is it the whole story? Well, probably not. But I have no idea why you wouldn’t control for median age in a graph like the one above.

I’ll also add that I think the sociolinguistics of “happy” are pretty difficult. Young people value happiness (and respond to polls accordingly). Older people, especially those with children, often see happiness as too thin a word for what governs their lives — life is partly about sacrifice, and a parent marking the top bubble on a 5-point Likert scale may see that as an admission of selfishness (rightly or wrongly). So even beyond the age difference, I’m not sure what happiness research is really getting at.


Students really don’t get randomness. This is the classic Trick Coin Flip question — I have a trick coin that either comes up heads a bit more than tails, or tails a bit more than heads [They sell trick coins both ways, apparently]. I don’t know whether this particular trick coin tends towards heads or tails, and I don’t know by how much.

We call the tendency of the trick coin to “tilt” results in one direction its bias. I have the trick coin in my hand. Which of the following would give me the best idea of the coin’s bias?

  • 10 flips?
  • 100 flips?
  • 1000 flips?
  • or, it doesn’t matter, all of these give you the same idea of the coin’s bias.

You can see the results from class above. More later on this.
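For what it’s worth, the question is easy to simulate. A minimal sketch, where the true bias of 0.55 is an arbitrary choice for illustration: estimate the coin’s bias from 10, 100, and 1000 flips, and see how much the estimates wobble across many repeated experiments.

```python
# Simulate the trick coin: its true heads probability (bias) is 0.55,
# a number chosen arbitrarily for this sketch.
import random

random.seed(1)

def estimate_bias(n_flips, p=0.55):
    """Flip the coin n_flips times and return the observed heads fraction."""
    return sum(random.random() < p for _ in range(n_flips)) / n_flips

def spread(n_flips, trials=1000, p=0.55):
    """Standard deviation of the bias estimate across repeated experiments."""
    ests = [estimate_bias(n_flips, p) for _ in range(trials)]
    mean = sum(ests) / trials
    return (sum((e - mean) ** 2 for e in ests) / trials) ** 0.5

for n in (10, 100, 1000):
    print(n, "flips: estimate wobbles by about", round(spread(n), 3))
```

The spread of the estimates shrinks roughly with the square root of the number of flips, which is the punchline of the question: more flips really do give you a better idea of the bias.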

Gallup 1946

I knew about the poll in 1936 that changed everything — where the two million responses collated by the Literary Digest were dead wrong while the 50,000 responses scientifically selected by George Gallup were right. If you need a Wikipedia refresher on that, here you go:

In 1936, [Gallup’s] new organization achieved national recognition by correctly predicting, from the replies of only 5,000 [sic?] respondents, that Franklin Roosevelt would defeat Alf Landon in the U.S. Presidential election. This was in direct contradiction to the widely respected Literary Digest magazine whose poll based on over two million returned questionnaires predicted that Landon would be the winner. Not only did Gallup get the election right, he correctly predicted the results of the Literary Digest poll as well using a random sample smaller than theirs but chosen to match it.

What I didn’t know was that the data he had collected on the non-response bias of that poll was still available. The chart above might make a good addition to a class on non-response bias, as it shows how non-response tends to exaggerate extreme values — in this case, anti-incumbency feelings.

The chart is from this article, which is worth a read. It also provides a chart dealing with the sampling bias issue:
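The non-response mechanism is also easy to simulate. A toy sketch with invented numbers: suppose 43% of the electorate opposes the incumbent, but opponents are twice as likely to mail back a questionnaire as supporters.

```python
# Toy model of non-response bias: opponents of the incumbent respond to a
# mail poll at twice the rate of supporters. All numbers are invented.
import random

random.seed(2)

population = [1] * 43_000 + [0] * 57_000   # 1 = anti-incumbent voter
respond_prob = {1: 0.30, 0: 0.15}          # opponents respond more often

responses = [v for v in population if random.random() < respond_prob[v]]

true_share = sum(population) / len(population)
poll_share = sum(responses) / len(responses)
print(f"true anti-incumbent share:  {true_share:.1%}")
print(f"share among respondents:    {poll_share:.1%}")
```

With these made-up response rates, a 43% minority shows up as roughly a 60% majority among respondents, which is exactly the exaggeration of extreme (here, anti-incumbency) feeling the chart illustrates.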


Skewness — I think the idea that a distribution has a shape is something some students just don’t grasp, and I’ve never gotten a good grip on what blocks them from understanding concepts like skew (they get outliers, at least in the broad, conversational sense, but skew remains a mystery). The weirdest thing is that you can have students make a histogram — counting, choosing bin sizes, graphing, the whole bit — and they still don’t get, at a deep level, what a distribution is.

If anyone has some killer activities here, let me know. My sense is that a long exposure to bar charts tends to push them to view histograms as ordered but categorical data…
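One small computational angle (my own sketch, not from the textbook): have students reduce “shape” to a single number by computing a skewness coefficient for a symmetric sample and a right-skewed one, so the two shapes become directly comparable.

```python
# Compare the skewness of a symmetric (normal) sample and a right-skewed
# (exponential) sample. Distribution parameters are arbitrary choices.
import random

random.seed(3)

def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the cubed std dev."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)

symmetric = [random.gauss(50, 10) for _ in range(10_000)]
right_skewed = [random.expovariate(1 / 10) for _ in range(10_000)]  # long right tail

print(round(skewness(symmetric), 2))     # near 0
print(round(skewness(right_skewed), 2))  # clearly positive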

Chart above from Biostatistics: A foundation for analysis in the health sciences.

Simpson’s Paradox

Example of Simpson’s Paradox from The Numbers behind Numb3rs.

In this example, women are accepted at a higher rate (or roughly equal rate) to each of Berkeley’s programs, but are accepted at a lower rate when those acceptances are combined into university-wide stats. Why? Because women apply to more competitive programs…
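The pattern is easy to reproduce with made-up numbers (a minimal sketch, not Berkeley’s actual figures): two programs, women admitted at a higher rate in each, women admitted at a lower rate overall.

```python
# Simpson's paradox in miniature, with invented admissions numbers.
#                  (applicants, admits)
easy_program = {"men": (80, 48), "women": (20, 13)}   # men 60%, women 65%
hard_program = {"men": (20, 4),  "women": (80, 20)}   # men 20%, women 25%

def pooled_rate(sex):
    """University-wide acceptance rate, combining both programs."""
    apps = easy_program[sex][0] + hard_program[sex][0]
    admits = easy_program[sex][1] + hard_program[sex][1]
    return admits / apps

print(f"men overall:   {pooled_rate('men'):.0%}")    # 52/100 = 52%
print(f"women overall: {pooled_rate('women'):.0%}")  # 33/100 = 33%
```

Women do better in each program, but because most women apply to the hard program (where everyone’s rate is low), their pooled rate comes out lower.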

Predictive Efficiency

From Farrington & Tarling’s Prediction in Criminology, a new term: predictive efficiency. The way to think about it is this — suppose I say that a college education predicts low incidence of being convicted of a violent crime, and at the end of the day I’m right — over the course of a year, 97.5% of our college grads are not convicted.

In the absence of a base rate, that doesn’t really tell us anything. The predictor can be right at a high rate of certainty and still be inefficient compared to alternative predictors.
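To make that concrete with invented numbers (the 99% population-wide figure below is hypothetical, not from Farrington & Tarling):

```python
# Invented numbers to illustrate the base-rate point. The 97.5% figure is
# from the example above; the 99% population-wide rate is hypothetical.
grad_not_convicted = 0.975       # hit rate of "college grads are not convicted"
everyone_not_convicted = 0.99    # hypothetical base rate for the whole population

# The college-education predictor is right 97.5% of the time, but the trivial
# rule "predict no conviction for anyone" would be right 99% of the time,
# so the predictor does worse than the base rate alone.
gain_over_base_rate = grad_not_convicted - everyone_not_convicted
print(f"gain over base rate: {gain_over_base_rate:+.3f}")
```

With these made-up numbers, a predictor that sounds impressive (right 97.5% of the time!) actually underperforms knowing nothing but the base rate. That gap is what predictive efficiency is about.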

From Swing Voters via ilovecharts

This is a great example for students of how longitudinal measurement is sometimes used in polling to understand the effect of a specific event. The post-speech numbers alone tell us a bit about Obama’s popularity, but nothing about the speech. With a pre/post on the speech, we can use the post-speech gain to understand the speech’s effect.

While researching my health statistics class, I found this great walk-through of the issues of sensitivity and specificity in medical test design and interpretation. Clear, easy to read, and suitable for everyone.

Everyone who gets medical tests done, or will someday (which, let’s face it, is everyone), should be familiar with this stuff, but it’s often hard to visualize. The author’s technique for representing this makes it wonderfully simple. Click through!


If you are interested in these issues, Kaiser Fung’s Numbers Rule Your World has a great discussion of how they relate to the sham that is steroids testing in sports.

The best book on health statistics I have read is Know Your Chances. For some reason there is a free PDF of it here. It’s short, you can read it in an afternoon, and it’s one of the most useful things you could spend your time on, honest.