Students really don’t get randomness. This is the classic Trick Coin Flip question — I have a trick coin that either comes up heads a bit more than tails, or tails a bit more than heads [They sell trick coins both ways, apparently]. I don’t know whether this particular trick coin tends towards heads or tails, and I don’t know by how much.

We call the tendency of the trick coin to “tilt” results in one direction its bias. I have the trick coin in my hand. Which of the following would give me the best idea of the coin’s bias?

  • 10 flips?
  • 100 flips?
  • 1000 flips?
  • or, it doesn’t matter, all of these give you the same idea of the coin’s bias.

You can see the results from class above. More on this later.
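A quick simulation makes the point concrete. The sketch below (a made-up 55% heads bias — the scenario doesn’t specify one) flips the trick coin 10, 100, and 1000 times and shows how the estimate of the bias tightens as the flips accumulate:

```python
import random

random.seed(1)

# Hypothetical trick coin: heads 55% of the time. (The scenario doesn't
# tell us the bias -- this number is an assumption for the demo.)
TRUE_P_HEADS = 0.55

estimates = {}
for n in (10, 100, 1000):
    heads = sum(random.random() < TRUE_P_HEADS for _ in range(n))
    estimates[n] = heads / n
    print(f"{n:5d} flips -> estimated P(heads) = {estimates[n]:.3f}")
```

With only 10 flips the estimate can easily land on the wrong side of 50%; at 1000 flips it rarely strays far from the true bias.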

I want to do this in a class….

What a neat way of combining two textbooks to get a novel course design (which meshes with current theories of interleaving):

In an effort to maximize spacing and encoding variability, Robert Bjork once taught an honors introductory psychology course twice in one term. Up to the point of the midterm, the basic concepts of introductory psychology were covered using a textbook that adopted a history of psychology approach and emphasized the contributions of key individuals in the history of psychology, such as Pavlov, Freud, and Skinner. After the midterm exam, the basic concepts were covered again, this time using a textbook that adopted a brain mechanisms approach. The goal was to have key concepts come up in each half of the course (spacing) and from a different standpoint (variation).

From here.

Divided Attention During Lecture

I’ve been having some fun reading Bjork and his followers on elements of instruction. It’s good stuff! This comes from Successful Lecturing: Presenting Information in Ways That Engage Effective Processing by Patricia Ann de Winstanley & Robert A. Bjork:

In addition to its having a strong negative impact on encoding, divided attention has been shown to have much larger effects on direct, or explicit, tests of memory than on indirect, or implicit, tests of memory (MacDonald and MacLeod, 1998; Szymanski and MacLeod, 1996). The implication is that divided attention during a lecture may leave students with a subsequent sense of familiarity, or feeling of knowing, or perceptual facilitation for the presented material but without the concomitant ability to recall or recognize the material on a direct test of memory, such as an examination. As a consequence, students may misjudge the amount of time needed for further study.

Dividing students’ attention during a lecture therefore poses a double threat. First, information is learned less well when attention is divided. Second, one’s feeling of knowing or processing facility remains unaffected by divided attention, which may result in the assumption that information is learned well enough and no further study time is needed (see Bjork, 1999, and Jacoby, Bjork, and Kelley, 1994, for reviews of the literature on illusions of comprehension and remembering).

Concept Inventories and Dan Meyer’s Linear Modeling Exercise

I’ve talked a bit in the past about good concept inventory questions — questions that address difficult conceptual questions but have black and white answers and don’t require any special vocabulary to answer.

Dan Meyer’s Linear Modeling exercise [PDF] is a good example. The first question has a specific answer, and answering it requires the right set of intuitions about linear processes, but it doesn’t matter what terms you are using, and the student does not need to intuit what you are trying to assess to get it right. 

I’ll add that the exercise has one other mark of a great inventory question — apart from the title, it contains no hints that this is an application of linear modeling. This jibes with what we know from processes like interleaving — that the decision of which model to apply is as important as the model itself.

One final thing — I can’t help noticing that, like many ConceptTests and like many questions on the FCI, it is a prediction question. There’s something very powerful about prediction in the way it focuses the mind. More on that later.

Comparing Electoral Behavior

From the Utne Reader, in an article showing that we “are segregating [our]selves politically and geographically” in the U.S.:

“In 1992, 38 percent of Americans lived in counties decided by landslide elections; by 2004, that figure was 48 percent.”

One thing that jumps out at me immediately is that elections are very hard to compare to one another. In this case, 1992 represented the unseating of an incumbent in a three-way race (remember Perot?) — whereas 2004 was the two-way reaffirmation of an incumbent president with no real third party presence. 

How might this affect things? Well, let’s say we define a landslide as getting 60% or more of the vote. In a three-way race (like 1992), that would be difficult. Clinton won the election with 43% of the vote (to Bush’s 38, and Perot’s 19). Assuming some deviation off of that average from county to county, you are still unlikely to get a 60% landslide — even a county 50% more Democratic than the average county is still barely breaking the landslide barrier (0.43 * 1.5 = 0.645).

In a two-way race the dynamics are different. In the Kerry/Bush 50/50 split, a candidate who wins in a 50% more Democratic county is going to win with 75% of the vote.
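The arithmetic in the two paragraphs above can be checked in a few lines (the 43% and 50% shares are the figures discussed here; the rest is just multiplication):

```python
# The same relative Democratic tilt (a county 50% more Democratic than the
# national average) clears the 60% landslide bar very differently in a
# three-way vs. a two-way race.
LANDSLIDE = 0.60

clinton_1992 = 0.43  # Clinton's national share in the three-way race
kerry_2004 = 0.50    # rough share in the two-way race

three_way = clinton_1992 * 1.5  # county 50% more Democratic than average
two_way = kerry_2004 * 1.5

print(f"1992-style race: {three_way:.3f} (landslide? {three_way >= LANDSLIDE})")
print(f"2004-style race: {two_way:.3f} (landslide? {two_way >= LANDSLIDE})")
```

The three-way county barely scrapes past the bar at 0.645; the two-way county blows past it at 0.75.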

The second problem (which we ignored in the above calculation) is that counties are not a good unit of measurement. The majority of counties are small, Republican entities, even though voters are roughly split nationwide (Democrats live in more populous counties). I imagine, too, that because the majority of counties are Republican, Republican wins will look polarizing (look at all those deep red counties!) whereas Democratic wins will look less polarizing (the counties that go dark blue will be fewer, but more populous, and the Dem votes will eat into the red counties).

In any case, why not compare something more comparable — like 1984 and 2004? That has problems, but a whole lot less I think. Or compare mid-terms, where the national politics is less confounding. 

Fireside Tutorials and Punk Economics

What do we call this genre of videos — these informal explanations by Khan Academy, RSA:Animate, Common Craft, Vi Hart, and others that sit across the desk from you and talk things through? I have no idea. But I’m fascinated with the form, and how rethinking video this way makes a lecture seem more like tutoring — even when (as in the case of RSA:Animate) the material is often an adapted lecture.

Anyway, this is my most recent find — Punk Economics, by David McWilliams:

If you have other great examples, shoot me an email or post in the comments.

Obesity and C-Section StatLit Materials

Some stuff from Thursday’s class. Here’s the facilitator’s notes as well, if you want to run this in your own class.

It’s a sort of “case-study lite” approach. I gave the students the following in a packet: 

  • An article talking about research which showed people born by C-section are at a 50% greater risk of obesity than those that weren’t, and speculating that C-sections may be behind the obesity epidemic
  • An abstract of that research study
  • A chart showing growth of C-sections since 1970
  • An article talking about why C-sections have increased since 1970 (It’s not for the reasons you think). 
  • A chart showing the growth of obesity since 1970
Their role was described as follows:

For this scenario, you will play the role of an obesity researcher who has been asked by a hospital to see if there are extraneous variables in this study that were not accounted for. The hospital is trying to decide whether they should include the following sentence in their materials on C-section:

“Choosing to deliver your child by C-section may increase your child’s risk of future obesity.”

The instructions were: 
  • Produce a brief predictor-outcome statement. What is predicting what? How is it measured? What is the magnitude and direction of the association? 
  • Using the charts, produce a statement on whether the U.S. gains in childhood obesity roughly mirror the growth in C-sections.  
  • Produce a statement on whether the base rate of C-sections is large enough to have the suggested impact on obesity. 
  • Produce a list of all potential lurking variables controlled for. 
  • Produce a list of some lurking variables not controlled for. 
  • Give your gut-level take on whether any of the potential lurking variables not controlled for might dramatically reduce the magnitude of the association, or potentially reverse it. If you believe one of the lurking variables could do that, name it, and explain why it might account for the apparent association.

As with many simulations, we’ve loaded the dice a bit here. The news articles we used had reference to some confounders, but we removed those references. There is a very obvious lurking variable, one which was later controlled for in a subsequent study which found no effect of C-section on obesity.

The background information on the growth of C-sections holds the key to understanding what’s going on. That article explains that much of the growth of C-sections is due to overweight mothers — expectant mothers who are overweight often have to have C-sections due to obesity or obesity-related illness.

That’s a classic lurking variable scenario. Overweight mothers will be over-represented in the C-section group due to medical reasons. Due to genetics, overweight mothers will also have overweight children at a higher rate than mothers who are not overweight. So it is completely expected that the C-section group would have a larger percentage of children who grow up to be obese.
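A toy simulation shows how this lurking variable manufactures an apparent C-section effect. All the rates below are invented for illustration — none come from the actual studies — and, crucially, the C-section itself has no effect on obesity in the model:

```python
import random

random.seed(0)

# Made-up rates: 30% of mothers are overweight; overweight mothers get
# C-sections more often (medical reasons) and have obese children more
# often (genetics). The C-section itself does NOTHING to obesity here.
N = 100_000
csec_obese = csec_total = vag_obese = vag_total = 0

for _ in range(N):
    mom_overweight = random.random() < 0.30
    p_csec = 0.50 if mom_overweight else 0.25
    p_obese = 0.35 if mom_overweight else 0.15  # depends only on the mother
    if random.random() < p_csec:
        csec_total += 1
        csec_obese += random.random() < p_obese
    else:
        vag_total += 1
        vag_obese += random.random() < p_obese

rr = (csec_obese / csec_total) / (vag_obese / vag_total)
print(f"relative risk of obesity, C-section vs. not: {rr:.2f}")
```

Even though obesity depends only on the mother in this model, the C-section group comes out with a relative risk well above 1 — an association with no causation behind it.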

The one mistake I made running this was to run it too slowly, and without stages. I would suggest you budget at LEAST 45 minutes for this activity, and rather than have the student groups report out all the questions at the end, have them report out the answers to the first half of the questions, then give them some more time to put together answers for the second half.

I ran this in a much more compressed time frame, and one of my eight groups got it, and another very nearly got it — which wasn’t bad. But ideally you’d have at least half the groups get in the vicinity of the answer (or produce other, equally compelling answers). 

If you try it, tell me how it goes.

Hill’s nine criteria for causal association

Sir Austin Bradford Hill’s classic article on the characteristics of a causal relationship is well worth a read, and is still one of the most concise lists of what to look for in any research you read. Here’s a summary of what helps us make the leap from association to causation:

  1. Strength (the risk is large)
  2. Consistency (the results have been replicated, by different researchers in different situations)
  3. Specificity (the predictor is not related to a broad array of outcomes)
  4. Temporality (predictor always precedes outcome)
  5. Biological gradient (also known as a dose-response: the more predictor involved, the more the outcome is involved)
  6. Plausibility (there is a plausible mechanism — we have a credible theory of how the causal relationship might work)
  7. Coherence (the association is consistent with the history of the disease)
  8. Experimental evidence (experimental interventions show results consistent with the association)
  9. Analogy (there are similar results that we can draw a relationship to)

It’s worth noting that, as Fung points out in Numbers Rule Your World, there’s an awful lot of situations where we don’t need causality. You can work with strong association in places where you only need to predict (insurance rates, at-risk determinations), and rely on causality only when you have to determine effective interventions. 

The biggest problem I find with students and causality is not that they over-assign causality to situations, but that they see causality as a binary concept. In the minds of many students, there are two buckets — “caused” and “not-caused”. The idea that one association is more likely to be causal than another — that it is probably more likely that diets high in animal fat increase heart disease risk than that coffee cures Alzheimer’s, but that neither of these is proved beyond a doubt — sort of escapes them. Causality is seen as a finish line that is crossed, usually once and for all.

Gallup 1946

I knew about the poll in 1936 that changed everything — where the two million responses collated by the Literary Digest were dead wrong while the 50,000 responses scientifically selected by George Gallup were right. If you need a Wikipedia refresher on that, here you go:

In 1936, [Gallup’s] new organization achieved national recognition by correctly predicting, from the replies of only 5,000 [sic?] respondents, that Franklin Roosevelt would defeat Alf Landon in the U.S. Presidential election. This was in direct contradiction to the widely respected Literary Digest magazine whose poll based on over two million returned questionnaires predicted that Landon would be the winner. Not only did Gallup get the election right, he correctly predicted the results of the Literary Digest poll as well using a random sample smaller than theirs but chosen to match it.

What I didn’t know was that the data he had collected on the non-response bias of that poll was still available. The chart above might make a good addition to a class on non-response bias, as it shows how non-response tends to exaggerate extreme values — in this case, anti-incumbency feelings.
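A minimal sketch of the mechanism, with invented numbers (not Gallup’s actual data): if anti-incumbent voters are simply more likely to return a questionnaire, the returned ballots overstate anti-incumbency no matter how enormous the mailing is.

```python
import random

random.seed(2)

# Invented rates for illustration: the electorate is 55% pro-incumbent,
# but anti-incumbent voters are twice as likely to return a questionnaire.
N = 100_000
returned_pro = returned_total = 0

for _ in range(N):
    pro_incumbent = random.random() < 0.55
    p_return = 0.20 if pro_incumbent else 0.40
    if random.random() < p_return:
        returned_total += 1
        returned_pro += pro_incumbent

print("true pro-incumbent share:     0.550")
print(f"share among returned ballots: {returned_pro / returned_total:.3f}")
```

The returned ballots make a majority-pro-incumbent electorate look solidly anti-incumbent — and collecting more responses under the same return rates would not fix it.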

The chart is from this article, which is worth a read. It also provides a chart dealing with the sampling bias issue:

Problems of Definition: Elsevier’s Prices

The recent boycott of Elsevier provides us with a great quote for use in a statistical literacy class. People are boycotting for a number of reasons, particularly because of the high cost of the “bundles” Elsevier sells.

Claiming that their journals are some of the cheapest in the industry, an Elsevier rep states:

“Over the past 10 years, our prices have been in the lowest quartile in the publishing industry,” said Alicia Wise, Elsevier’s director of universal access. “Last year our prices were lower than our competitors’. I’m not sure why we are the focus of this boycott, but I’m very concerned about one dissatisfied scientist, and I’m concerned about 2,000.”

From the perspective of definition of terms, this may initially seem pretty straightforward, but it’s anything but. What does “our prices” mean?

  • Mean or median price computed by total offerings? In that case Elsevier could offer hundreds of free and worthless journals that no one uses or orders individually. This would pretty handily offset higher-priced offerings.
  • Mean or median price computed by individual sales? This would be a good measure — because it only counts the journals people use, and doesn’t count the junk they carry. But it is impossible to compute this number this way because of their practice of bundling.

This last point is pretty important. Imagine you have two cable companies. One charges you for only the channels you want, à la carte. You get BBC America, SyFy, and PBS for $12.

The other cable company makes you buy a package to get these channels, and it cleverly organizes it so no cheaper package includes all three of these. So you get your BBC America, SyFy, and PBS, but you have to buy the Super-Mega Package to get them. You therefore get 120 channels for $120.

Which cable company offers channels at the cheapest price? From your perspective you are getting charged $4 a channel by Company A, and $40 a channel by Company B.

But since that information (what you were actually trying to order) is recorded nowhere, any public number is more likely going to be a function of the price you paid divided by the channels you bought. In this case Company A is charging you $4 a channel, whereas Company B is charging you $1 a channel. Company B (the grifters) comes out cheapest.
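The cable example in code, for anyone who wants to put it on a slide (the channel counts and prices are the ones from the example above):

```python
# "wanted" is what you were actually trying to order; only Company A's
# pricing reflects that, since Company B records only the bundle.
wanted = 3                     # BBC America, SyFy, PBS

a_paid, a_channels = 12, 3     # Company A: a la carte
b_paid, b_channels = 120, 120  # Company B: forced Super-Mega bundle

print(f"A: ${a_paid / a_channels:.2f}/channel delivered, "
      f"${a_paid / wanted:.2f}/channel wanted")
print(f"B: ${b_paid / b_channels:.2f}/channel delivered, "
      f"${b_paid / wanted:.2f}/channel wanted")
```

By the only number anyone can compute from public data (price per channel delivered), the grifters win, $1 to $4; by the number that matters to you (price per channel wanted), they lose, $40 to $4.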

What’s the point? Having the “lowest” prices in this case is a symptom of the bundling problem, not an excuse for it. The fact that Elsevier’s prices are in the lowest quartile is most likely a sign of excessive bundling, not of a functional market.

Possibly worth some class time on the cable TV example.