Base Rate, Revisited

Reading Dan Kahneman’s Thinking, Fast and Slow, and I can tell, very early in, that it’s going to be excellent.

The following Kahneman insight is an old saw of research on statistical intuition by now, but was revolutionary when he and Tversky came up with it in the early 70s. I thought I’d share it for those not familiar with it:

As you consider the next question, please assume that Steve was selected at random from a representative sample:

An individual has been described by a neighbor as follows: “Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.”

Is Steve more likely to be a librarian or a farmer?

The resemblance of Steve’s personality to that of a stereotypical librarian strikes everyone immediately, but equally relevant statistical considerations are almost always ignored. Did it occur to you that there are more than 20 male farmers for each male librarian in the United States? Because there are so many more farmers, it is almost certain that more “meek and tidy” souls will be found on tractors than at library information desks. However, we found that participants in our experiments ignored the relevant statistical facts and relied exclusively on resemblance. We proposed that they used resemblance as a simplifying heuristic (roughly, a rule of thumb) to make a difficult judgment. The reliance on the heuristic caused predictable biases (systematic errors) in their predictions.

This seems a bit of a game when guessing occupations, but of course replace “Steve” with “an unknown medical condition” that resembles X, and the stakes become much more serious. Classic heart disease symptoms at 30 are far less likely to be heart disease than fuzzier, more ambiguous symptoms at age 70. One hopes one’s doctor knows that. If they do, it’s through training and education, not untutored intuition.
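To put rough numbers on the Steve problem, here is a minimal sketch of the base-rate arithmetic. The 20-to-1 ratio comes from the Kahneman passage; the two "fits the description" probabilities are made up for illustration, and the function name is hypothetical.

```python
def posterior_librarian(farmers_per_librarian, p_desc_given_librarian, p_desc_given_farmer):
    """Probability that Steve is a librarian (vs. a farmer) given the description.

    Uses the odds form of Bayes' rule:
    posterior odds = prior odds * likelihood ratio.
    """
    prior_odds = 1.0 / farmers_per_librarian            # librarians : farmers
    likelihood_ratio = p_desc_given_librarian / p_desc_given_farmer
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# ASSUMPTION: the description fits 40% of librarians but only 10% of farmers.
# These two numbers are invented; only the 20:1 base rate is from the quote.
p = posterior_librarian(20, 0.40, 0.10)
print(f"P(librarian | description) = {p:.3f}")
```

Even if the description were four times as likely to fit a librarian as a farmer, the base rate still leaves Steve far more likely to be a farmer.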


You can read their 1974 paper, “Judgment under Uncertainty: Heuristics and Biases”, where they first introduced these concepts, here.

The book, just out this year and covering Kahneman’s work and the more recent work of others in the field, is here.

Moonwalking with Einstein

Just finished Joshua Foer’s book Moonwalking with Einstein, one of the most amusing books I’ve read in a while. I’d highly recommend it to anyone, just based on the style of his writing alone, which strikes me as Jonah Lehrer as written by Sarah Vowell (of The Wordy Shipmates period, not Assassination Vacation). But that probably doesn’t really capture it either. 

You know what — you just have to read it. 

In any case, a lot of it is about memory tricks and the like, but the thread that runs through it to the end is the power of deliberate practice (during his training he becomes a test subject for Ericsson himself). We tend not to get better at things after achieving base-level competency because we know enough to get by. Unstructured experience, as Ericsson has shown again and again, can be a really poor teacher. Conversely, when we make a commitment to approach our performance more critically and strategically, even small amounts of effort can pay off big. Working smart at learning something will tend to beat working long.

All good lessons, I think. 

Higher Education is Already a Voucher System

Saw this about the K-12 online space today in the NYT:

Some teachers at K12 schools said they felt pressured to pass students who did little work. Teachers have also questioned why some students who did no class work were allowed to remain on school rosters, potentially allowing the company to continue receiving public money for them. State auditors found that the K12-run Colorado Virtual Academy counted about 120 students for state reimbursement whose enrollment could not be verified or who did not meet Colorado residency requirements. Some had never logged in.

“What we’re talking about here is the financialization of public education,” said Alex Molnar, a research professor at the University of Colorado Boulder School of Education who is affiliated with the education policy center. “These folks are fundamentally trying to do to public education what the banks did with home mortgages.”

Sound familiar? These practices are already widespread in higher education’s for-profit space, and to a lesser extent in the non-profit space. 

We’ve been a voucher system for decades, and the corrosion is extensive….



As Rowin points out that badges ‘draw upon widespread use of badges and achievements in gaming‘ and as somebody who has many badges and achievements in various game systems I can’t help but wonder if some of the problems that have cropped up in games might cross over into the Open Badge Initiative.

David goes on to outline some historical problems with badges. I think badges may be useful in a lot of circumstances, but would like to see more posts like this, talking about the problems with them.

(via @josiefraser)

“The idea that you can just put stuff out there, and that it will magically be effective and used effectively — there’s just no evidence of that,” Twigg says. Collecting evidence requires integrating with college classrooms, which requires scale and support, which requires money, she says. Lots of it. “Sixteen million dollars is not chump change,” says Twigg, “but you need to be able to support and sustain it.” Nonprofit projects in higher education do not have a great track record on this, she points out, not even the most highly regarded ones.
Carol Twigg on Khan Academy 

Openness as a Privilege Multiplier

…was the name of the presentation Jim Groom and I originally submitted to ELI 2011. It was going to investigate the tendency of “undirected and unregulated openness to exacerbate inequality both in and out of the classroom” and suggest that this tendency “undermined the social justice claims of openness”. 

Jim’s remedy was going to be corporate regulation (and a rant on corporate skimming of open development), and my remedy was going to be a rethinking of the current approach to OER, which is overly focused on self-study materials — stuff which is great for people that are, frankly, already doing quite well. 

(The proposal was rejected, and we got exiled to the Herman Miller room session we eventually did). 

So I’m happy to find (via Downes) that Justin Reich has been looking into this and adding some data to the mix. And guess what? Openness is a privilege multiplier.

(To sum up Reich’s article, the world looks a lot like scenario #2)

This is not a small problem; it’s not a bug we can program out of the system with a single tweak. If openness is a privilege multiplier, we can no longer sprinkle openness magic dust on problems and expect them to go away — more openness means more inequality, more centralized power, and the perpetuation of a permanent underclass. If openness is a privilege multiplier it means that every open project has to design in solutions to that issue on day one, and assess the impact on equality on day one hundred and one.

I think this has particularly profound implications in the area of OER, but I’ll save that post for another day.

A Statistical Literacy Concept Inventory

Been thinking lots about concept inventories. The key to a good concept inventory is that it tests intuitions, not terminology or formulas. It’s far too easy to pre-test students on a test with unfamiliar vocabulary, spend a semester on vocabulary, then act surprised that students do better at the end of the semester when they finally understand the questions. 

A concept inventory should not require (much) access to terminology. The only attempt I’ve seen at a statistical concept inventory fails at this. Here’s a question from the SCI developed at Purdue:

Which of the following could never be considered a population?

  • Four-door cars produced in a factory in Detroit 
  • Football teams in the Big 12 
  • Players on a randomly selected football team 
  • One hundred randomly selected Wal-Mart stores

There’s a concept in there, certainly, but students taking the pre-test are blocked from getting this by the term, so it is unclear if students that demonstrate gains in the post-test have a deeper conceptual understanding, or have merely mastered enough terminology to finally understand the question.

(Better attempts have been made, of course. Milo Schield’s pre/post in his statistical literacy course is mostly free of such problems. I’m sure there are others.)

To truly do a reliable pre/post you have to get past the definitions and the formulas, and into intuitions and conceptual understanding. Here’s my idea of a Concept Inventory-style question:

A recent blog post compared statistics from the “glory days” of rock-and-roll to the music of today. The point of the post was that modern day acts have eclipsed the achievements of more classic acts. However, the post fails to take into account that the population has grown since the classic acts released their records. Which of the following statements is the only statement that would not be affected by taking potential audience size into account? (Note: each one of these compares an artist from the past decade to artists from the 1990s or earlier):

  • Ke$ha’s Tik-Tok sold more copies than ANY Beatles single
  • Katy Perry holds the same record as Michael Jackson for most number one singles from an album
  • More people bought Celine Dion’s Falling Into You than any Queen, Nirvana, or Bruce Springsteen record
  • Flo-rida’s Low made more money than The Beatles’ Hey Jude

I know this question isn’t perfect (good questions are hard) but it gets much closer to what we want than other questions I’ve seen. Underneath this question are the mechanics of how comparing things by rank helps control for the population difference, but you don’t need terminology around rank or controlling for population to get it.
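For anyone who wants to see the rank mechanics spelled out, here is a small sketch. Every title, sales figure, and audience size in it is invented, and the helper names are hypothetical. It shows that a raw sales comparison across eras can flip once you divide by potential audience, while a within-era rank cannot, because every record from the same era gets divided by the same number.

```python
def chart_rank(sales, title):
    """Return the 1-based rank of `title` among same-era releases, by sales."""
    ordered = sorted(sales, key=sales.get, reverse=True)
    return ordered.index(title) + 1


def per_capita(sales, audience_size):
    """Scale every sales figure by the era's potential audience."""
    return {title: copies / audience_size for title, copies in sales.items()}


# ASSUMPTION: all figures below are invented for illustration.
OLD_AUDIENCE = 200_000_000
NEW_AUDIENCE = 300_000_000
old_era = {"Old Hit": 8_000_000, "Old Runner-Up": 5_000_000}
new_era = {"New Hit": 10_000_000, "New Runner-Up": 7_000_000}

# Cross-era comparison of raw copies sold vs. copies per potential listener.
raw_says_new_wins = new_era["New Hit"] > old_era["Old Hit"]
adjusted_says_new_wins = (
    new_era["New Hit"] / NEW_AUDIENCE > old_era["Old Hit"] / OLD_AUDIENCE
)

# Within-era rank, before and after the population adjustment.
rank_raw = chart_rank(new_era, "New Hit")
rank_adjusted = chart_rank(per_capita(new_era, NEW_AUDIENCE), "New Hit")

print("Raw sales: new beats old?", raw_says_new_wins)
print("Per capita: new beats old?", adjusted_says_new_wins)
print("Chart rank before/after adjustment:", rank_raw, rank_adjusted)
```

This is why a "most number one singles" record survives the adjustment and a "sold more copies" claim may not.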

I’d love to see more of these if other people have them. And if you want to give some comments to firm up the above question, go ahead!

Active Learning Not Associated with Student Learning in a Random Sample of College Biology Courses

I’ve been collecting these sorts of research examples and making an effort to read them thoroughly, partially because I think we’ve become a bit too self-congratulatory on active learning, and partially because you learn more from these failures than from yet another paper confirming active learning/constructivism/engaged pedagogy works.

This one is particularly interesting for a couple of reasons. First of all, it ends up showing that although active learning did not correlate with learning gains, using active learning to confront misconceptions did.

That’s really interesting, because if you look at well-designed physics clicker questions, for example, they really plug into common misconceptions — but it takes time in a discipline to really hone a set of questions like that. 

The study is also interesting because it reminds us of the normal state of affairs, where students are graduating biology with common misbeliefs about evolution:

Thirty-nine percent (n = 13) of courses had an effect size lower than 0.42, which corresponds to students answering only one more question (out of 10) correctly on the posttest than on the pretest. When learning was calculated as average normalized gain, the mean gain was 0.26 (SD = 0.17). On the cheetah question, learning gains were even lower. Effect sizes ranged from −0.16–0.58. The mean effect size was 0.15 (SD = 0.19) and the mean normalized gain for the cheetah question was 0.06 (SD = 0.08). These remarkably low learning gains suggest students are not learning to apply evolutionary knowledge to novel questions in introductory biology courses.

That’s 15 weeks or so to get one more answer out of ten on a post-test right. I’m not mocking that at all; in fact, quite the opposite. It’s worth remembering how hard it is to get gains in these areas. When we see effect sizes of 1 or more, our jaws should be on the floor…
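If the two measures in that quote are unfamiliar, here is a rough sketch of how they are commonly computed. I’m assuming a Cohen’s d-style effect size with a pooled standard deviation and the usual normalized gain of (post − pre) / (max − pre); the paper’s exact procedure may differ, and the scores below are invented.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_gain(pre, post, max_score):
    """Fraction of the available improvement that was actually achieved."""
    return (post - pre) / (max_score - pre)


def cohens_d(pre_scores, post_scores):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(pre_scores), len(post_scores)
    s1, s2 = stdev(pre_scores), stdev(post_scores)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(post_scores) - mean(pre_scores)) / pooled


# ASSUMPTION: invented scores out of 10; every student gains exactly one question.
pre_scores = [2, 3, 4, 5, 6, 3, 4, 5]
post_scores = [score + 1 for score in pre_scores]

gain = normalized_gain(mean(pre_scores), mean(post_scores), max_score=10)
d = cohens_d(pre_scores, post_scores)

print(f"normalized gain of the class average: {gain:.2f}")
print(f"effect size (Cohen's d): {d:.2f}")
```

The same one-question improvement gives a bigger or smaller effect size depending on how spread out the scores are, which is worth keeping in mind when comparing effect sizes across studies.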

As far as weaknesses of the study — self-reports, self-reports, self-reports. They try to deal with this by doing a correlation with student impressions, but what I’d really like to see is a sample observed on video and coded. 

Juliette Culver is a Freaking Genius

I just decided to give Evernote another try, found my old account active, and spent a couple hours going through my old bookmarks from 2009. 

One was to Juliette Culver’s blog, which I’d made a note to myself was brilliant. It is, but more importantly it’s just even-keeled and unpretentious in its brilliance. It cuts through hype like a hot knife through butter.

You should go read it. There are 20 or so posts on edtech called out as worthwhile on the front page. If you are serious about edtech, you’ll read every one.