How Visa Predicts Divorce

From TDB

Hunch then looks for statistical correlations between the information that all of its users provide, revealing fascinating links between people’s seemingly unrelated preferences. For instance, Hunch has revealed that people who enjoy dancing are more apt to want to buy a Mac, that people who like The Count on Sesame Street tend to support legalizing marijuana, that pug owners are often fans of The Shawshank Redemption, and that users who prefer aisle seats on planes “spend more money on other people than themselves.”

Stuff like this is usually overblown a bit (writers always get a case of Gladwell-itis when talking statistics) but it’s also the future as more and more data about us gets logged in ways that allow for association. 

One thing that occurs to me reading this is that the shift in statistics consumption is likely to mirror the post-internet shift in traditional publishing. Strong associations, like books, used to be hard and expensive to churn out, so a lot of filtering went into the process up front: people would run statistics on things they thought might matter for other reasons.

With the advent of total life-logging via credit cards, smartphones, and social media, and with the rise of large and extensive cohort databases, associations become cheap, but the significance filter is passed on to the consumer.

In other words, if your publication filter is insufficient for the modern world (as Shirky claims) just imagine how inadequate your statistics filter is for the deluge about to come…
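To get a feel for how cheap associations become, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the user and question counts are made up, the answers are coin flips, and the function names are hypothetical. It has nothing to do with how Hunch or anyone else actually works. It generates purely random yes/no answers, then counts how many question pairs clear the usual p < 0.05 bar on a chi-square test anyway.

```python
import itertools
import random


def chi_square_2x2(xs, ys):
    """Pearson chi-square statistic for two equal-length lists of yes/no answers."""
    n = len(xs)
    a = sum(1 for x, y in zip(xs, ys) if x and y)          # yes / yes
    b = sum(1 for x, y in zip(xs, ys) if x and not y)      # yes / no
    c = sum(1 for x, y in zip(xs, ys) if not x and y)      # no / yes
    d = n - a - b - c                                      # no / no
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # one answer never varies, so there is nothing to test
    return n * (a * d - b * c) ** 2 / denom


def count_spurious_links(n_users=500, n_questions=40, seed=0):
    """Count question pairs that look 'significant' when answers are pure noise."""
    rng = random.Random(seed)
    # Hypothetical survey: every answer is an independent coin flip.
    answers = [[rng.random() < 0.5 for _ in range(n_users)]
               for _ in range(n_questions)]
    critical = 3.841  # chi-square cutoff for p < 0.05 with 1 degree of freedom
    pairs = list(itertools.combinations(range(n_questions), 2))
    hits = sum(1 for i, j in pairs
               if chi_square_2x2(answers[i], answers[j]) > critical)
    return hits, len(pairs)


hits, total = count_spurious_links()
print(f"{hits} of {total} pairs look 'significant' by chance alone")
```

With 40 questions there are 780 pairs to test, so at a 5% threshold you would expect a few dozen "links" between answers that are unrelated by construction. Nobody filtered those up front, so the filtering is left to whoever reads the results.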

Udacity and the future of online universities

Felix Salmon on Sebastian Thrun, the open course runner extraordinaire who built the Stanford AI course:

Thrun was eloquent on the subject of how he realized that he had been running “weeder” classes, designed to be tough and make students fail and make himself, the professor, look good. Going forwards, he said, he wanted to learn from Khan Academy and build courses designed to make as many students as possible succeed — by revisiting classes and tests as many times as necessary until they really master the material.

When the history of open education is written, I think one question will be whether the initial focus on participation from the “top colleges” was a good place to start. Undergraduate education in a place like Stanford can be divorced from the problems of universal education in very unhelpful ways.

If you have a “weeder” mentality, there is no failure. Those people that dropped out? Well, good riddance. The people that studied but didn’t learn? Probably not college material. 

You can’t have a universal access focus and a weeder mentality at the same time. The two are antithetical.

Thrun apparently agrees:

But that’s not the announcement that Thrun gave. Instead, he said, he concluded that “I can’t teach at Stanford again.” He’s given up his tenure at Stanford, and he’s started a new online university called Udacity. He wants to enroll 500,000 students for his first course, on how to build a search engine — and of course it’s all going to be free.

Adding: I’m probably way too harsh on Ivy League schools here — anybody who tries to do something in this space is a friend of mine. But I get frustrated with a press that ignores similar experiments from lower-tier institutions and a grant structure that seeks answers to problems of universal education from the most elite institutions on the planet. 

But those people working at top-tier schools to do this? Still my heroes, every one of you. 

The Numbers Game

We often talk of social statistics, especially those that seem as straightforward as age, as if a bureaucrat were poised with a clipboard, peering through every window, counting; or, better still, had some machine to do it for them. The unsurprising truth is that, for many of the statistics we take for granted, there is no such bureaucrat, no machine, no easy count, we do not all clock in, or out, in order to be recorded, there is no roll call for each of our daily activities, no kindergarten 1, 2, 3.

What there is out there, more often than not, is thick strawberry jam, through which someone with a bad back on a tight schedule has to wade—and then try to tell us how many strawberries are in it.

I’m reading Blastland and Dilnot’s The Numbers Game right now, and it is brilliant so far. I love that it starts with one of the fundamental quantitative reasoning questions: What did you count and how did you count it?


The book is tangentially related to the long-running BBC radio show More or Less, which you can listen to for free here.

Milo Schield’s short paper Teaching the Social Construction of Statistics deals with “strawberry jam” issues, and is well worth a read.

While researching my health statistics class, I found this great walk-through of the issues of sensitivity and specificity in medical test design and interpretation. Clear, easy to read, and suitable for everyone.

Everyone that gets medical tests done or will get medical tests done (which, let’s face it, is everyone) should be familiar with this stuff, but it’s often hard to visualize. The author’s technique for representing this makes it wonderfully simple. Click through!
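If you want to poke at the arithmetic yourself, here is a minimal sketch in Python. It is not the linked author's method, just the standard counting approach, and the prevalence, sensitivity, and specificity figures are made-up numbers for illustration, not data for any real test. The function name is hypothetical too. It imagines 10,000 people, counts who tests positive, and asks what share of those positives are actually sick.

```python
def positive_predictive_value(prevalence, sensitivity, specificity,
                              population=10_000):
    """Share of positive test results that belong to people who are really sick."""
    sick = prevalence * population
    healthy = population - sick
    true_positives = sensitivity * sick             # sick people the test catches
    false_positives = (1 - specificity) * healthy   # healthy people flagged anyway
    return true_positives / (true_positives + false_positives)


# Illustrative numbers only: 1% of people have the condition, the test
# catches 90% of them, and it correctly clears 91% of healthy people.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.90,
                                specificity=0.91)
print(f"Chance a positive result is a true positive: {ppv:.1%}")
```

Under those assumed numbers, 100 of the 10,000 people are sick and 90 of them test positive, but so do 891 of the 9,900 healthy people. A positive result therefore means roughly a 9% chance of actually having the condition, which is the counterintuitive part that is so hard to visualize.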


If you are interested in these issues, Kaiser Fung’s Numbers Rule Your World has a great discussion of how they relate to the sham that is steroids testing in sports.

The best book on health statistics I have read is Know Your Chances. For some reason there is a free PDF of it here. It’s short, you can read it in an afternoon, and it’s one of the most useful things you could spend your time on, honest.

I’ve been playing around with cognitive disfluency in slide design for my class lately, trying to solve a conundrum.

The problem is this: we know from research that reading materials that introduce “desirable difficulties” (such as presenting information in a difficult-to-read font) are recalled better than reading materials with a cleaner, more fluent presentation. This has been referred to as the “Comic Sans Effect”, after the notoriously hard-to-read font that is also apparently one of the more memorable. But the research shows that anything which disturbs fluency can have positive effects on recall, such as printing pages with a low toner cartridge or producing deliberately bad photocopies.

(There are a lot of caveats to this research, which I’ll deal with later, particularly around the issue of whether we are testing “difficulty” or “novelty”. It is also a relatively new finding, and it’s unclear how it transfers to something like slide design…)

The problem is there’s a natural tension between your need as a presenter to have your slides represent you as a professional, and your desire to introduce desirable difficulties into slide-reading. The slidesets linked below represent my attempt to strike that balance. They are heavily influenced by mid-90s graphic design and perhaps also by Leigh Blackall’s presentation style from five or six years ago (Leigh’s slides in that 2006 Networked Learning presentation seared themselves into my brain forever, a perfect example of this working well). 

Anyway, here’s some attempts by me to do this. Viva disfluency!

Association & Causation

Observable/Unobservable, Inference, and Claims



From A. N. Whitehead’s An Introduction to Mathematics, a brilliant early reflection on what we now see as a System 1/System 2 problem: “It is a profoundly erroneous truism, repeated by all copy-books and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them.”