On Sex After Prostate Surgery, Confusing Data [Problems with Term Definition]
A classic problem of term definition from the NYT (somewhat older article):
A notable study in 2005 showed that a year after surgery, 97 percent of patients were able to achieve an erection adequate for intercourse. But last month, researchers from George Washington University and New York University reviewed interim data from their own study showing that fewer than half of the men who had surgery felt their sex lives had returned to normal within a year.
So which of the studies is right? Surprisingly, they both are.
Basically, the first number hinges on whether the patient occasionally achieves an erection “adequate for intercourse.” The article goes on to say that this definition is pretty inadequate from the patient’s viewpoint:
“That definition is misleading,” said Dr. Jason D. Engel, director of the urologic robotic surgery program at the George Washington University Hospital. “It doesn’t mean it was good intercourse, and it doesn’t even mean your penis was hard. That man is going to say, ‘I’m impotent.’ But in the surgeon’s eyes, that man had an erection adequate for intercourse.”
The better question for men is whether they can have sex when they want to, with or without drugs like Viagra. In a recent series of patients, Dr. Engel found that after a year 47 percent of men who had robotic prostatectomy were able to have regular sex.
Although he could cite statistics to give men a more hopeful view, he said that did not help the patient.
What we stress in the statistical literacy course is that such faulty definitions are not wrong — just ill-suited to the questions they are trying to answer. A man who asked this question before surgery is likely asking a question for which definition #1 is clearly unsuited.
If your answer was that war is the wrong metaphor, you win the prize, I suppose. Still, I found this exercise from a medical stats textbook rather interesting:
17.1. A major controversy has occurred about apparent contradictions in biostatistical data as researchers try to convince Congress to allocate more funds for intramural and extramural investigations supported by the NIH. Citing improved survival rates for conditions such as cervical cancer, breast cancer, and leukemia, clinicians claim we are “winning the war” against cancer. Citing increased incidence rates for these (and other cancers), with minimal change in mortality rates, public-health experts claim that the “war” has made little progress, and we should focus on prevention rather than cure.
17.1.1. What explanation would you offer to suggest that the rising incidence of cancer is a statistical consequence of “winning” rather than “losing” the battle?
17.1.2. What explanation would you offer to reconcile the contradictory trends for survival and mortality rates, and to suggest that both sets of results are correct?
The pessimistic explanation is that early detection detects more benign cancers, leading to inflated five-year survival stats. But the question is looking for the optimistic answer, which seems harder to grok.
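To make the pessimistic explanation concrete, here is a small arithmetic sketch. Every number in it is hypothetical, invented only to show the mechanism, and it assumes (for simplicity) that all cancer deaths happen within five years of diagnosis. Screening adds harmless diagnoses without saving anyone, yet survival and incidence both go up while mortality does not move.

```python
def five_year_survival(diagnosed, deaths):
    """Share of diagnosed patients still alive five years after diagnosis."""
    return (diagnosed - deaths) / diagnosed


def rate_per_100k(count, population):
    """Express a raw count as a rate per 100,000 people."""
    return count / population * 100_000


# Hypothetical population and case counts (assumptions, not real data).
population = 100_000
dangerous_cases = 100   # cancers found with or without screening
deaths = 60             # deaths among the dangerous cases
extra_benign = 100      # harmless cancers found only because of screening

# Before screening: only the dangerous cancers are diagnosed.
survival_before = five_year_survival(dangerous_cases, deaths)
incidence_before = rate_per_100k(dangerous_cases, population)

# After screening: same deaths, but twice as many diagnoses.
diagnosed_after = dangerous_cases + extra_benign
survival_after = five_year_survival(diagnosed_after, deaths)
incidence_after = rate_per_100k(diagnosed_after, population)

# Mortality depends only on deaths and population, so it is unchanged.
mortality = rate_per_100k(deaths, population)

print(f"survival:  {survival_before:.0%} -> {survival_after:.0%}")
print(f"incidence: {incidence_before:.0f} -> {incidence_after:.0f} per 100k")
print(f"mortality: {mortality:.0f} per 100k in both periods")
```

Both camps in the exercise could quote numbers like these and both would be reading them correctly.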
Off the top of my head, one thing that comes to mind is that people can get multiple types of cancer. If you survive thyroid cancer at 60, you might die of colon cancer at 70. Since we all die of something, greater survival rates would also lead to greater incidence, but wouldn’t necessarily reduce mortality. In a world of miracle cancer cures, for instance, you might start racking up cancer later in life the way people rack up colds in their youth.
Of course, I don’t have a teacher’s edition here, so any input would be welcome — alternate theories?
How Visa Predicts Divorce
Hunch then looks for statistical correlations between the information that all of its users provide, revealing fascinating links between people’s seemingly unrelated preferences. For instance, Hunch has revealed that people who enjoy dancing are more apt to want to buy a Mac, that people who like The Count on Sesame Street tend to support legalizing marijuana, that pug owners are often fans of The Shawshank Redemption, and that users who prefer aisle seats on planes “spend more money on other people than themselves.”
Stuff like this is usually overblown a bit (writers always get a case of Gladwell-itis when talking statistics) but it’s also the future as more and more data about us gets logged in ways that allow for association.
One thing that occurs to me reading this is that the shift in statistics consumption is likely to mirror the post-internet shift in traditional publishing. Strong associations, like books, used to be hard and expensive to churn out, so a lot of filtering went into the process up front: people would run statistics on things they thought might matter for other reasons.
With the advent of total life-logging via credit cards, smartphones, and social media, and with the rise of large and extensive cohort databases, associations become cheap, but the significance filter is passed on to the consumer.
In other words, if your publication filter is insufficient for the modern world (as Shirky claims) just imagine how inadequate your statistics filter is for the deluge about to come…
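A rough sketch of why cheap associations strain the filter: if you test many pairs of unrelated preferences at the usual 5% significance threshold, some will look "significant" by chance alone. The function names below are mine, and the calculation assumes the tests are independent and that none of the associations is real, which is a simplification.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of chance 'findings' when no association is real."""
    return num_tests * alpha


def chance_of_at_least_one(num_tests, alpha=0.05):
    """Probability that at least one test comes up 'significant' by luck,
    assuming independent tests and no real associations."""
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 20, 1000):
    print(
        f"{num_tests:>5} tests: "
        f"about {expected_false_positives(num_tests):.1f} false positives expected, "
        f"{chance_of_at_least_one(num_tests):.1%} chance of at least one"
    )
```

When someone ran one carefully chosen test, a significant result meant something. When a service can run thousands for free, the reader has to ask how many were tried before this one was reported.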
We often talk of social statistics, especially those that seem as straightforward as age, as if a bureaucrat were poised with a clipboard, peering through every window, counting; or, better still, had some machine to do it for them. The unsurprising truth is that, for many of the statistics we take for granted, there is no such bureaucrat, no machine, no easy count, we do not all clock in, or out, in order to be recorded, there is no roll call for each of our daily activities, no kindergarten 1, 2, 3.
What there is out there, more often than not, is thick strawberry jam, through which someone with a bad back on a tight schedule has to wade—and then try to tell us how many strawberries are in it.
I’m reading Blastland and Dilnot’s The Numbers Game right now, and it is brilliant so far. I love that it starts with one of the fundamental quantitative reasoning questions: What did you count and how did you count it?
The book is tangentially related to the long-running BBC radio show More or Less, which you can listen to for free here.
Milo Schield’s short paper Teaching the Social Construction of Statistics deals with “strawberry jam” issues, and is well worth a read.
Researching my health statistics class, and found this great walk-through of the issues of sensitivity and specificity in medical test design and interpretation. Clear, easy to read, and suitable for everyone.
Everyone that gets medical tests done or will get medical tests done (which, let’s face it, is everyone) should be familiar with this stuff, but it’s often hard to visualize. The author’s technique for representing this makes it wonderfully simple. Click through!
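For those who prefer arithmetic to pictures, here is a minimal sketch of the core calculation. The prevalence, sensitivity, and specificity figures are made up for illustration and do not describe any real test; the function name is mine.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Chance that a person who tests positive actually has the condition."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: the condition affects 1% of people, the test catches
# 90% of real cases and correctly clears 90% of healthy people.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.90, specificity=0.90)
print(f"Chance a positive result is a true positive: {ppv:.1%}")
```

With a rare condition, most positives come from the large healthy group, so even a test that sounds accurate produces mostly false alarms.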
If you are interested in these issues, Kaiser Fung’s Numbers Rule Your World has a great discussion of how they relate to the sham that is steroids testing in sports.
The best book on health statistics I have read is Know Your Chances. For some reason there is a free PDF of it here. It’s short, you can read it in an afternoon, and it’s one of the most useful things you could spend your time on, honest.
I’ve been playing around with cognitive disfluency in slide design for my class lately, trying to solve a conundrum.
The problem is this — we know from research that reading materials that introduce “desirable difficulties” (such as presenting information in a difficult to read font) are recalled better than reading materials with a cleaner, more fluent presentation. This has been referred to as the “Comic Sans Effect”, after the notoriously hard to read font that is also apparently one of the more memorable. But the research shows that anything which disturbs fluency can have positive effects on recall — printing pages with a low toner cartridge, or producing deliberately bad photocopies.
(There are a lot of caveats to this research, which I’ll deal with later, particularly around the issue of whether we are testing “difficulty” or “novelty”. It is also a relatively new finding, and it’s unclear how it transfers to something like slide design…)
The problem is there’s a natural tension between your need as a presenter to have your slides represent you as a professional, and your desire to introduce desirable difficulties into slide-reading. The slidesets linked below represent my attempt to strike that balance. They are heavily influenced by mid-90s graphic design and perhaps also by Leigh Blackall’s presentation style from five or six years ago (Leigh’s slides in that 2006 Networked Learning presentation seared themselves into my brain forever, a perfect example of this working well).
Anyway, here’s some attempts by me to do this. Viva disfluency!
Association & Causation
Observable/Unobservable, Inference, and Claims
From A. N. Whitehead’s An Introduction to Mathematics, a brilliant early reflection on what we now see as a System 1/System 2 problem: “It is a profoundly erroneous truism, repeated by all copy-books and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them.”
Check it out here:
I’m not sure how you trust a company who claims to have some super-secret statistical insight when they put out things like this.
Reading Dan Kahneman’s Thinking, Fast and Slow, and I can tell very early on that it’s going to be excellent.
The following Kahneman insight is an old saw of research on statistical intuition by now, but was revolutionary when he and Tversky came up with it in the early 70s. I thought I’d share it for those not familiar with it:
As you consider the next question, please assume that Steve was selected at random from a representative sample:
An individual has been described by a neighbor as follows: “Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.”
Is Steve more likely to be a librarian or a farmer?
The resemblance of Steve’s personality to that of a stereotypical librarian strikes everyone immediately, but equally relevant statistical considerations are almost always ignored. Did it occur to you that there are more than 20 male farmers for each male librarian in the United States? Because there are so many more farmers, it is almost certain that more “meek and tidy” souls will be found on tractors than at library information desks. However, we found that participants in our experiments ignored the relevant statistical facts and relied exclusively on resemblance. We proposed that they used resemblance as a simplifying heuristic (roughly, a rule of thumb) to make a difficult judgment. The reliance on the heuristic caused predictable biases (systematic errors) in their predictions.
This seems a bit of a game when guessing occupations, but of course replace “Steve” with “an unknown medical condition” that resembles X, and the stakes become much more serious. Classic heart disease symptoms at 30 are far less likely to be heart disease than fuzzier, more ambiguous symptoms at age 70. One hopes one’s doctor knows that; if they do, it’s through training and education, not untutored intuition.
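The Steve problem can be worked through as a quick base-rate calculation. The 20-to-1 ratio of farmers to librarians comes from the Kahneman passage; the two "fits the description" probabilities are my own invented assumptions, chosen to be generous to the librarian stereotype.

```python
def probability_librarian(farmers_per_librarian, p_desc_given_librarian, p_desc_given_farmer):
    """Posterior chance Steve is a librarian, given that he is one or the other."""
    librarian_weight = 1 * p_desc_given_librarian
    farmer_weight = farmers_per_librarian * p_desc_given_farmer
    return librarian_weight / (librarian_weight + farmer_weight)


# Base rate from the quoted passage: 20 male farmers per male librarian.
# Assumed (hypothetical) likelihoods: the description fits 40% of
# librarians but only 5% of farmers, an eightfold difference.
p = probability_librarian(
    farmers_per_librarian=20,
    p_desc_given_librarian=0.40,
    p_desc_given_farmer=0.05,
)
print(f"Chance Steve is a librarian: {p:.0%}")
```

Even when the description is eight times more typical of librarians, the sheer number of farmers means Steve is still more likely to be on a tractor.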
You can read their 1974 paper, “Judgment under Uncertainty: Heuristics and Biases”, where they first introduced these concepts, here.
The book, just out this year and covering Kahneman’s work and the more recent work of others in the field, is here.
Been thinking lots about concept inventories. The key to a good concept inventory is that it tests intuitions, not terminology or formulas. It’s far too easy to pre-test students on a test with unfamiliar vocabulary, spend a semester on vocabulary, then act surprised that students do better at the end of the semester when they finally understand the questions.
A concept inventory should not require (much) access to terminology. The only attempt I’ve seen at a statistical concept inventory fails at this. Here’s a question from the SCI developed at Purdue:
Which of the following could never be considered a population?
- Four-door cars produced in a factory in Detroit
- Football teams in the Big 12
- Players on a randomly selected football team
- One hundred randomly selected Wal-Mart stores
There’s a concept in there, certainly, but students taking the pre-test are blocked from getting this by the term, so it is unclear if students that demonstrate gains in the post-test have a deeper conceptual understanding, or have merely mastered enough terminology to finally understand the question.
(Better attempts have been made of course. Milo Schield’s pre/post in his statistical literacy course is mostly free of such problems. I’m sure there are others.)
To truly do a reliable pre/post you have to get past the definitions and the formulas, and into intuitions and conceptual understanding. Here’s my idea of a Concept Inventory-style question:
A recent blog post compared statistics from the “glory days” of rock-and-roll to the music of today. The point of the post was that modern day acts have eclipsed the achievements of more classic acts. However they fail to take into account that the population has grown since the classic acts released their records. Which of the following statements is the only statement that would not be affected by taking potential audience size into account? (Note: each one of these compares an artist from the past decade to artists from the 1990s or earlier):
- Ke$ha’s Tik-Tok sold more copies than ANY Beatles single
- Katy Perry holds the same record as Michael Jackson for most number one singles from an album
- More people bought Celine Dion’s Falling Into You than any Queen, Nirvana, or Bruce Springsteen record
- Flo-rida’s Low made more money than The Beatles’s Hey Jude
I know this question isn’t perfect (good questions are hard) but it gets much closer to what we want than other questions I’ve seen. Underneath this question is the mechanics of how comparing things by rank helps control for the population difference — but you don’t need terminology around rank or controlling for population to get it.
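Here is a small sketch of the mechanics underneath the question. The acts, sales figures, and audience sizes are all invented for illustration. Adjusting for audience size can flip a raw-count comparison across eras, but it cannot change who ranked where within an era.

```python
def per_capita(sales_by_act, audience_size):
    """Scale raw sales by the size of the potential audience in that era."""
    return {act: sales / audience_size for act, sales in sales_by_act.items()}


def rank_within_era(sales_by_act):
    """Rank acts from best seller (1) downward within a single era."""
    ordered = sorted(sales_by_act, key=sales_by_act.get, reverse=True)
    return {act: position for position, act in enumerate(ordered, start=1)}


# Hypothetical eras: every number here is an assumption.
classic_sales = {"Classic Act A": 8_000_000, "Classic Act B": 5_000_000}
modern_sales = {"Modern Act A": 10_000_000, "Modern Act B": 6_000_000}
classic_audience = 200_000_000
modern_audience = 300_000_000

classic_adjusted = per_capita(classic_sales, classic_audience)
modern_adjusted = per_capita(modern_sales, modern_audience)

# Raw counts favor the modern act; per-capita figures favor the classic one.
raw_modern_wins = modern_sales["Modern Act A"] > classic_sales["Classic Act A"]
adjusted_modern_wins = modern_adjusted["Modern Act A"] > classic_adjusted["Classic Act A"]
print("Modern act wins on raw sales:", raw_modern_wins)
print("Modern act wins per capita:  ", adjusted_modern_wins)

# Ranks within an era are identical before and after the adjustment.
ranks_unchanged = rank_within_era(classic_sales) == rank_within_era(classic_adjusted)
print("Classic-era ranks unchanged: ", ranks_unchanged)
```

Dividing everyone in an era by the same audience size preserves order, which is why a record built on chart positions survives the adjustment while one built on copies sold or money made may not.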
I’d love to see more of these if other people have them. And if you want to give some comments to firm up the above question, go ahead!