Term of the day: ecological validity.
Ecological validity is a pretty big concern in ed psych, obviously. But I’ve also just read an interesting paper in Health, Risk, and Vulnerability about the ecological validity of psychiatric assessment of criminals being treated for mental illness. The idea is that many prisoners who do poorly in the highly restrictive environment of the facility (e.g. people with authority issues) are actually quite ready to function in a real-world setting, whereas many offenders who do well in those conditions (e.g. sex offenders) are actually high recidivism risks. Assessment within the confines of the center, regardless of the validity of the instrument, is compromised by this.
I hate to draw a link between colleges and psychiatric prisons, but one sometimes wonders about the ecological validity of college itself. Even setting aside the tests themselves, you have to ask whether the peculiar nature of college — a social environment unlike any other, really — is a sound setting for assessing ability as it will be applied in the outside world.
Example of Simpson’s Paradox from The Numbers behind Numb3rs.
In this example, women are accepted at a higher (or roughly equal) rate to each of Berkeley’s programs, but at a lower rate when those acceptances are combined into university-wide stats. Why? Because women apply to more competitive programs…
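A minimal sketch of the reversal, with made-up admission numbers (not Berkeley’s actual data): women beat men’s acceptance rate in each program, yet trail in the pooled figures, because women disproportionately apply to the tougher program.

```python
# Hypothetical applicant/admit counts for two programs of very
# different selectivity. Not real Berkeley data.
programs = {
    # program: (women_applied, women_admitted, men_applied, men_admitted)
    "easy": (100, 80, 400, 300),   # women 80%, men 75%
    "hard": (400, 100, 100, 20),   # women 25%, men 20%
}

def rate(admitted, applied):
    return admitted / applied

# Per-program, women are admitted at the higher rate:
for name, (wa, wadm, ma, madm) in programs.items():
    print(name, rate(wadm, wa), rate(madm, ma))

# Pooled university-wide, the ranking flips, because most women
# applied to the hard program and most men to the easy one:
women_total = rate(80 + 100, 100 + 400)   # 180/500 = 0.36
men_total = rate(300 + 20, 400 + 100)     # 320/500 = 0.64
print(women_total, men_total)
```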
Cobb illustrated the distinction by means of a famous example from the long struggle physicians and scientists had in overcoming the powerful tobacco lobby to convince governments and the public that cigarette smoking causes lung cancer. Table 2 shows the mortality rates for three categories of people: nonsmokers, cigarette smokers, and cigar and pipe smokers.
At first glance, the figures in Table 2 seem to indicate that cigarette smoking is not dangerous but pipe and cigar smoking are. However, this is not the case. There is a crucial variable lurking behind the data that the numbers themselves do not indicate: age. The average age of the nonsmokers was 54.9, the average age of the cigarette smokers was 50.5, and the average age of the cigar and pipe smokers was 65.9. Using statistical techniques to make allowance for the age differences, statisticians were able to adjust the figures to produce Table 3.
Now a very different pattern emerges…
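The adjustment Cobb describes is, in spirit, direct age standardization: apply each group’s age-specific mortality rates to one shared age distribution, so the age mix stops driving the comparison. Here’s a minimal sketch with invented rates and weights (not the actual Table 2/3 figures):

```python
# Shared "standard" age distribution (hypothetical population shares).
standard = {"under 50": 0.5, "50-64": 0.3, "65+": 0.2}

# Hypothetical deaths per 1,000 per year, by group and age band.
# Invented so cigarette smokers are worst within every band.
rates = {
    "nonsmokers":   {"under 50": 2, "50-64": 10, "65+": 40},
    "cigarettes":   {"under 50": 4, "50-64": 18, "65+": 60},
    "pipes/cigars": {"under 50": 3, "50-64": 12, "65+": 45},
}

def adjusted(group):
    # Weight each group's age-specific rates by the SAME age shares.
    return sum(standard[band] * rates[group][band] for band in standard)

for g in rates:
    print(g, adjusted(g))
```

The crude (unadjusted) rate for pipe and cigar smokers looks worst simply because that group is much older; once every group is weighted by the same age distribution, cigarettes come out worst, which is the pattern the adjusted Table 3 shows.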
From Farrington & Tarling’s Prediction in Criminology, a new term: predictive efficiency. The way to think about it is this — suppose I say that a college education predicts low incidence of being convicted of a violent crime, and at the end of the day I’m right — over the course of a year, 97.5% of our college grads are not convicted.
In the absence of a base rate, that doesn’t really tell us anything. The predictor can be accurate in the sense that it’s right 97.5% of the time, but if, say, 97% of the general population also goes unconvicted, it’s inefficient compared to alternative predictors: the trivial one of predicting “not convicted” for everyone does nearly as well.
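A toy sketch of the point, with made-up numbers (the 97% base rate is an assumption for illustration): accuracy alone looks impressive until you measure the improvement over just guessing the base-rate outcome for everybody.

```python
# Hypothetical figures: how often "not convicted" holds overall,
# and how often the college-grad predictor is right.
base_rate_not_convicted = 0.97   # assumed population base rate
grad_predictor_accuracy = 0.975  # from the example above

# Raw improvement over predicting "not convicted" for everyone:
improvement = grad_predictor_accuracy - base_rate_not_convicted

# Improvement as a share of the room left above the base rate
# (a rough efficiency measure: 1.0 = perfect, 0.0 = no better than base).
relative = improvement / (1 - base_rate_not_convicted)

print(f"improvement: {improvement:.3f}, relative: {relative:.3f}")
```

On these numbers the predictor closes only about a sixth of the gap between base-rate guessing and perfection, despite being “right 97.5% of the time.”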
Since the statistics class I teach is supposed to be integrative — that is, to show connections between various disciplines and other aspects of life — I’m always on the lookout for ways to jury-rig an understanding from one domain to understand another. I think I just found a neat example.
But first, look at these two different stories of the Obama record on jobs:
To the average viewer these may seem like incompatible stories. In the top graph, Obama begins to pull us out of the recession on day one of his presidency, slowing job losses and eventually moving us to job gains — digging us out of the hole that Bush 43 got us into.
In the bottom graph, Obama takes office, unemployment skyrockets to a historic level, and even now Obama has not returned us to the point we were at on inauguration day. He still hasn’t cleaned up his mess.
Which brings me to the medical stats terms incidence and prevalence.
Incidence refers to the frequency with which new cases of an illness develop in a population over a certain period of time, normally one year. When we say that the incidence of a cancer has increased in past years, we mean that more people have developed the condition year after year, e.g., the incidence of thyroid cancer has been rising, with 13,000 new cases diagnosed this year.
Prevalence refers to the number of people currently living with an illness in a given year. This number includes all those who may have been diagnosed in prior years, as well as in the current year. If the incidence of a cancer is 20,000 per year with a prevalence of 80,000, that means 20,000 new cases are diagnosed every year and 80,000 people in the United States are living with the illness, 60,000 of whom were diagnosed in the past decade and are still living with the disease.
I think you see where I’m going with this. If you apply the terminology of epidemiology, the unemployment rate is a prevalence measure. It’s influenced heavily by how long a person who gets a condition (in this case, the state of being unemployed) stays in that condition. Prevalence is a helpful measure of the social and economic impact of a disease.
The jobs creation numbers, on the other hand, are a measure of incidence, in this case measured month by month, year by year.
Which measure you use is related to what you are trying to figure out, but in general, when attacking diseases at least, it is the incidence rate that is looked at most closely — if you can make progress on the incidence rates the prevalence problem will take care of itself eventually. Meanwhile, prevalence can be unreliable — a deadly disease has less prevalence because it is killing people faster (and taking them off the books) — just as the unemployment rate does not include people who have stopped looking. On the other side of the equation, prevalence can often under-represent positive change which can be dwarfed by a large backlog of cases. If we start to make progress on diabetes, for example, it won’t be easily seen in a prevalence chart until many, many years later.
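In a steady state the two measures are tied together by roughly prevalence ≈ incidence × average duration, which is easy to check against the illustrative cancer numbers above:

```python
# The post's illustrative cancer figures (not real data).
incidence = 20_000    # new cases per year
prevalence = 80_000   # people currently living with the disease

# Steady-state identity: prevalence ≈ incidence × average duration,
# so the implied average time spent with the disease is:
avg_duration_years = prevalence / incidence
print(avg_duration_years)  # 4.0
```

The identity is why the jobs numbers and the unemployment rate can tell such different stories: job losses and gains are incidence, while the unemployment rate is prevalence, so it also depends on how long the average person stays unemployed.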
There are other problems with the charts, absolutely. I’m not calling the game for Obama here — that chart tells lies in some other ways. It does not control for population growth: a couple hundred thousand jobs are needed just to absorb new people coming into the economy, so a lot of the positive-looking growth is treading water.
And I may have just massacred econometrics — I don’t know. I am sure they have some of their own terms that deal with these things. But I think these sorts of approaches should be at the heart of an integrative statistics course, encouraging people to try insights from one domain and seeing if they have explanatory power in another.
This is a great example for students of how longitudinal measurement is sometimes used in polling to understand the effect of a specific event. The post-speech numbers alone tell us a bit about Obama’s popularity, but nothing about the speech. With a pre/post on the speech, we can use the post-speech gain to understand the speech’s effect.
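A trivial sketch of that pre/post logic, with made-up approval numbers (the differencing assumes nothing else moved opinion between the two polls):

```python
# Hypothetical approval figures around a speech.
pre_approval = 0.46    # poll taken just before the speech
post_approval = 0.49   # poll taken just after the speech

# The post number alone mixes baseline popularity with the speech's
# effect; differencing against the pre number isolates the change.
speech_effect = post_approval - pre_approval
print(f"{speech_effect:+.2f}")
```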
Diagnostic vs. Spectral Markers. From Principles of Medical Statistics.
Diagnostic markers are about whether the disease is present, whereas spectral markers deal with severity and stage.
If I have 10 kids in my class and two failed last year and one failed this year, I can say two equivalent things:
- 50% fewer students failed my course this year
- 10% more of my students passed.
The odd thing is that most students, looking at such figures, refuse to believe they are equivalent statements. In fact, they are prone to believe that if
- 10% more of my students passed, then
- There were 10% fewer failures
The key is what I choose as the base to calculate the percentage from. I can choose
- total students: 10% more of my students passed,
- failing students: 50% fewer students failed
- or passing students: 12.5% more students passed
as the base, and each will give me a different percentage. It’s a stunningly easy sort of manipulation that is used all the time to great effect.
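For what it’s worth, the three bases are easy to grind out in code (same numbers as above: 10 students, 2 failures last year, 1 this year):

```python
total = 10
failed_last, failed_this = 2, 1
passed_last, passed_this = total - failed_last, total - failed_this

# Base = failing students: (2 - 1) / 2 → 50% fewer failures.
drop_in_failures = (failed_last - failed_this) / failed_last

# Base = total students: (9 - 8) / 10 → 10% more of the class passed.
gain_of_class = (passed_this - passed_last) / total

# Base = passing students: (9 - 8) / 8 → 12.5% more passers.
gain_of_passers = (passed_this - passed_last) / passed_last

print(drop_in_failures, gain_of_class, gain_of_passers)  # 0.5 0.1 0.125
```

Same single student moving from the fail column to the pass column; three defensible-sounding percentages.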
Apparently students aren’t the only ones confused. Here’s a paper making a similar error on infant mortality.
Eventually, the 200 students taking the course in person dwindled to a group of 30. Meanwhile, the course’s popularity exploded online, drawing students from around the world. The experience taught the professor that he could craft a course with the interactive tools of the Web that recreated the intimacy of one-on-one tutoring, he said.
I still believe the major technological paradigm that is going to reshape education is tutoring at scale, and it’s interesting to see that, for those who succeed in that realm, this is exactly how the experience feels to them.