Ecological Validity

Term of the day: ecological validity.

Ecological validity is a pretty big concern in ed psych, obviously. But I’ve also just read an interesting paper in Health, Risk, and Vulnerability which talks about the ecological validity of psychiatric assessment of criminals being treated for mental illness. The idea there is that many prisoners who do poorly in the highly restrictive environment of the facility (e.g. people with authority issues) are actually quite ready to function in a real-world setting, whereas many offenders who do well in these settings (e.g. sex offenders) are actually high recidivism risks. Assessment within the confines of the center, regardless of the validity of the instrument, is compromised by this.

I hate to draw a link between colleges and psychiatric prisons, but one sometimes wonders about the ecological validity of college itself. Even setting aside the tests themselves, you have to ask whether the peculiar nature of college, a social environment unlike almost any other, is really an ideal setting for assessing ability as it will be applied in the outside world.


Simpson’s Paradox

Example of Simpson’s Paradox from The Numbers behind Numb3rs.

In this example, women are accepted at a higher rate (or a roughly equal rate) by each of Berkeley’s individual programs, but are accepted at a lower rate when those acceptances are combined into university-wide stats. Why? Because women apply disproportionately to the more competitive programs…
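
To make the arithmetic concrete, here is a quick sketch with made-up numbers (not the actual Berkeley figures): women do better within each program, yet worse in the pooled totals, simply because most of them apply to the harder program.

    # Toy illustration of Simpson's paradox -- made-up numbers, not the
    # actual Berkeley admissions data.  Within each program women are
    # admitted at a higher rate; in the pooled totals they are not.

    # (applicants, admitted) by program and sex
    data = {
        "Easy program":        {"men": (800, 480), "women": (100, 65)},   # 60% vs 65%
        "Competitive program": {"men": (200, 20),  "women": (900, 100)},  # 10% vs ~11%
    }

    totals = {"men": [0, 0], "women": [0, 0]}
    for program, by_sex in data.items():
        for sex, (applied, admitted) in by_sex.items():
            totals[sex][0] += applied
            totals[sex][1] += admitted
            print(f"{program}, {sex}: {admitted / applied:.1%} admitted")

    for sex, (applied, admitted) in totals.items():
        print(f"Overall, {sex}: {admitted / applied:.1%} admitted")   # men 50.0%, women 16.5%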


A good example of age as confounder

From The Numbers behind Numb3rs:

Cobb illustrated the distinction by means of a famous example from the long struggle physicians and scientists had in overcoming the powerful tobacco lobby to convince governments and the public that cigarette smoking causes lung cancer. Table 2 shows the mortality rates for three categories of people: nonsmokers, cigarette smokers, and cigar and pipe smokers.

At first glance, the figures in Table 2 seem to indicate that cigarette smoking is not dangerous but pipe and cigar smoking are. However, this is not the case. There is a crucial variable lurking behind the data that the numbers themselves do not indicate: age. The average age of the nonsmokers was 54.9, the average age of the cigarette smokers was 50.5, and the average age of the cigar and pipe smokers was 65.9. Using statistical techniques to make allowance for the age differences, statisticians were able to adjust the figures to produce Table 3.

Now a very different pattern emerges…
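
For the curious, here is roughly what “making allowance for the age differences” can look like in practice. One common technique is direct age standardization: re-weight each group’s age-specific rates to a common age distribution. The sketch below uses invented numbers, not the book’s tables, and is not necessarily the exact adjustment those statisticians used, but it shows how a large crude gap can mostly vanish once age is accounted for.

    # Toy sketch of direct age standardization -- invented numbers, not
    # the figures from the book.  Death rates are per 1,000 person-years.

    rates = {   # age-specific death rates for each group
        "nonsmokers":         {"40-59": 5, "60-79": 30},
        "cigar/pipe smokers": {"40-59": 5, "60-79": 32},
    }

    group_mix = {   # how each group's person-years are actually spread across ages
        "nonsmokers":         {"40-59": 0.7, "60-79": 0.3},
        "cigar/pipe smokers": {"40-59": 0.2, "60-79": 0.8},   # a much older group
    }

    standard_mix = {"40-59": 0.5, "60-79": 0.5}   # one common age distribution applied to both

    for group in rates:
        crude    = sum(rates[group][a] * group_mix[group][a] for a in rates[group])
        adjusted = sum(rates[group][a] * standard_mix[a] for a in rates[group])
        print(f"{group}: crude {crude:.1f}, age-adjusted {adjusted:.1f}")

    # crude rates:    12.5 vs 26.6 (pipe and cigar smoking look deadly)
    # adjusted rates: 17.5 vs 18.5 (most of the gap was just age)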


Predictive Efficiency

From Farrington & Tarling’s Prediction in Criminology, a new term: predictive efficiency. The way to think about it is this — suppose I say that a college education predicts low incidence of being convicted of a violent crime, and at the end of the day I’m right — over the course of a year, 97.5% of our college grads are not convicted.

In the absence of a base rate, that doesn’t really tell us anything. If, say, 98.5% of all adults go unconvicted of a violent crime in a given year, then my college-education predictor is actually doing worse than simply guessing “not convicted” for everyone. It can be a good predictor in the sense that it predicts at a high rate of certainty, but it is inefficient compared to alternative predictors.
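
One rough way to put a number on this is to ask what share of the remaining error the predictor removes relative to the base rate. That is just one simple framing, not necessarily the exact statistic Farrington & Tarling use; the sketch below runs it on the post’s made-up numbers.

    # Sketch: does the predictor beat the base rate, and by how much?
    # (predictor_accuracy - base_rate) / (1 - base_rate) is the share of the
    # remaining error the predictor removes -- one simple framing of
    # "efficiency", not necessarily Farrington & Tarling's statistic.

    def improvement_over_base(predictor_accuracy, base_rate):
        return (predictor_accuracy - base_rate) / (1 - base_rate)

    base_rate = 0.985   # suppose 98.5% of all adults go unconvicted in a year anyway
    college   = 0.975   # 97.5% of college grads go unconvicted (the post's number)

    print(improvement_over_base(college, base_rate))   # -0.67: worse than guessing "not convicted" for everyone
    print(improvement_over_base(0.995, base_rate))     # +0.67: a predictor that actually earns its keep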


Incidence, Prevalence, and the Obama Job Record

Since the statistics class I teach is supposed to be integrative (that is, to show connections between various disciplines and other aspects of life), I’m always on the lookout for ways to borrow an understanding from one domain to make sense of another. I think I just found a neat example.

But first, look at these two different stories of the Obama record on jobs:

To the average viewer these may seem like incompatible stories. In the top graph, Obama begins to pull us out of the recession on day one of his presidency, slowing job losses and eventually moving us to job gains — digging us out of the hole that Bush 43 got us into. 

In the bottom graph, Obama takes office, unemployment skyrockets to a historic level, and even now Obama has not returned us to the point we were at on inauguration day. He still hasn’t cleaned up his mess.

Which brings me to the medical stats terms incidence and prevalence.

From http://tirgan.com/incidence.htm:

Incidence refers to the frequency of development of a new illness in a population in a certain period of time, normally one year. When we say that the incidence of this cancer has increased in past years, we mean that more people have developed this condition year after year, e.g., the incidence of thyroid cancer has been rising, with 13,000 new cases diagnosed this year.

Prevalence refers to the current number of people suffering from an illness in a given year. This number includes all those who may have been diagnosed in prior years, as well as in the current year. An incidence of 20,000 a year with a prevalence of 80,000 means that there are 20,000 new cases diagnosed every year and 80,000 people living in the United States with this illness, 60,000 of whom were diagnosed in the past decade and are still living with the disease.

I think you see where I’m going with this. If you apply the terminology of epidemiology, the unemployment rate is a prevalence measure. It’s influenced heavily by how long a person who gets a condition (in this case, the state of being unemployed) stays in that condition. Prevalence is a helpful measure of the social and economic impact of a disease. 

The jobs creation numbers, on the other hand, are a measure of incidence, in this case measured month by month, year by year. 

Which measure you use depends on what you are trying to figure out, but in general, when attacking diseases at least, it is the incidence rate that is watched most closely: if you can make progress on incidence, the prevalence problem will eventually take care of itself. Meanwhile, prevalence can be unreliable. A deadly disease has lower prevalence simply because it kills people faster (taking them off the books), just as the unemployment rate does not include people who have stopped looking. On the other side of the equation, prevalence can under-represent positive change, which can be dwarfed by a large backlog of cases. If we start to make progress on diabetes, for example, it won’t show up in a prevalence chart until many, many years later.
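
A toy stock-and-flow sketch (all numbers invented) shows why: incidence can fall steadily from the very first month while the stock that prevalence measures keeps rising for a while and drains away only slowly afterwards.

    # Toy stock-and-flow sketch of incidence vs. prevalence -- all numbers
    # invented.  "new" is incidence (newly unemployed each month), "exits"
    # are people leaving unemployment, and the running stock is what a
    # prevalence-style measure like the unemployment rate reflects.

    new_per_month   = [800, 700, 500, 300, 200, 150, 150, 150]   # incidence improving steadily
    exits_per_month = [100, 150, 200, 250, 300, 300, 300, 300]

    stock = 2000   # people already unemployed at the start
    for month, (new, exits) in enumerate(zip(new_per_month, exits_per_month), start=1):
        stock += new - exits
        print(f"month {month}: {new} newly unemployed, stock = {stock}")

    # The inflow falls from month one, but the stock keeps rising until
    # month 4 and is still above its starting level at the end.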

There are other problems with the charts, absolutely. I’m not calling the game for Obama here; that chart tells lies in some other ways. It does not control for population growth: a couple hundred thousand jobs are needed just to account for new people coming into the economy, so a lot of the positive-looking growth is really just treading water.

And I may have just massacred econometrics; I don’t know. I’m sure economists have their own terms for these things. But I think these sorts of approaches should be at the heart of an integrative statistics course: encouraging people to take insights from one domain and see whether they have explanatory power in another.


From Swing Voters via ilovecharts

This is a great example for students of how longitudinal measurement is sometimes used in polling to understand the effect of a specific event. The post-speech numbers alone tell us a bit about Obama’s popularity, but nothing about the speech. With a pre/post on the speech, we can use the post-speech gain to understand the speech’s effect.
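
In arithmetic terms it is nothing fancier than a difference, but that difference is the whole point. A minimal sketch with made-up approval numbers:

    # Minimal sketch of the pre/post logic, with made-up approval numbers.
    pre_speech  = 0.46   # approval measured before the speech
    post_speech = 0.51   # approval measured after the speech

    # The post number alone mixes the speech with everything that came before;
    # the difference is what we attribute to the speech (assuming nothing else
    # big happened in between).
    print(f"estimated speech effect: {post_speech - pre_speech:+.0%}")   # +5%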



Diagnostic vs. Spectral Markers

From Principles of Medical Statistics.

Diagnostic markers are about whether the disease is present, whereas spectral markers deal with severity and stage.


‘Adrift’ in Adulthood: Students Who Struggled in College Find Life Harsher After Graduation

From the article:

Here is what they found: Graduates who scored in the bottom 20 percent on a test of critical thinking fared far more poorly on measures of employment and lifestyle when compared with those who scored in the top 20 percent. The test was the Collegiate Learning Assessment, or CLA, which was developed by the Council for Aid to Education.

The students scoring in the bottom quintile were three times more likely than those in the top quintile to be unemployed (9.6 percent compared with 3.1 percent), twice as likely to be living at home with parents (35 percent compared with 18 percent), and significantly more likely to have amassed credit-card debt (51 percent compared with 37 percent).

“That’s a dramatic, stunning finding,” said Mr. Arum, referring to the sharp difference in unemployment so early in the students’ lives after college. “What it suggests is that the general higher-order skills that the Council for Aid to Education assessment is tracking is something of significance, something real and meaningful.”

I’m really curious about this, but initially it raises more questions for me than it answers. Most of the effects seem to be consequences of not finding a first job (debt, living with parents), and it is hard to see how much raw critical thinking would figure into that (securing the job, as opposed to keeping it).

It makes me wonder whether performance on the test might be a proxy for persistence, responsibility, or any of ten other qualities that help people realize intellectual gains in college and also help them in a job search.

As noted in the article, the selectivity of a graduate’s college also played a role. Selective colleges take smarter students, and a person from Dartmouth may beat out a person from Podunk State; but that may have less to do with the skills per se than with the degree. They note this in the last paragraph, but they don’t tell us what those quintile comparisons look like when selectivity is controlled for. My guess is that they are a lot less dramatic.

Finally, depending on the correlation with the written part of the CLA, we may also just be seeing that people with writing and communication skills a) get hired over other candidates, and b) do better on written tests.

But hopefully more data is coming soon.


Infant mortality and choice of a base

If I have 10 kids in my class and two failed last year and one failed this year, I can say two equivalent things:

  • 50% fewer students failed my course this year
  • 10% more of my students passed.

The odd thing is that most students, looking at such figures, refuse to believe these are equivalent statements. In fact, they are prone to believe that if

  • 10% more of my students passed, then
  • there were 10% fewer failures

The key is what I choose as the base for calculating the percentage. I can choose

  • total students: 10% more of my students passed,
  • failing students: 50% fewer students failed
  • or passing students: 12.5% more students passed

as the base, and each will give me a different percentage. It’s a stunningly easy sort of manipulation that is used all the time to great effect.
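
Here is the class example worked through, just to show that one underlying change (one fewer failure out of ten students) yields three different percentages depending on the base:

    # The class example in code: one underlying change, three bases,
    # three different percentages.
    total_students   = 10
    failed_last_year = 2
    failed_this_year = 1

    passed_last_year = total_students - failed_last_year   # 8
    passed_this_year = total_students - failed_this_year   # 9

    # base = failing students
    print((failed_last_year - failed_this_year) / failed_last_year)   # 0.5   -> "50% fewer students failed"

    # base = total students
    print((passed_this_year - passed_last_year) / total_students)     # 0.1   -> "10% more of my students passed"

    # base = passing students
    print((passed_this_year - passed_last_year) / passed_last_year)   # 0.125 -> "12.5% more students passed"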

Apparently students aren’t the only ones confused. Here’s a paper making a similar error on infant mortality.


Tutoring at Scale Sighting

From The Chronicle, Tenured Professor Departs Stanford U., Hoping to Teach 500,000 Students at Online Start-Up:

Eventually, the 200 students taking the course in person dwindled to a group of 30. Meanwhile, the course’s popularity exploded online, drawing students from around the world. The experience taught the professor that he could craft a course with the interactive tools of the Web that recreated the intimacy of one-on-one tutoring, he said.

I still believe the major technological paradigm that is going to reshape education is tutoring at scale, and it’s interesting to see that, for those who succeed in that realm, that is exactly how the experience feels.

