(Educational) Research Isn’t Broken But the Culture Is

Great post today out on Simply Statistics. In it, the author critiques the claim that most research is false, finding that claims of a reproducibility crisis are probably overstated at this point, but concluding that the following steps are still necessary:

We need more statistical literacy
We need more computational literacy
We need to require code be published
We need mechanisms of peer review that deal with code
We need a culture that doesn’t use reproducibility as a weapon
We need increased transparency in review and evaluation of papers

I’d agree with this. I find the points about culture particularly important. One of the sad things about the Course Signals situation was that the reaction showed how incapable the current system is of having a debate about *numbers*, with everyone immediately retreating to narrative (or retreating to silence).

If you’re not willing to show your work, you’re not in research. But if your analysis of statistical or computational error is “Ha, ha, you’re wrong!” you’re not in research either. Both tendencies are toxic, and both issues play off one another in a culture that increasingly awards people no benefits for openness but exposes them to a lot of professional risk. The smart move for any researcher today — in climate science, education, or anything of social import — is to make it as difficult as possible to dig into the work, process, and numbers behind their results. (Don’t believe me? Just ask Michael Mann.)

If we believe educational research does matter (and it does), we need to lose the combative attitude toward results that don’t support our views, and we need to open up the results that do to criticism. That requires a culture that takes joy in geeking out about the numbers before jumping to tried-and-true narratives; we seem to be getting further from that every day.

Counting History PhD Employment

I used to do more statistical literacy stuff on this blog, and I’m toying with the idea of going back to that. The problem is that the stuff that really tends to matter is stuff everybody thinks they already know, but which most people have not built habits around. It’s not really fascinating stuff to talk about, and most of the time it doesn’t result in huge discoveries but rather in small modifications to our understanding of claims.

A good example of this is a recent study of history PhD employment, which shows surprisingly high employment of history PhDs. It’s a great study, and hugely useful. However, the summary contains this line, which I’m sure people will latch onto:

The overall employment rate for history PhDs was exceptionally high: only two people in the sample (of 2,500) appeared unemployed and none of them occupied the positions that often serve as punch lines for jokes about humanities PhDs—as baristas or short order cooks. (italics mine)

In the COMPARABLE framework I used to give my students, one of the first questions you ask is “How was this number computed?” (the “O” stands for “How were the variables Operationalized?”). A quick two-minute scan of the article shows us this:

To identify the career paths of recent history PhDs, the AHA hired Maren Wood (Lilli Research Group) to track down the current employment of a random sample of 2,500 PhDs culled from a total of 10,976 history dissertations reported to the AHA’s Directory of History Departments and Historical Organizations from May 1998 through August 2009. The AHA’s Directory Editor, Liz Townsend, compared the data to employment information in the AHA Directory—which lists academic faculty—and the Association’s membership lists, and Wood used publicly available information on the Internet. Data was collected during February and March of 2013, and reviewed in June and July. Together, AHA staff and Maren Wood identified current employment or status information, as of spring 2013, on all but 70 members of the sample group.

A lot of the time, when you can’t determine the status of part of your sample, you can assume that the unreachable, unfindable people break down more or less into the same percentages as the reachable part of your sample. But how you collect data affects this. In this case, the existence of the American Historical Association directory makes it highly unlikely that there were unfound tenure-track positions, and the public nature of university directories probably sussed out most other people in university positions.

On the other side of things, we can imagine that the most invisible, hard-to-find people would be the ones that are unemployed or work low-paying, low-profile, non-academic jobs.

All in all, I think it likely that tracking down the untrackable would substantially add to the unemployed count, and might even dig up a barista. The research methodology almost guarantees that the 3% of people not found will be primarily people outside the university system.
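To make that concrete, here’s a minimal back-of-envelope sketch in Python. The study figures (2 unemployed found, 70 unfound, sample of 2,500) come from the quoted passage above; the share of the unfound assumed to be unemployed in each scenario is purely an assumption for illustration:

```python
# Back-of-envelope sensitivity check: how much could the 70 unfound
# people move the headline unemployment count? The scenario rates
# below are assumptions for illustration, not figures from the study.

sample_size = 2500        # PhDs sampled in the AHA study
found_unemployed = 2      # unemployed among those actually located
unfound = 70              # people the study could not track down

for assumed_rate in (0.0, 0.10, 0.25, 0.50):
    extra = unfound * assumed_rate
    total = found_unemployed + extra
    print(f"If {assumed_rate:.0%} of the unfound are unemployed: "
          f"~{total:.0f} unemployed, or {total / sample_size:.1%} of the sample")
```

Even under the more pessimistic assumptions the absolute numbers stay small, but the unemployed count moves a lot in relative terms, which is exactly why I’d treat the “only two people” framing with some caution.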

So I think this “two people unemployed” business is overstated. Still, the claim that half of history PhDs are employed in four-year tenure-track positions stands despite this, and that remains a rather interesting result.

With that result, there’s perhaps another issue. The initial sample is culled from finished dissertations. But dissertations are often abandoned, and all-but-dissertation (ABD) tends to become a permanent state for many who don’t find employment in academia. Why finish the dissertation if you can’t find a job in your field? Barista jokes are unfair, but if there is a PhD barista, they are likely ABD, and they wouldn’t show up in these stats anyway.

What would the stats look like if we included the ABD students? It’s a minor quibble, unlikely to have a *huge* impact on the numbers. But it moves possibly sensational claims a bit closer to reality, especially in the humanities, where ten-year degree completion is sub-50%, IIRC.

A final thing I might note as rather odd is the small number of PhDs working in the community college system. In the “M” part of the COMPARABLE framework, students are asked to build a basic “model” in their heads and make predictions: if X is true, what else is likely to be true? Can you check it? Here, the fact that a large number of history teaching jobs are at community colleges, yet only 5.5% of our PhD sample works those jobs (compared to the 50% of the sample in tenure-track faculty positions), leads us to guess that the vast majority of people teaching history at the community college level must not have PhDs. There are certainly ways that guess could be false while the data is still good, but if the prediction turns out wrong, we’d have to dig deeper into the data.
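As a rough sketch of how you’d turn that prediction into something checkable, here’s the shape of the arithmetic in Python. The 10,976 cohort total and the 5.5% share are the study’s own figures; the count of community college history instructors is a made-up placeholder you’d want to look up before trusting any conclusion:

```python
# Rough consistency check for the community-college prediction.
# Assumes the sample share scales to the full 1998-2009 cohort.

total_phds_1998_2009 = 10976   # history dissertations reported to the AHA
cc_share = 0.055               # share of sampled PhDs in community college jobs

phds_in_cc_jobs = total_phds_1998_2009 * cc_share
print(f"Implied recent PhDs in community college jobs: ~{phds_in_cc_jobs:.0f}")

# Placeholder figure -- NOT from the study; look up the real number.
assumed_cc_history_instructors = 10000

share_with_recent_phd = phds_in_cc_jobs / assumed_cc_history_instructors
print(f"Implied share of community college history instructors "
      f"with a recent PhD: {share_with_recent_phd:.0%}")
```

If the looked-up instructor count is anywhere near that order of magnitude, the implied share comes out small, which is at least consistent with the guess that most community college history instructors don’t hold PhDs; if it isn’t, that’s the signal to dig deeper.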

So there you go. A partial analysis.

Now here’s the question for readers — is this boring as hell? Interesting? Boring, but salvageable?

The thing is, I really believe in this stuff — getting into these habits of mind that let you do a five-minute analysis of numbers. And the way I’ve learned it is by watching people model it (Tim Harford, Ben Goldacre, Milo Schield, Joel Best, etc.). But I think it can be a bit boring to read unless there is some big revelation, and most of the time the revelation is that the numbers are worthwhile but likely somewhat overstated. Hardly edge-of-chair stuff.

Thoughts on how to blog this sort of thing? I was thinking of doing one a week if I could find some way to make it interesting.