I want to do this in a class….

What a neat way of combining two textbooks to get a novel course design (which meshes with current theories of interleaving):

In an effort to maximize spacing and encoding variability, Robert Bjork once taught an honors introductory psychology course twice in one term. Up to the point of the midterm, the basic concepts of introductory psychology were covered using a textbook that adopted a history of psychology approach and emphasized the contributions of key individuals in the history of psychology, such as Pavlov, Freud, and Skinner. After the midterm exam, the basic concepts were covered again, this time using a textbook that adopted a brain mechanisms approach. The goal was to have key concepts come up in each half of the course (spacing) and from a different standpoint (variation).

From here.

Divided Attention During Lecture

I’ve been having some fun reading Bjork and his followers on elements of instruction. It’s good stuff! This comes from Successful Lecturing: Presenting Information in Ways That Engage Effective Processing by Patricia Ann de Winstanley & Robert A. Bjork:

In addition to its having a strong negative impact on encoding, divided attention has been shown to have much larger effects on direct, or explicit, tests of memory than on indirect, or implicit, tests of memory (MacDonald and MacLeod, 1998; Szymanski and MacLeod, 1996). The implication is that divided attention during a lecture may leave students with a subsequent sense of familiarity, or feeling of knowing, or perceptual facilitation for the presented material but without the concomitant ability to recall or recognize the material on a direct test of memory, such as an examination. As a consequence, students may misjudge the amount of time needed for further study.

Dividing students’ attention during a lecture therefore poses a double threat. First, information is learned less well when attention is divided. Second, one’s feeling of knowing or processing facility remains unaffected by divided attention, which may result in the assumption that information is learned well enough and no further study time is needed (see Bjork, 1999, and Jacoby, Bjork, and Kelley, 1994, for reviews of the literature on illusions of comprehension and remembering).

Concept Inventories and Dan Meyer’s Linear Modeling Exercise

I’ve talked a bit in the past about good concept inventory questions: questions that tackle difficult conceptual territory but have black-and-white answers and don’t require any special vocabulary to answer.

Dan Meyer’s Linear Modeling exercise [PDF] is a good example. The first question has a specific answer, and answering it requires the right set of intuitions about linear processes, but it doesn’t matter what terms you are using, and the student does not need to intuit what you are trying to assess to get it right. 

I’ll add that the exercise has one other mark of a great inventory question: apart from the title, it contains no hints that this is an application of linear modeling. This jibes with what we know from research on interleaving, namely that the decision of which model to apply is as important as the model itself.

One final thing: I can’t help noticing that, like many ConcepTests and many questions on the FCI, it is a prediction question. There’s something very powerful about prediction in the way it focuses the mind. More on that later.

Comparing Electoral Behavior

From the Utne Reader, in an article showing that we “are segregating [our]selves politically and geographically” in the U.S.:

“In 1992, 38 percent of Americans lived in counties decided by landslide elections; by 2004, that figure was 48 percent.”

One thing that jumps out at me immediately is that elections are very hard to compare to one another. In this case, 1992 represented the unseating of an incumbent in a three-way race (remember Perot?) — whereas 2004 was the two-way reaffirmation of an incumbent president with no real third party presence. 

How might this affect things? Well, let’s say we define a landslide as getting 60% or more of the vote. In a three-way race like 1992, that would be difficult. Clinton won the election with 43% of the vote (to Bush’s 38 and Perot’s 19). Assuming some deviation off that average from county to county, you are still unlikely to get a 60% landslide: even a county 50% more Democratic than the average is only barely breaking the landslide barrier (0.43 × 1.5 ≈ 0.65).

In a two-way race the dynamics are different. In the Kerry/Bush 50/50 split, a candidate in a county 50% more Democratic than average wins with 75% of the vote (0.50 × 1.5 = 0.75).
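For class use, the back-of-envelope arithmetic can be sketched in a few lines of Python. The proportional-lean model (county share = national share × a multiplier) is of course a rough assumption, but it is the one the numbers above use:

```python
# Sketch of the landslide arithmetic. A county's vote share is modeled
# as a simple multiplier on the national share -- a rough assumption,
# but the one behind the back-of-envelope numbers in the text.

LANDSLIDE = 0.60

def county_share(national_share, lean):
    """Vote share in a county that leans `lean` times the national share."""
    return national_share * lean

# 1992 three-way race: Clinton won nationally with 43%.
three_way = county_share(0.43, 1.5)  # ~0.645: barely past the 60% line

# 2004 two-way race: roughly a 50/50 national split.
two_way = county_share(0.50, 1.5)    # 0.75: well past the line

print(f"Three-way: {three_way:.3f} (landslide: {three_way >= LANDSLIDE})")
print(f"Two-way:   {two_way:.2f} (landslide: {two_way >= LANDSLIDE})")
```

The same multiplier that barely clears the landslide bar in a three-way race sails past it in a two-way race, which is the whole point about comparability.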

The second problem (which we ignored in the above calculation) is that counties are not a good unit of measurement. The majority of counties are small, Republican entities, even though voters are roughly split nationwide (Democrats live in more populous counties). I imagine, too, that because the majority of counties are Republican, Republican wins will look polarizing (look at all those deep red counties!) whereas Democratic wins will look less so (the counties that go dark blue will be fewer but more populous, and the Democratic votes will eat into the red counties).

In any case, why not compare something more comparable, like 1984 and 2004? That has problems, but a whole lot fewer, I think. Or compare midterms, where national politics is less confounding.

Fireside Tutorials and Punk Economics

What do we call this genre of videos, these informal explanations by Khan Academy, RSA:Animate, Common Craft, Vi Hart, and others that sit across the desk from you and talk things through? I have no idea. But I’m fascinated with the form, and with how rethinking video this way makes a lecture feel more like tutoring, even when (as in the case of RSA:Animate) the material is often an adapted lecture.

Anyway, this is my most recent find — Punk Economics, by David McWilliams:

If you have other great examples, shoot me an email or post in the comments.

Hill’s nine criteria for causal association

Sir Austin Bradford Hill’s classic article on the characteristics of a causal relationship is well worth a read, and is still one of the most concise lists of what to look for in any research you read. Here’s a summary of what helps us make the leap from association to causation:

  1. Strength (the risk is large)
  2. Consistency (the results have been replicated by different researchers in different situations)
  3. Specificity (the predictor is not related to a broad array of outcomes)
  4. Temporality (the predictor always precedes the outcome)
  5. Biological gradient (also known as dose-response: the more of the predictor, the more of the outcome)
  6. Plausibility (there is a plausible mechanism; we have a credible theory of how the causal relationship might work)
  7. Coherence (the association is consistent with the natural history of the disease)
  8. Experimental evidence (experimental interventions show results consistent with the association)
  9. Analogy (there are similar results that we can draw a relationship to)

It’s worth noting that, as Fung points out in Numbers Rule Your World, there are an awful lot of situations where we don’t need causality. You can work with strong association in places where you only need to predict (insurance rates, at-risk determinations), and rely on causality only when you have to design effective interventions.

The biggest problem I find with students and causality is not that they over-assign causality to situations, but that they see causality as a binary concept. In the minds of many students there are two buckets: “caused” and “not caused”. The idea that one association can be more likely causal than another (that it is probably more likely that diets high in animal fat increase heart disease risk than that coffee cures Alzheimer’s, though neither is proved beyond doubt) sort of escapes them. Causality is seen as a finish line that is crossed, usually once and for all.

Problems of Definition: Elsevier’s Prices

The recent boycott of Elsevier provides us with a great quote for use in a statistical literacy class. People are boycotting for a number of reasons, particularly because of the high cost of the “bundles” Elsevier sells.

Claiming that its journals are among the cheapest in the industry, an Elsevier rep states:

“Over the past 10 years, our prices have been in the lowest quartile in the publishing industry,” said Alicia Wise, Elsevier’s director of universal access. “Last year our prices were lower than our competitors’. I’m not sure why we are the focus of this boycott, but I’m very concerned about one dissatisfied scientist, and I’m concerned about 2,000.”

From the perspective of defining terms, this may initially seem pretty straightforward, but it’s anything but. What does “our prices” mean?

  • Mean or median price computed across total offerings? In that case Elsevier could offer hundreds of free, worthless journals that no one uses or orders individually, which would pretty handily offset higher-priced offerings.
  • Mean or median price computed across individual sales? This would be a better measure, because it counts only the journals people use and not the junk they carry. But it is impossible to compute the number this way, because of the practice of bundling.

This last point is pretty important. Imagine you have two cable companies. One charges you for only the channels you want, à la carte. You get BBC America, SyFy, and PBS for $12.

The other cable company makes you buy a package to get these channels, and it cleverly organizes it so no cheaper package includes all three of these. So you get your BBC America, SyFy, and PBS, but you have to buy the Super-Mega Package to get them. You therefore get 120 channels for $120.

Which cable company offers channels at the cheapest price? From your perspective, you are being charged $4 a channel by Company A and $40 a channel by Company B.

But since that information (what you were actually trying to order) is recorded nowhere, any public number is more likely going to be the price you paid divided by the number of channels you bought. By that measure, Company A is charging you $4 a channel, whereas Company B is charging you $1 a channel. Company B (the grifters) comes out cheapest.
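The cable arithmetic is simple enough to lay out in code. The channel names and prices are the hypothetical ones from the example above:

```python
# The cable-bundling example in numbers. Channel names and prices are
# the hypothetical figures from the text, not real cable data.

wanted = {"BBC America", "SyFy", "PBS"}

# Company A: a la carte, so you pay only for what you wanted.
price_a = 12.0
per_channel_a = price_a / len(wanted)   # $4.00 per channel

# Company B: the Super-Mega Package, 120 channels for $120.
price_b = 120.0
bundle_size = 120
published_b = price_b / bundle_size     # $1.00 per channel -- looks cheap
actual_b = price_b / len(wanted)        # $40.00 per channel you wanted

print(f"Company A: ${per_channel_a:.2f} per channel")
print(f"Company B, published: ${published_b:.2f} per channel")
print(f"Company B, per channel you wanted: ${actual_b:.2f}")
```

The published number divides by everything in the bundle; the number you care about divides by what you actually wanted. Bundling makes the first cheap and the second invisible.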

What’s the point? Having the “lowest” prices in this case is a symptom of the bundling problem, not an excuse for it. The fact that Elsevier’s prices are in the lowest quartile is most likely a sign of excessive bundling, not of a functional market.

Possibly worth some class time on the cable TV example.

Ecological Validity

Term of the day: ecological validity.

Ecological validity is a pretty big concern in ed psych, obviously. But I’ve also just read an interesting paper in Health, Risk, and Vulnerability that discusses the ecological validity of psychiatric assessments of criminals being treated for mental illness. The idea there is that many prisoners who do poorly in the highly restrictive environment of the facility (e.g., people with authority issues) are actually quite ready to function in a real-world setting, whereas many offenders who do well in these settings (e.g., sex offenders) are actually high recidivism risks. Assessment within the confines of the center, regardless of the validity of the instrument, is compromised by this.

I hate to draw a link between colleges and psychiatric prisons, but one sometimes wonders about the ecological validity of college itself. Even aside from the tests themselves, you have to ask whether the peculiar nature of college (a social environment unlike any other, really) provides an ideal setting for assessing ability as applied to the outside world.

A good example of age as confounder

From The Numbers behind Numb3rs:

Cobb illustrated the distinction by means of a famous example from the long struggle physicians and scientists had in overcoming the powerful tobacco lobby to convince governments and the public that cigarette smoking causes lung cancer. Table 2 shows the mortality rates for three categories of people: nonsmokers, cigarette smokers, and cigar and pipe smokers.

At first glance, the figures in Table 2 seem to indicate that cigarette smoking is not dangerous but pipe and cigar smoking are. However, this is not the case. There is a crucial variable lurking behind the data that the numbers themselves do not indicate: age. The average age of the nonsmokers was 54.9, the average age of the cigarette smokers was 50.5, and the average age of the cigar and pipe smokers was 65.9. Using statistical techniques to make allowance for the age differences, statisticians were able to adjust the figures to produce Table 3.

Now a very different pattern emerges…
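The adjustment Cobb describes is essentially direct age standardization. Here is a minimal sketch with made-up numbers, since the book’s Tables 2 and 3 are not reproduced in this post: each group’s age-specific rates are reweighted by a shared standard population, so differences in age composition no longer drive the comparison.

```python
# Direct age standardization with hypothetical numbers (the book's
# tables are not reproduced here). Each group's age-specific death
# rates are averaged using the SAME standard population weights, which
# removes age composition as a confounder.

standard_pop = {"under 60": 0.7, "60 and over": 0.3}  # hypothetical weights

# Hypothetical age-specific mortality rates (deaths per 1,000) by group.
rates = {
    "cigarette smokers":  {"under 60": 15, "60 and over": 60},
    "cigar/pipe smokers": {"under 60": 10, "60 and over": 40},
}

def age_adjusted(group_rates, weights):
    """Weighted average of age-specific rates over the standard population."""
    return sum(group_rates[age] * w for age, w in weights.items())

for group, group_rates in rates.items():
    print(f"{group}: {age_adjusted(group_rates, standard_pop):.1f} per 1,000")
```

Because both groups are scored against the same age mix, a group that merely happens to be older (like the cigar and pipe smokers in the example) no longer looks riskier for that reason alone.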

Incidence, Prevalence, and the Obama Job Record

Since the statistics class I teach is supposed to be integrative — that is, to show connections between various disciplines and other aspects of life — I’m always on the lookout for ways to borrow an understanding from one domain to illuminate another. I think I just found a neat example.

But first, look at these two different stories of the Obama record on jobs:

To the average viewer these may seem like incompatible stories. In the top graph, Obama begins to pull us out of the recession on day one of his presidency, slowing job losses and eventually moving us to job gains — digging us out of the hole that Bush 43 got us into. 

In the bottom graph, Obama takes office, unemployment skyrockets to a historic level, and even now Obama has not returned us to the point we were at on inauguration day. He still hasn’t cleaned up his mess.

Which brings me to the medical stats terms incidence and prevalence.

From http://tirgan.com/incidence.htm:

Incidence refers to the frequency of development of a new illness in a population in a certain period of time, normally one year. When we say that the incidence of this cancer has increased in past years, we mean that more people have developed the condition year after year: e.g., the incidence of thyroid cancer has been rising, with 13,000 new cases diagnosed this year.

Prevalence refers to the current number of people suffering from an illness in a given year. This number includes all those who may have been diagnosed in prior years, as well as in the current year. An incidence of 20,000 a year with a prevalence of 80,000 means that there are 20,000 new cases diagnosed every year and 80,000 people living in the United States with this illness, 60,000 of whom were diagnosed in the past decade and are still living with the disease.

I think you see where I’m going with this. If you apply the terminology of epidemiology, the unemployment rate is a prevalence measure. It’s influenced heavily by how long a person who gets a condition (in this case, the state of being unemployed) stays in that condition. Prevalence is a helpful measure of the social and economic impact of a disease. 

The jobs creation numbers, on the other hand, are a measure of incidence, in this case measured month by month, year by year. 

Which measure you use depends on what you are trying to figure out. In general, though, when attacking diseases at least, it is the incidence rate that is watched most closely: if you can make progress on incidence, the prevalence problem will take care of itself eventually. Meanwhile, prevalence can be unreliable. A deadly disease has lower prevalence because it kills people faster (taking them off the books), just as the unemployment rate does not include people who have stopped looking. On the other side of the equation, prevalence can under-represent positive change, which can be dwarfed by a large backlog of cases. If we start to make progress on diabetes, for example, it won’t be easily visible in a prevalence chart until many, many years later.
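The incidence/prevalence link can be made concrete with the standard steady-state approximation (prevalence ≈ incidence × average duration). The numbers below are the illustrative 20,000/80,000 figures from the quote, which imply an average duration of about four years:

```python
# Steady-state approximation: prevalence ~= incidence x average duration
# in the condition. Figures are the illustrative ones from the quoted
# example, not real disease data.

def steady_state_prevalence(incidence_per_year, avg_duration_years):
    """Approximate prevalence when new cases and exits roughly balance."""
    return incidence_per_year * avg_duration_years

# Same yearly incidence, different durations: the longer people stay in
# the condition (long unemployment spells, chronic disease), the larger
# the standing pool.
print(steady_state_prevalence(20_000, 1))  # a condition people exit quickly
print(steady_state_prevalence(20_000, 4))  # matches the quote's 80,000
```

This is why the unemployment rate (a prevalence measure) can stay stubbornly high even while monthly job creation (an incidence measure) has turned positive: duration in the pool does a lot of the work.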

There are other problems with the charts, absolutely. I’m not calling the game for Obama here; that chart tells lies in some other ways. It does not control for population growth, for one: a couple hundred thousand jobs are needed just to account for new people coming into the economy, so a lot of the positive-looking growth is treading water.

And I may have just massacred econometrics — I don’t know. I am sure they have some of their own terms that deal with these things. But I think these sorts of approaches should be at the heart of an integrative statistics course, encouraging people to try insights from one domain and seeing if they have explanatory power in another.