Do More Books Mean More Titles or More Editions? (A critique of that graph going around)

This has been one of the most interesting charts of the week, but I think it is also generating a lot of wrong pronouncements:

The buzz around this is that it shows the influence of copyright — and it definitely does — far fewer of the 2,500 books sampled come from the period of copyright. But the question is what sort of copyright effect it is demonstrating. For instance, almost all the commentators I’ve seen suggest it indicates a massive gap in our copyright-era offerings — the claim is that copyright is making titles much less available.

But that’s not necessarily the case. It all comes down to what you mean by title.

Meaning this — when something is in copyright, it is usually published by one publisher — maybe a couple of publishers if there are overseas agreements. If it’s an absolute classic, there may be more, but not that many. There are three Kindle versions of Hemingway’s For Whom the Bell Tolls on the U.S. Amazon site, one of which is in Bulgarian and another in Portuguese. There are five paperback versions listed as “new”, and only one of them actually appears to be in print currently.

On the other hand, there appear to be almost a hundred Kindle versions of Jane Eyre, each with its own ISBN. Go to paperback, and there are 400 versions of Jane Eyre. There are 298 hardcover versions of it.

And it’s not just popular works — Eliot’s forgotten masterpiece Silas Marner has 301 versions, whereas Tom Wolfe’s 1980s classic The Bonfire of the Vanities has three.

Want to really freak out? There are almost 5,000 “new” editions of the work of Dickens available. (Again, these searches include some out-of-print works in mint condition — I can’t seem to filter these out — but the point holds.) You’d have to add up a city’s worth of single-publisher authors’ output over several years to reach a figure like that.

I can’t see any way you could conceivably control for this in a random sample, at least given how Amazon’s search is constructed, so I’m going to assume it wasn’t controlled for — in which case the graphic tells us nothing at this point. Copyright may also be reducing the availability of titles — it would make sense that it was, to some extent. But this graph doesn’t tell you anything about that.
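To see why edition counts swamp title counts in a sample like this, here is a minimal simulation sketch. Every number in it is invented for illustration: both eras get the same number of titles, and the only difference is how many editions each title spawns.

```python
import random

random.seed(42)

# Hypothetical catalog: the SAME number of titles per era, but each
# public-domain title spawns many more editions (all figures invented).
def editions(era):
    return random.randint(5, 300) if era == "public domain" else random.randint(1, 3)

catalog = []  # one entry per edition, tagged with its era
for era in ("public domain", "in copyright"):
    for _ in range(1000):  # 1,000 titles per era
        catalog.extend([era] * editions(era))

# A "random sample of books for sale" drawn this way samples editions, not titles:
sample = random.sample(catalog, 2500)
for era in ("public domain", "in copyright"):
    print(era, sample.count(era))
```

Even with identical title counts per era, nearly the whole sample lands on the public-domain side, which is roughly the shape of the chart in question.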

Comparison of the Day: Barefoot Running

A decent point about comparison that’s often missed: comparing like-to-like means that the intervention must be executed at the same level of proficiency as the control:

For the past few years, proponents of barefoot running have argued that modern athletic shoes compromise natural running form. But now a first-of-its-kind study suggests that, in the right circumstances, running shoes make running physiologically easier than going barefoot.

The study, conducted by researchers at the University of Colorado in Boulder, began by recruiting 12 well-trained male runners with extensive barefoot running experience. “It was important to find people who are used to running barefoot,” says Rodger Kram, a professor of integrative physiology, who oversaw the study, which was published online in the journal Medicine & Science in Sports & Exercise.

“A novice barefoot runner moves very differently than someone who’s used to running barefoot,” Dr. Kram says. “We wanted to look at runners who knew what they were doing, whether they were wearing shoes or not.”

Specifically, he and his colleagues hoped to determine whether wearing shoes was metabolically more costly than going unshod. In other words, does wearing shoes require more energy than going barefoot?

You see this a lot in educational research — the teachers involved are more trained in either the intervention or the control, which can foul the results quite a bit, even in a crossover design.

There’s actually lots more great stuff in this article. What the researchers found was that shoe weight itself was a confounding variable in judging the efficiency of other aspects of barefoot running. The like-to-like comparison they designed pitted ultralight running shoes against barefoot running plus small weighted band-aids, and once shoe weight was controlled for in this way, the efficiency association reversed. Another reminder that it’s usually more about the definitions than the stats.
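A toy calculation shows how equalizing weight can flip a comparison like this. All the rates here are invented; the point is the structure of the confound, not the physiology.

```python
# Invented rates: suppose each 100 g on the foot adds 1% to metabolic
# cost, and shoe mechanics provide a 2% saving (both pure assumptions).
BASE_COST = 100.0      # arbitrary units: barefoot, nothing added to the foot
WEIGHT_PENALTY = 0.01  # +1% cost per 100 g on the foot (assumed)
SHOE_SAVING = 0.02     # -2% cost from shoe mechanics (assumed)

def cost(grams_on_foot: float, wearing_shoes: bool) -> float:
    c = BASE_COST * (1 + WEIGHT_PENALTY * grams_on_foot / 100)
    return c * (1 - SHOE_SAVING) if wearing_shoes else c

# Naive comparison: barefoot (0 g) vs. 300 g shoes -- barefoot "wins"
print(cost(0, False), cost(300, True))    # 100.0 vs 100.94
# Weight-matched: barefoot + 300 g vs. 300 g shoes -- shoes win
print(cost(300, False), cost(300, True))  # 103.0 vs 100.94
```

Same underlying process, opposite conclusion, depending entirely on whether weight is matched.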

I should add that this study probably addresses the concerns of only a small number of barefoot runners — not everybody cares about efficiency.

Blackboard, Moodle, and the Commodity LMS

I haven’t seen this graph referenced in the recent discussion around Blackboard’s latest purchase, which is strange, because it explains almost everything:

A while back, Blackboard decided that the saturation and commodification of the LMS market meant that the path to greater profitability was not more contracts, but a higher average contract price. Under such a model, Blackboard Basic was seen as cannibalizing potential sales of the enterprise product, and they began running a series of special offers to move people off of the basic product and into the enterprise one. And they were successful to a point. In the data we have, enterprise licenses increase, and total licenses fall slightly, indicating that at least initially the higher contracts may have offset customer loss (though even this is debatable given the deals they ran for upgrades, and the percentage of that bump that is really the Angel acquisition).

We can’t tell what happened after that, though, since Bb has not released data on licensing. The best estimate I’ve seen indicates that Bb lost between 150 and 400 licenses a year from 2009 through 2011.

That’s a problem for Blackboard, and not just because of the loss of core LMS revenue. Why? Because Blackboard sees the future of contracts as selling add-on modules and other higher education services. Take a look at their front page:

The Learn product is placed first, but the message is clear — the transaction system, analytics system, and campus messaging system are products in their own right. And the future is selling these products — if you don’t believe me, just look at these figures Michael Feldstein put together a couple years back:

I’m sorry this is out of date — since Blackboard went private in 2011, there has not been much data released about such matters, so we have to rely on old snapshots. But the idea here is clear:

  • Use the LMS as a foot in the door, and make a profit off the enterprise version of it
  • Purchase other companies to get a foot in the door to sell the add-on services

The purchases of ANGEL and WebCT were not about winning the “LMS wars”. Blackboard sees the LMS at this point as a commodity product. The purchases were about getting a seat at the table to sell other, more profitable products: analytics, collaboration add-ons, early warning systems, financial transaction systems. Compared to the LMS, the price-to-support-cost ratio of an analytics, transaction, or emergency communication system is a dream. The LMS, on the other hand, is a headache — high-support, low-margin. But it’s a perfect foot in the door to make other sales.

There was only one problem with this — as Blackboard acquired more customers via the purchases of WebCT and ANGEL, they realized there was a leak — Moodle. Customers gained through acquisition were moving to Moodle, and existing customers pressured into the enterprise product were also bailing.

Buying Moodlerooms plugs that leak, and keeps Blackboard at the table selling the more profitable building access, commerce, human resources, and donor management systems that they make their money on.

For Blackboard, Moodlerooms is a way to keep a foot in the door of the more price-sensitive customers while not cannibalizing sales of Learn. Ultimately, this allows them to maximize profit on the LMS side while stanching the customer bleed they have experienced over the past three years. And it preserves what has been their planned path for a while — to move beyond what they see as a dead LMS market and into more profitable products in less saturated areas.

Blackboard bought Moodlerooms for the same reason it bought Angel and WebCT: it’s not an LMS company anymore. It’s really that simple.

Comparison of the Day: Gas Prices

When things have a seasonal cycle, it’s often difficult to make direct comparisons. Ideally you compare to this time last year, or to the ten-year average for this time of year, but what people really want is a sense of how high it will go. This article does a decent job with that — look, it’s already $3.90, and it’s going to keep rising until May or so — absent some large event, it’s hard to see how we won’t at least break last year’s peak of $3.98 and get close to the record of $4.11.

This is also an example of where controlling for inflation is probably superfluous. We’re comparing against last year’s price on the one hand and a date four years back on the other, through some pretty lean years in terms of inflation — we can probably live without it in a presentation like this. (Although, interestingly, we were paying almost as much for gas in 1980, in inflation-adjusted dollars, as we have been the past couple of years — a recession will do that to prices….)
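For the curious, the adjustment itself is one line of arithmetic. A back-of-envelope sketch, with approximate CPI and price figures recalled from memory (treat every number here as an assumption to check):

```python
# price_then * (CPI_now / CPI_then) = price in today's dollars
CPI_1980 = 82.4   # CPI-U annual average, 1980 (approximate)
CPI_2011 = 224.9  # CPI-U annual average, 2011 (approximate)
GAS_1980 = 1.20   # rough 1980 average retail gas price per gallon

adjusted = GAS_1980 * (CPI_2011 / CPI_1980)
print(f"1980 gas in 2011 dollars: ${adjusted:.2f}/gal")  # roughly $3.28
```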

That Millennial Study and Baselines

By now you’ve seen or heard about the APA study on millennials and civic-mindedness. Turns out that millennials are not as civic-minded as Howe and others have claimed. Fair enough.

But another thing caught my eye — all the stories tended to compare Millennial numbers to a baseline Boomer figure — leading everyone to blame the self-focus on coddling parents and Barney songsmithing.

But if you look at the figure above, you’ll see the Millennial jump only continues the radical jump made by the Gen Xers. And the Gen Xers were the latchkey generation.

So all these explanations that finger a cause located in a 1990s childhood? Think again. There’s definitely something interesting going on here, but it’s been going on a lot longer than most articles indicate…

The Golden Rule of Comparison and the ACA

The golden rule of comparison, we tell our students, is simple:

Compare like-to-like where this is possible; account for differences where it is not.

Honestly, if you just apply this one rule religiously to anything billed as a comparison, you’ll outperform most people in evaluating comparisons.

Case in point: the Congressional Budget Office just published an update of its analysis of the Affordable Care Act. In the document they state:

CBO and JCT now estimate that the insurance coverage provisions of the ACA will have a net cost of just under $1.1 trillion over the 2012–2021 period—about $50 billion less than the agencies’ March 2011 estimate for that 10-year period [Emphasis mine].

That’s a good comparison. They are comparing the 2012–2021 estimate they made previously to the new estimate for 2012–2021. It’s the same agency, and we assume it’s the same analytical framework, but with updated data. It’s as like-to-like as you tend to get in life.

Yet this was the response from Tom Price, of the House Republican Policy Committee:

House Republican Policy Committee Chairman Tom Price, M.D. (R-GA) issued the following statement regarding the Congressional Budget Office’s (CBO) updated cost estimate of the president’s health care law. The new CBO projection estimates that the law will cost $1.76 trillion over 10 years – well above the $940 billion Democrats originally claimed.

Why the discrepancy? There are multiple reasons. But this one, which Ezra Klein points out, is to me the most interesting:

One other thing that’s confused some people is that this estimate is looking at a different timeframe than the original estimates. The CBO’s first pass at the bill looked at 2010-2019. But years have passed, and so now they’re looking at 2012-2021. That means they have two fewer years of implementation, when the bill costs almost nothing, and two more years of operation, when it costs substantially more.

The idea is that since the ACA doesn’t *really* kick in until 2014, a 2010-2019 estimate is a 6-year cost, and a 2012-2021 estimate is an 8-year cost. There are other issues as well, perhaps even more important, but it occurs to me that this is a pretty common parlor trick people play with numbers.
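The trick is easy to reproduce with made-up figures. Suppose a law costs nothing before 2014 and a flat $220 billion a year after that (invented numbers, chosen only so the totals echo the headline figures):

```python
# Same law, same annual costs; only the estimate window moves.
annual_cost = {year: (220 if year >= 2014 else 0) for year in range(2010, 2022)}

window_a = sum(annual_cost[y] for y in range(2010, 2020))  # 2010-2019: 6 paying years
window_b = sum(annual_cost[y] for y in range(2012, 2022))  # 2012-2021: 8 paying years

print(window_a, window_b)  # 1320 vs 1760 ($B) -- "costs soared" with no re-estimate at all
```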

As for whether the 2010 estimate was high or low, Ezra correctly suggests that the easiest way to answer is to ignore totals and look at the revisions. The net effect of the revisions identified by the CBO is negative — the bill costs less than initially thought.

Compare like-to-like where this is possible; account for differences where it is not.

Plenary Workshop at NELIG: What is Critical Thinking, and Why Is It So Hard to Teach?

I call this a plenary workshop, but as I learned after I agreed to do it, it was not only a plenary session, but it was the only session. Apparently NELIG, at least in its quarterly meetings, is structured as one giant workshop. No pressure there, then… 😉

In any case, I think it worked out (the abstract is here). This was a reformulation of some of the material covered in the Critical Skills Workshop over break, but redirected to issues of information literacy. If there’s one big idea in it, it’s that when we think critically we don’t often do the computation-intensive sort of processing we tend to conceptualize as critical thinking. The most important pieces of critical thinking (as practiced in daily life) happen before you start to “think” — they come from the conceptual frameworks that shape our intuitive responses. To address problems in critical thought, you have to understand the conceptual frameworks students are using, work with them to actively deconstruct those frameworks, and provide more useful ones as replacements.

If you can’t do that — if your idea is that the students will just learn to think harder — you’re lost.

The participants were great — actively engaged, great thinkers asking all the right questions. I want library faculty in all my presentations from now on: you really can’t do better. In the activity, they identified the differences between the conceptual frameworks librarians use to parse results lists and the frameworks used by students. Students use “familiarity” and “match” as their guideposts — to them, choosing a resource is like choosing a puzzle piece. Librarians look at genre and bias — what sort of document is this (journal article, news story, conference proceeding, blog post), and what markers of bias can we spot (URL, language, title, etc.)? For librarians, this is an exercise in seeking out construction materials, not finding puzzle pieces.

We talked a little about how, to students, these processes may appear the same: librarians talk about bias, and students hear “use familiar sources”. Librarians talk about genre, and students hear “fit” or “match” — “How many journal articles do I need to collect? How many news stories?” — which is really just a different way of asking what shape the puzzle piece should be. Until you address the underlying conceptual misunderstanding directly through well-structured activities, students will continue to plug what you teach them into a conceptual framework that undermines the utility of the new knowledge.

Slides are here. There’s some good stuff in there, but much is incomprehensible without the activities and narration.

To all NELIG participants, thanks for a great Friday morning. It was a pleasure to talk with you all!

Comparing Porn Prosecutions

One of the things I like about the COMPARABLE framework is how nicely it can be used not only to evaluate existing comparisons, but to think through what a fair comparison would look like where none is provided. For instance, today I saw this:

“Well you have to look at the proof that’s in the prosecution. Under the Bush administration, pornographers were prosecuted much more rigorously under existing law than they are under the Obama administration,” Santorum said. “My conclusion is they have not put a priority on prosecuting these cases, and in doing so, they are exposing children to a tremendous amount of harm. And that to me says they’re putting the un-enforcement of this law and putting children at risk as a result of that.”

The first thing is the habit of mind — when a student sees the word “more”, hopefully that triggers comparison mode. Honestly, getting that bell to ring is the hardest bit of this. Once we are there, we use our framework:

C: Comparison groups are the prosecutions under Bush vs under Obama. Fair enough.

O: Santorum talks about rigor and priority — but the key claim here seems to be “un-enforcement”, so the best variable seems like it might be the number of prosecutions. But we’d also probably have to find some way to take the severity of the crimes into account. A small prosecution on a fineable offense should not equal a large prosecution with jail terms, etc.

M: Mental experiment on this is hard, since there are no numbers to run through. So we’ll skip it.

P: Again, without numbers, there’s nothing to do here.

A: We probably want to look at this not only in raw numbers, but as a percent of the DOJ’s total effort. There might also be other factors. Since the DOJ’s resources are limited, any large operation in a non-porn area that requires the same people might make resources scarce. If we do raw numbers, we probably also want to make sure we are taking into account that Obama has only been in office three years — a good comparison might be the last three years of the Bush administration to the first three years of Obama’s (see the sketch after this list). Finally, we might look at action controlled for the size of the porn industry — a bigger industry might require more regulation.

R: Randomness is not such an issue here as we’re not sampling. But there might be some year to year randomness in the number of prosecutable offenses.

A: Alternative measures might include looking at this as a trend. For example, were prosecutions declining year over year in the Bush administration, a decline under Obama might be a continuation of a historical trend. If they were increasing under Bush, even a stabilization under Obama would look like a redirection of resources.

B: Base rates — I’m not sure what relevant base rates are here. Again, we don’t have numbers. But obviously understanding whether a percentage increase or decrease is meaningful will require absolute numbers and an idea of how prosecutions relate to offenses. It might also be useful to see if the Bush administration is the historical anomaly here.

L: This is a longitudinal comparison. It might be interesting to go the other direction too, and look at how a comparable country, like Canada in 2011, prosecutes this stuff.

E: Not sure how distribution affects this, although the subpopulations piece is obvious — we’d want to look at how different types of crime account for the whole of prosecutions in each administration — the likelihood is that that breakdown would tell us far more than the headline statistic of prosecutions.
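For the adjustment step (A) above, here is a minimal sketch of what “raw numbers vs. normalized numbers” looks like in practice. Every figure is an invented placeholder, since we have no actual prosecution counts:

```python
# Invented counts: compare matched 3-year spans, three different ways.
admins = {
    "Bush (last 3 yrs)":   dict(porn_cases=60, total_cases=300_000, years=3),
    "Obama (first 3 yrs)": dict(porn_cases=45, total_cases=330_000, years=3),
}

for name, d in admins.items():
    per_year = d["porn_cases"] / d["years"]     # normalize by time in office
    share = d["porn_cases"] / d["total_cases"]  # normalize by total DOJ caseload
    print(f"{name}: {d['porn_cases']} raw, {per_year:.0f}/yr, {share:.4%} of caseload")
```

Raw counts, per-year rates, and share of total effort can each tell a different story; the point of the framework is to decide which normalization actually answers the claim being made.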

One thing to notice about COMPARABLE is that it avoids going directly down the association rabbit-hole — in this case, the “kids exposed to harm” piece. While that’s certainly important, I find that those questions end up being too nuanced for many of our students. The comparison question here is complex, certainly, but there is a certain concreteness to it that is helpful to the beginning student of QR.