When Percentages Go Wrong

A poor man said to a rich one: “All my money goes for food.”

“Now that’s your trouble,” said the rich man. “I only spend five percent of my money on food.”

(From a Sufi tale, recounted here.)

Percentages are a really helpful tool, obviously. But raw numbers can matter too.

Comparison of the Day: Conservative vs. Liberal Trust of Science

From Kevin Drum’s Chart o’ the Day:

Lots of interesting stuff going on there. Notice, in particular, how the trust in science falls off a cliff for moderates in the 70s. It’s also fascinating that, 40 years ago, conservative trust in science was as high as (if not higher than) that of liberals. This is still the case in Europe, where for the most part there is no liberal/conservative divide in trust in science.

It gets even more interesting when you look at the subpopulations. You might think, for instance, that the decline in conservative belief in science has been driven by shifts in the attitudes of the least educated conservatives. Nope:

Less-educated conservatives didn’t change their attitudes about science in recent decades. It is better-educated conservatives who have done so, the paper says.

In the paper, Gauchat calls this a “key finding,” in part because it challenges “the deficit model, which predicts that individuals with higher levels of education will possess greater trust in science, by showing that educated conservatives uniquely experienced the decline in trust.” This finding also could make it difficult to change attitudes. Gauchat writes that the educational attainment data suggest “that scientific literacy and education are unlikely to have uniform effects on various publics, especially when ideology and identity intervene to create social ontologies in opposition to established cultures of knowledge (e.g., the scientific community, intelligentsia, and mainstream media).”

Comparison of the Day: CFL vs. Incandescent Mercury Pollution

From EnergyStar.gov:

Lifecycle impact is an invaluable tool in making fair comparisons. It’s easy, for example, to get hung up on the small amount of mercury in a CFL bulb, a percentage of which can escape into the environment if the bulb is crushed in a landfill.

But the biggest contributor to mercury pollution is coal-fired plants, which push gigantic amounts of mercury into the environment as part of their normal operations.

So how do we compare the mercury impact of the two different bulbs? We calculate how much mercury is produced via electricity over the lifetime of the bulb (here standardized to 8000 hrs. of use, since CFLs last longer). Then we add the mercury in the bulb itself to the lifetime use figure. It seems obvious, and it’s certainly a common way to do it — but it’s an incredibly powerful way to look at things compared to the alternatives.
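The calculation described above can be sketched in a few lines. This is only an illustration: the wattages, the emissions rate per kWh, the mercury content of the bulb, and the fraction released in a landfill are all placeholder assumptions chosen for the example, not figures taken from the EnergyStar chart.

```python
def lifecycle_mercury_mg(watts, hours, emissions_mg_per_kwh,
                         bulb_mercury_mg, fraction_released):
    """Total mercury (mg) attributable to one bulb over `hours` of use."""
    energy_kwh = watts * hours / 1000.0
    from_power_plant = energy_kwh * emissions_mg_per_kwh   # mercury from generating the electricity
    from_bulb = bulb_mercury_mg * fraction_released        # mercury escaping from the bulb itself
    return from_power_plant + from_bulb


HOURS = 8000                   # standardized lifetime, as in the text
EMISSIONS_MG_PER_KWH = 0.012   # ASSUMPTION: illustrative emissions rate

# ASSUMPTION: a 13 W CFL holding 4 mg of mercury, 11% of which escapes.
cfl = lifecycle_mercury_mg(13, HOURS, EMISSIONS_MG_PER_KWH,
                           bulb_mercury_mg=4.0, fraction_released=0.11)

# ASSUMPTION: a 60 W incandescent with no mercury in the bulb.
incandescent = lifecycle_mercury_mg(60, HOURS, EMISSIONS_MG_PER_KWH,
                                    bulb_mercury_mg=0.0, fraction_released=0.0)

print(f"CFL: {cfl:.2f} mg, incandescent: {incandescent:.2f} mg")
```

Under these assumed inputs, the electricity term dominates, so the bulb that contains mercury still comes out lower over its lifetime. Different inputs (a cleaner grid, say) would shift the result, which is why the lifetime framing matters more than any single number.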

Comparison of the Day: BMI/Mortality J-Curve

My new favorite term from epidemiology: J-Curve.

There are a lot of things that increase your mortality in a more-or-less linear way. The more you smoke, the greater your all-cause mortality risk, for example. This isn’t to say you increase your chance of death by 100% moving from one pack a day to two. But on average, your mortality goes up for each additional cigarette you smoke a day. Ten cigarettes is not going to be better for you than five, ever.

Some things, though, don’t work like that. It’s harmful to be overweight, but it’s harmful to be underweight too. Some studies claim alcohol is like this — having no alcohol correlates with a higher mortality than having a drink or two a day, but once you get past a drink or two a day mortality climbs again. The curve is shaped like a “J”, hence the name.

Understanding that things can work this way is important. Vitamin E deficiencies have been correlated with increased cancer mortality, so a lot of people take vitamin E supplements, assuming it’s a linear relationship. But vitamin E supplements have been correlated with increased cancer risk.

Likewise, a lot of health gurus today will point to the harmful effects of over-consumption of sugar, gluten, or dairy (or heck, even fat/oils), and act as though this proves elimination of this thing will dramatically increase your health. It might — if it is a linear relationship. But if it’s a J-curve, you could end up doing as much harm as good.
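To make the difference concrete, here is a toy sketch of the two shapes. Both functions and all of their coefficients are invented for illustration; they are not fitted to any study and say nothing about any real nutrient.

```python
def linear_risk(dose, slope=0.1):
    """Hypothetical relative risk that only ever rises with dose."""
    return 1.0 + slope * dose


def j_curve_risk(dose, optimum=1.0, low_penalty=0.2, high_penalty=0.5):
    """Hypothetical relative risk that is lowest at `optimum` and rises on both sides."""
    if dose < optimum:
        return 1.0 + low_penalty * (optimum - dose) ** 2
    return 1.0 + high_penalty * (dose - optimum) ** 2


for dose in (0, 1, 2, 3):
    print(dose, round(linear_risk(dose), 2), round(j_curve_risk(dose), 2))
```

In the linear model, cutting the dose to zero always lowers risk. In the J-curve model, cutting from the optimum down to zero raises it, which is the trap in reasoning from "too much is bad" to "none is best".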

Pro-Privacy Viruses

The Silicon Valley conception of privacy isn’t working for anyone except Silicon Valley. We know that. Charlie Stross, who is one smart dude, points out that if you follow the corporate-driven push to overshare to its logical conclusion, your phone becomes a handy-dandy genocide machine, or, in the near term, the perfect device for this year’s roofie-carrying girl stalker. Moreover, this is not some bizarre side effect of social software, but a flaw built into how the software thinks about you, the product it is serving up to others.

That seems shrill and alarmist, but lately I don’t think it is. There are a lot of benefits to sharing, but also a lot of drawbacks, as any college grad who has missed out on a job due to a red solo cup picture can tell you. And because we get our media from the entities that came up with this system, we tend to see the benefits as systemic and the downsides as localized. But think about that for a minute or two and you realize that that can’t possibly be right.

Anyway, lately I’ve been thinking about how it all ends. I don’t think it ends with us all running our own open source servers, going off the corporate surveillance grid. I don’t think we’ll be switching to Diaspora. We’re locked into these services.

So what’s the next vector? I think what we’ll be seeing soon are pro-privacy viruses. Imagine a “benevolent virus” that, instead of keylogging your credit card number, resets all your Facebook settings to the most private settings and sets your homepage to instructions for reopening permissions (if that’s what you want to do). Or a virus that sits resident in memory and corrupts cross-site tracking cookies in real time. Or one that shows you every bit of information that is retrievable about you on the internet, and asks if you are good with that.

I don’t think these should be created; there’d be a lot of unforeseen side effects. But I think they are coming, and I think they are likely to have a broader impact on privacy than scattered DIY projects.

In the end, I imagine they will fail — but it will be an interesting phase of this drama…

Does More Books Mean More Titles or More Editions? (A critique of that graph going around)

This has been one of the most interesting charts of the week, but I think it is also generating a lot of wrong pronouncements:

The buzz around this is that it shows the influence of copyright, and it definitely does: far fewer of the 2500 books sampled come from the period of copyright. But the question is what sort of effect of copyright it is demonstrating. For instance, almost all the commentators I’ve seen suggest this indicates a massive gap in our copyright-era offerings; the claim is that copyright is making titles much less available.

But that’s not necessarily the case. It all comes down to what you mean by title.

Here is what I mean. When something is in copyright, it is usually published by one publisher, or maybe a couple of publishers if there are overseas agreements. If it’s an absolute classic, there may be more, but not that many. There are three Kindle versions of Hemingway’s For Whom the Bell Tolls on the U.S. Amazon site; one is in Bulgarian and another in Portuguese. There are five paperback versions listed as “new”, and only one of them actually appears to be in print currently.

On the other hand, there appear to be almost a hundred Kindle versions of Jane Eyre, each with its own ISBN. Go to paperback, and there are 400 versions of Jane Eyre. There are 298 hardcovers of it.

And it’s not just popular works — Eliot’s forgotten masterpiece Silas Marner has 301 versions, whereas Wolfe’s 1980s classic Bonfire of the Vanities has three.

Want to really freak out? There are almost 5,000 “new” editions of the work of Dickens available. (Again, these searches include some out-of-print works in mint condition, which I can’t seem to filter out, but the point holds.) You’d have to lump-sum a city’s worth of single-publisher authors for several years to get to a figure like that.

I can’t see any way that you could conceivably control for this in a random sample, at least given how Amazon’s search is constructed, so I’m going to assume it wasn’t controlled for, in which case the graphic tells us nothing at this point. Copyright may also be reducing availability of titles; it would make sense that it was, to some extent. But this graph doesn’t tell you anything about that.
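A small simulation shows how large this effect could be. The catalog below is entirely made up: the number of titles and the editions per title are assumptions picked to echo the rough magnitudes mentioned above, and the sampling procedure is a guess at "pick listings at random", not a description of how the chart was actually built.

```python
import random

# ASSUMPTION: a hypothetical catalog, as {group: (number of titles, editions per title)}.
catalog = {
    "public_domain": (100, 300),   # few titles, hundreds of editions each
    "in_copyright": (1000, 3),     # many titles, a handful of editions each
}


def title_share(catalog, group):
    """Fraction of distinct titles that belong to `group`."""
    total = sum(titles for titles, _ in catalog.values())
    return catalog[group][0] / total


def listing_share(catalog, group):
    """Fraction of individual listings (editions) that belong to `group`."""
    total = sum(titles * editions for titles, editions in catalog.values())
    titles, editions = catalog[group]
    return titles * editions / total


# Draw 2500 listings at random, one entry per edition, with a fixed seed.
listings = [group
            for group, (titles, editions) in catalog.items()
            for _ in range(titles * editions)]
sample = random.Random(0).sample(listings, 2500)
sample_share = sample.count("public_domain") / len(sample)

print(f"public domain share of titles:   {title_share(catalog, 'public_domain'):.1%}")
print(f"public domain share of listings: {listing_share(catalog, 'public_domain'):.1%}")
print(f"public domain share of sample:   {sample_share:.1%}")
```

In this invented catalog, public domain works are under a tenth of the titles but about nine tenths of the listings, so a sample of listings would look dominated by old books even though most titles are in copyright.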