60-Second Check: Aircraft Waste Hits Cruise Ship

When I say you can fact-check a lot of things in one to two minutes, I mean, literally, one to two minutes. Here’s an example:


You can sit around all day and think critically about whether this is possible, of course. But the easiest way to debunk this is to discover that the pictures are lifted from a different context, and to do that you need web skills, not what we traditionally call “critical thinking”.

Information Underload

For many years, the underlying thesis of the tech world has been that there is too much information and therefore we need technology to surface the best information. In the mid 2000s, that technology was pitched as Web 2.0. Nowadays, the solution is supposedly AI.

I’m increasingly convinced, however, that our problem is not information overload but information underload. We suffer not because there is just too much good information out there to process, but because most of the information out there consists of low-quality, slapdash takes on low-quality research, endlessly pinging around the spin-o-sphere.

Take, for instance, the latest news on Watson. Watson, you might remember, was IBM’s former AI-based Jeopardy winner that was going to go from “Who is David McCullough?” to curing cancer.

So how has this worked out? Four years later, Watson has yet to treat a patient. It’s hit a roadblock with some changes in backend records systems. And most importantly, it can’t figure out how to treat cancer because we don’t currently have enough good information on how to treat cancer:

“IBM spun a story about how Watson could improve cancer treatment that was superficially plausible – there are thousands of research papers published every year and no doctor can read them all,” said David Howard, a faculty member in the Department of Health Policy and Management at Emory University, via email. “However, the problem is not that there is too much information, but rather there is too little. Only a handful of published articles are high-quality, randomized trials. In many cases, oncologists have to choose between drugs that have never been directly compared in a randomized trial.”

This is not just the case with cancer, of course. You’ve heard about the reproducibility crisis, right? Most published research findings are false. And they are false for a number of reasons, but primary reasons include that there are no incentives for researchers to check the research, that data is not shared, and that publications aren’t particularly interested in publishing boring findings. The push to commercialize university research has also corrupted expertise, putting a thumb on the scale for anything universities can license or monetize.

In other words, there’s not enough information out there, and what’s out there is generally worse than it should be.

You can find this pattern in less dramatic areas as well — in fact, almost any place that you’re told big data and analytics will save us. Take Netflix as an example. Endless thinkpieces have been written about the Netflix matching algorithm, but for many years that algorithm could only match you with the equivalent of the films in the Walmart bargain bin, because Netflix had a matching algorithm but nothing worth watching. (Are you starting to see the pattern here?)

In this case at least, the story has a happy ending. Since Netflix is a business and needs to survive, they decided not to pour the majority of their money into newer algorithms to better match people with the version of Big Momma’s House they would hate the least. Instead, they poured their money into making and obtaining things people actually wanted to watch, and as a result Netflix is actually useful now. But if you stick with Netflix or Amazon Prime today it’s more likely because you are hooked on something they created than that you are sold on the strength of their recommendation engine.

Let’s belabor the point: let’s talk about Big Data in education. It’s easy to pick on MOOCs, but remember that the big value proposition of MOOCs was that with millions of students we would finally spot patterns that would allow us to supercharge learning. Recommendation engines would parse these patterns, and… well, what? Do we have a bunch of superb educational content just waiting in the wings that I don’t know about? Do we even have decent educational research that can conclusively direct people to solutions? If the world of cancer research is compromised, the world of educational research is a control group wasteland.

We see this pattern again and again — companies coming along to tell us that their platform will help us with the firehose of content. But the big problem is not that it’s a firehose, but that it’s a firehose of sewage. It’s all haystack and no needle. And the reason this happens again and again is that what we so derisively call “content” nowadays is expensive to produce, and gets produced by a large number of well-paid people who in general have no significant marketing arm. To scale up that work is to employ a lot of people, but it doesn’t change your return on investment ratio. To make a dollar, you need to spend ninety cents, and that doesn’t change no matter how big you get. And who wants to spend ninety cents to make a dollar in today’s world?

Processing and promotion platforms, however, like Watson or MOOCs or Facebook, offer the dream of scalability, where there is zero marginal cost to expansion. They also offer the potential of monopoly and lock-in, to drive out competitors. And importantly, that dream drives funding which drives marketing which drives hype.
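To make the arithmetic concrete, here is a rough sketch of the two cost structures. The ninety-cents-per-dollar ratio comes from the paragraph above; the fixed platform cost and the revenue figures are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
def content_profit(revenue: float, cost_ratio: float = 0.90) -> float:
    """Profit when every dollar of revenue costs `cost_ratio` dollars to produce."""
    return revenue * (1 - cost_ratio)


def platform_profit(revenue: float, fixed_cost: float) -> float:
    """Profit when costs are paid once up front and marginal cost is zero."""
    return revenue - fixed_cost


# Hypothetical fixed cost for building the platform (an assumption, not a real figure).
FIXED_COST = 500_000

for revenue in (1_000_000, 10_000_000, 100_000_000):
    content_margin = content_profit(revenue) / revenue
    platform_margin = platform_profit(revenue, FIXED_COST) / revenue
    print(
        f"revenue ${revenue:,}: "
        f"content margin {content_margin:.1%}, "
        f"platform margin {platform_margin:.1%}"
    )
```

Under these assumptions the content producer's margin stays flat at every size, while the platform's margin keeps climbing as revenue grows. That gap is the dream being funded.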

And this is why there is endless talk about the latest needle in a haystack finder, when what we are facing is a collapse of the market that funds the creation of needles. Netflix caught on. Let’s hope that the people who are funding cancer research and teaching students get a clue soon as well. More money to the producers of valuable content. Less to platforms, distributors, and needle-finders. Do that, and the future will sort itself out.

I’m guessing if you are reading this you already know this, but if you are interested in this stuff, make sure to read Audrey Watters’ This Week In Robots religiously, as well as her other writing in this area, which has been very influential on me.



We’ve Made It Ridiculously Easy to Contribute to the Digital Polarization Initiative

The idea of the fact-checking activity in the Digital Polarization Initiative is simple: civic education as public work.

The education piece is simple: students learn how to tell truth from fiction on the web through checking our claims and investigating questions. We have a short textbook on this that they can read in a week. They can then apply those skills to political questions or questions within their discipline.

The public work piece is just as important, though. It’s not enough for students to tell truth from fiction on the web (a personal skill). They should also use that skill to make the world a better place for others. So we encourage students to contribute the results of their investigations to a public wiki. Some of the answers our students have provided on various issues already rank near the top of Google queries for related questions.

This is digital public work, not just in the sense it is publicly available, but in the sense that it contributes to a public commons to advance a public good. In this case, that public good is to collectively improve the information environment of the web.

How much tech do you need to know or master to participate? Here’s the thing: outside what you’ll need to learn about fact-checking (Google features, checking sources, following links) you need to know almost nothing.

If you want your class to make the world a better place, the current “Digipo” process is to talk to me. I’ll set your class up with a directory in Google Docs that is pre-connected to the wiki generation code. Students edit the documents in that directory and they are automatically converted to wiki pages. The video below shows it in action.

That’s it. I can’t begin to tell you what a pain in the butt it was to set this up. But the pain is behind us now, and there really is no excuse for not participating. The learning curve for contribution is a flat line, which will let you concentrate on the fact-checking parts. Contact me at michael.caulfield@wsu.edu if you want to participate.

How to Find Out If a “Local” Newspaper Site Is Fake (Using a New-ish Google Feature)

As you may know, one of the great innovations of the 2016 election season was the use of fake “local” papers like the Denver Guardian to spread fake news:


The Denver Guardian, as we all now know, was a completely fake site that only published this single page. The page was shared on Facebook over half a million times and became one of the most shared stories of the final weeks of the election.

This technique was relatively new to the political realm in 2016, but had been a staple of a bunch of “stranger than fiction” fake stories before 2016.

So how do you know if the local paper you’re looking at is a real paper from that area? The recommendation of the Snopes folks at a recent misinformation conference was to approach it from the other direction: grab a list of papers for that state or city and see if the paper is on it.

They used Wikipedia, but it looks like Google has made it even easier than that. If you type [denver newspapers] into Google, this is what you get:


It’s a bit small, but if you can see the image above, there are 13 papers associated with Denver, and none of them is named the “Guardian”. For local papers, this search move now gives you a 30-second check on the veracity of a story.
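If you wanted to script the same "is it on the list?" move, a minimal sketch might look like the one below. It assumes you have already copied the paper names by hand from Google or Wikipedia; the three names here are a partial, illustrative list rather than the 13 results in the screenshot, and the helper names are hypothetical.

```python
def normalize(name: str) -> str:
    """Lowercase the name, drop a leading 'the', and collapse extra whitespace."""
    words = name.lower().split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def is_listed(paper: str, known_papers: list[str]) -> bool:
    """Return True if the paper's normalized name appears in the reference list."""
    known = {normalize(p) for p in known_papers}
    return normalize(paper) in known


# Illustrative, partial list typed in by hand (an assumption, not the full search result).
denver_papers = ["The Denver Post", "Westword", "Denver Business Journal"]

print(is_listed("Denver Guardian", denver_papers))
print(is_listed("the denver post", denver_papers))
```

An exact-name match like this only tells you whether the paper is on the list you collected. A name that is missing is a reason to dig further, not proof by itself that the site is fake.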

It works for many non-American cities as well:


Though some revert to the Wikipedia list:


Incidentally, I hope we see more of this sort of thing from Google. I want better search results and credibility indicators, but I also want simple tools for the 30-second researcher. And this is exactly what those tools should look like. Whether this was intentional or not I don’t know, but this tool takes the current practice of fact-checkers and makes it easily accessible to the general public.

More please. Perhaps a siteinfo: term next? Please?