Arsonist Birds Activity Sequence

I have a new and complete three-part activity sequence up on the Four Moves blog. It asks students to delve into whether a story about birds intentionally setting fires is an accurate summary of the research it cites. It goes through three steps:

  • Evaluating the reporting source.
  • Evaluating the research source.
  • Checking for distortion of the source material.

I won’t go into the steps because I don’t want to foul up the Google Search results for this activity. But I encourage you to start at the beginning and go through the three activities. (Please note that it is meant to be used in a classroom with a facilitator.)

One of the things I’ve built out a bit more in this set of activities is the “skills up front, social impact on the back” approach. Each activity runs the students through very specific skills, but then asks them to reflect more deeply on the information environment as a whole. Here are some discussion questions we ask students after they check whether the National Post is a “real” newspaper:

  • Neither the National Post nor the reporter has any core expertise in ethno-biology. So why do we trust the National Post more than a random statement from a random person?
  • Why do you think that newspapers have such a good reputation for truthfulness and care compared to the average online site? What sort of economic incentives has a newspaper historically had to get things right that a clickbait online site might not have had?
  • How do we balance our need for traditional authoritative sources with our desire to include diverse voices and expertise? How do we make sure we are not excluding valuable online-only sources? What are the dangers of a newspaper-only diet of news?

And here is a question we ask after we have the students read the arsonist birds article — which is really about science having ignored indigenous and professional expertise:

  • One question the journal article raises is the way that professional and indigenous expertise is not always valued by science. How can we, as people seeking the best information, value academic research while respecting non-academic expertise when appropriate? What’s a good example of when professional or indigenous expertise on an issue might be preferable to academic expertise?

This stuff takes forever to put together, unfortunately, because we’re trying to be very careful about tone, and to make sure we get students to think about the incentives around information production without allowing them the easy shortcut of cynicism. We are also quite aware that the biggest worry we face is not that students will consume disinformation, but that they may consume no civically-oriented news at all. So in other sections we use the follow-up to make the case for considered and intentional news consumption (and again, news consumption that is less focused on political hobbyism).

In any case, I think it’s a solid sequence, and I hope you’ll try going through it. It uses a password for the “solutions” as a way to rebuff search engines and slow students down. The password is “searchb4share”. Try it out!



People Are Not Talking About Machine Learning Clickbait and Misinformation Nearly Enough

The way that machine learning works is basically this: you input some examples, let’s say of what tables look like, and then the code generates some things it thinks are tables. You click yes on the things that look like tables and the code reinforces the processes that made those and makes some more attempts. You rate again, and with each rating the elements of the process that produce table-like things are strengthened and the ones that produce non-table-like things are weakened.
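A minimal sketch of that rate-and-reinforce loop, assuming a toy generator that assembles candidate "tables" from a few hypothetical parts (leg count, top shape) and a stand-in function playing the person who clicks yes. Real systems adjust millions of neural-network weights rather than a handful of option weights, and the `boost` and `penalty` factors here are arbitrary choices; the point is only the shape of the loop described above.

```python
import random

# Hypothetical parts a toy generator can combine into a candidate "table".
OPTIONS = {
    "legs": [1, 3, 4, 6],
    "top": ["flat", "dome", "spiky"],
}


def looks_like_table(candidate):
    """Stand-in for a person clicking "yes": a flat top on three or four legs."""
    return candidate["top"] == "flat" and candidate["legs"] in (3, 4)


def generate(weights, rng):
    """Pick one option per part, in proportion to the current weights."""
    return {
        part: rng.choices(choices, weights=weights[part])[0]
        for part, choices in OPTIONS.items()
    }


def train(rounds=200, batch=20, boost=1.1, penalty=0.9, seed=0):
    """Generate, get a yes/no rating, and strengthen or weaken what produced it."""
    rng = random.Random(seed)
    weights = {part: [1.0] * len(choices) for part, choices in OPTIONS.items()}
    for _ in range(rounds):
        for _ in range(batch):
            candidate = generate(weights, rng)
            factor = boost if looks_like_table(candidate) else penalty
            for part, value in candidate.items():
                i = OPTIONS[part].index(value)
                weights[part][i] *= factor
                # Renormalize so each part's weights stay a probability distribution.
                total = sum(weights[part])
                weights[part] = [w / total for w in weights[part]]
    return weights


def hit_rate(weights, n=1000, seed=1):
    """Fraction of freshly generated candidates that pass as tables."""
    rng = random.Random(seed)
    return sum(looks_like_table(generate(weights, rng)) for _ in range(n)) / n
```

Comparing `hit_rate` on uniform starting weights (`{part: [1.0] * len(c) for part, c in OPTIONS.items()}`) with `hit_rate(train())` shows the share of table-like output climbing as ratings accumulate, without the generator ever being told what a table is.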

It doesn’t have to be making things; it can be recognition as well. In fact, as long as you have some real, human-supplied examples in the mix, you can train a machine learning process to recognize and rate the tables that another machine learning process makes, in something called a generative adversarial network.

People often use machine learning and AI interchangeably (and sometimes I do too). In reality machine learning is one approach to AI, and it works very well for some things and not so well for others. So far, for example, it’s been a bit of a bust in education. It’s had some good results in terms of self-driving cars. It hasn’t done great in medicine.

It will get better in these areas, but there’s a bit of a gating factor here: the feedback loops in these areas are both delayed and complex. In medicine we’re interested in survival rates that span months to decades, not exactly a fast-paced loop, and the information currently out there for machines to learn from is messy and inconclusive. In learning, the ability to produce custom content is likely to have some effect, but bigger issues such as motivation, deep understanding, and long-term learning gains are not as simple as recognizing tables. In cars machine learning has turned out to be more useful, but even there, while you can use it to recognize stop signs, it’s a bit harder to test the rarer and more complex instances of “you-go-no-you-go” yielding protocols.

You know what machine learning is really good at learning, though? Like, scary, Skynet-level good?

What you click on.

Think about our tables example, but replace it with headlines. Imagine feeding the 1,000 most shared headlines and stories into a machine learning algorithm, and then having the ML generate 10,000 headlines over the next hour and publish them through 1,000 bots. The ones that are successful get shared and those parts of the ML net are boosted (produce more like this!). The ones that don’t get shared tell the ML to produce less along those lines.
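The publish, measure, and "produce more like this" cycle can be sketched as a simple evolutionary (select-and-mutate) loop, swapped in here for the neural net the post describes. Everything is hypothetical and scaled down: headlines are tuples of made-up "ingredients" (words, hooks, framings), each ingredient has a hidden appeal that the selection step never sees directly, and each headline reaches 200 simulated viewers per hour instead of tens of thousands.

```python
import math
import random

VOCAB_SIZE = 50    # hypothetical pool of headline "ingredients"
HEADLINE_LEN = 5   # ingredients per headline
AUDIENCE = 200     # simulated viewers per headline per hour (scaled way down)


def share_probability(headline, appeal):
    """Hidden chance that one viewer shares this headline."""
    score = sum(appeal[i] for i in headline) / HEADLINE_LEN
    return 1 / (1 + math.exp(-(score - 2)))  # offset so most headlines rarely get shared


def publish(headline, appeal, rng):
    """Show one headline to AUDIENCE viewers and count the shares."""
    p = share_probability(headline, appeal)
    return sum(rng.random() < p for _ in range(AUDIENCE))


def mutate(headline, rng):
    """Swap one ingredient at random to get a variation on a winner."""
    new = list(headline)
    new[rng.randrange(HEADLINE_LEN)] = rng.randrange(VOCAB_SIZE)
    return tuple(new)


def run(hours=24, batch=100, keep=10, seed=0):
    """Return the overall share rate observed in each simulated hour."""
    rng = random.Random(seed)
    # Made-up appeal per ingredient; selection below only ever sees share counts.
    appeal = [rng.gauss(0, 1) for _ in range(VOCAB_SIZE)]
    headlines = [
        tuple(rng.randrange(VOCAB_SIZE) for _ in range(HEADLINE_LEN))
        for _ in range(batch)
    ]
    history = []
    for _ in range(hours):
        results = sorted(((publish(h, appeal, rng), h) for h in headlines), reverse=True)
        history.append(sum(shares for shares, _ in results) / (batch * AUDIENCE))
        winners = [h for _, h in results[:keep]]
        # "Produce more like this": next hour's batch is variations on what got shared.
        headlines = [mutate(rng.choice(winners), rng) for _ in range(batch)]
    return history
```

Printing `run()` shows the hourly share rate drifting upward even though nothing in the loop knows why a headline works; it only keeps what got shared and varies it.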

That’s hour one of our disinfo Skynet. If the bots have any sizable audience, you’re running maybe 20,000 tests per piece of content — showing it to 20,000 people and seeing how they react. Hour two repeats that with better content. By the next morning you’ve run millions of tests on your various pieces of content, all slowly improving the virality of the material.
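Taking the post's figures at face value (10,000 headlines an hour, each shown to roughly 20,000 people) and assuming, purely for illustration, a 12-hour overnight run, the number of individual reactions observed works out to:

$$
10{,}000\ \tfrac{\text{headlines}}{\text{hour}} \times 20{,}000\ \tfrac{\text{views}}{\text{headline}} = 2 \times 10^{8}\ \tfrac{\text{views}}{\text{hour}}, \qquad 2 \times 10^{8} \times 12\ \text{hours} = 2.4 \times 10^{9}\ \text{views}.
$$

So "millions of tests" holds even if only a small fraction of those views end up counting as a usable test.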

At that scale you can start checking valence, targeting, impact. It’s easy enough for a network analysis to show whether certain material is starting fights, for example, and stuff that starts fights can be rated up. You can find what shares well and produces cynicism in rural counties if you want. Facebook’s staff will even help you with some of that.

In short, the social media audience becomes one big training pool for your clickbait or disinfo machine. And since there is enough information from the human training to model what humans click on, that process can be amplified via generative adversarial networks, just like with our tables.
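The step of modeling what humans click on, then letting that model stand in for them, can be sketched with a much simpler technique than a GAN: a logistic-regression "click model" fit to simulated share data, then used to screen machine-generated candidates that no person has seen. The ingredient vocabulary, the hidden appeal values, and all sizes are hypothetical assumptions; a real GAN would go further and train the generator against the model's scores rather than just filtering random candidates.

```python
import math
import random

VOCAB_SIZE = 30   # hypothetical headline ingredients
HEADLINE_LEN = 4  # ingredients per headline


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def random_headline(rng):
    return [rng.randrange(VOCAB_SIZE) for _ in range(HEADLINE_LEN)]


def predict(weights, bias, headline):
    """Model's estimated probability that a viewer shares this headline."""
    return sigmoid(bias + sum(weights[i] for i in headline))


def fit_click_model(examples, epochs=20, lr=0.1):
    """Logistic regression on ingredient counts, trained by stochastic gradient descent.

    examples: list of (headline, shared) pairs, where shared is 0 or 1.
    """
    weights, bias = [0.0] * VOCAB_SIZE, 0.0
    for _ in range(epochs):
        for headline, shared in examples:
            error = predict(weights, bias, headline) - shared
            bias -= lr * error
            for i in headline:
                weights[i] -= lr * error
    return weights, bias


def simulate(seed=0, n_observed=3000, n_candidates=10_000, n_publish=100):
    """Return the true mean share rate of all candidates vs. the model's top picks."""
    rng = random.Random(seed)
    appeal = [rng.gauss(0, 1) for _ in range(VOCAB_SIZE)]  # made up; hidden from the model

    def true_share_prob(headline):
        return sigmoid(sum(appeal[i] for i in headline) - 2)

    def mean_share_prob(headlines):
        return sum(true_share_prob(h) for h in headlines) / len(headlines)

    # 1. Human training data: headlines people saw, and whether each one was shared.
    observed = []
    for _ in range(n_observed):
        h = random_headline(rng)
        observed.append((h, int(rng.random() < true_share_prob(h))))
    weights, bias = fit_click_model(observed)

    # 2. Score many machine-generated candidates without showing them to anyone.
    candidates = [random_headline(rng) for _ in range(n_candidates)]
    ranked = sorted(candidates, key=lambda h: predict(weights, bias, h), reverse=True)
    return mean_share_prob(candidates), mean_share_prob(ranked[:n_publish])
```

`simulate()` returns two numbers: the simulated share rate averaged over all 10,000 candidates, and over the 100 the model picked. The gap between them is the head start a publisher gets before a single human has been shown those headlines.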

It doesn’t stop there. The actual articles can be written by ML, with their opening grafs adjusted for maximum impact. Videos can be automatically generated off of popular articles and flood YouTube.

Even the bots can get less distinguishable. An article in the New York Times today details the work being done in ML face generation, where believable fake faces are generated. Right now the process is slow, partly because it relies solely on GANs and partly because it’s processor-intensive. But imagine generating 1,000 fake faces for your bot avatars and tracking which ones get the most shares, then regenerating a thousand more based on that and updating. Or, even easier, autogenerating and regenerating user bios.

You don’t even need to hand-grow the faces, as in the NYT article. You could generate 1,000 morphs, or combos of existing faces.

Just as with the last wave of disinformation, the first adopters of this stuff will be the clickbait farms, finding new and more effective ways to get us to sites selling dietary supplements or to watch weird autogenerated YouTube videos. There will be a flood of low-information ML-based content. But from there it will be weaponized, and used to suppress speech and manipulate public opinion.

These different elements of ML-based gaming of the system have different ETAs, and I’m not saying all of this is imminent. Some of it is quite far off. But I am saying it is unavoidable. You have machine learning, which loves short and simple feedback loops, and you have social media, which has a business model and interface built around those loops. The two fit together like a lock and a key. And once they come together, the combination is likely to have a profoundly detrimental effect on online culture, and make our current mess seem quite primitive by comparison.