Arsonist Birds Activity Sequence

I have a new and complete three-part activity sequence up on the Four Moves blog. It asks students to delve into whether a story about birds intentionally setting fires is an accurate summary of the research it cites. It goes through three steps:

  • Evaluating the reporting source.
  • Evaluating the research source.
  • Checking for distortion of the source material.

I won’t go into the steps because I don’t want to foul up the Google Search results for this activity. But I encourage you to start at the beginning and go through the three activities. (Please note that it is meant to be used in a classroom with a facilitator.)

One of the things I’ve built out a bit more in this set of activities is the “skills up front, social impact on the back” approach. Each activity runs the students through very specific skills, but then asks the students to reflect a bit more deeply on the information environment as a whole. Here are some discussion questions we ask students after they check whether the National Post is a “real” newspaper:

  • Neither the National Post nor the reporter has any core expertise in ethno-biology. So why do we trust the National Post more than a random statement from a random person?
  • Why do you think that newspapers have such a good reputation for truthfulness and care compared to the average online site? What sort of economic incentives has a newspaper historically had to get things right that a clickbait online site might not have had?
  • How do we balance our need for traditional authoritative sources with our desire to include diverse voices and expertise? How do we make sure we are not excluding valuable online-only sources? What are the dangers of a newspaper-only diet of news?

And here is a question we ask after we have the students read the arsonist birds article — which is really about science having ignored indigenous and professional expertise:

  • One question the journal article raises is the way that professional and indigenous expertise is not always valued by science. How can we, as people seeking the best information, value academic research while respecting non-academic expertise when appropriate? What’s a good example of when professional or indigenous expertise on an issue might be preferable to academic expertise?

This stuff takes forever to put together, unfortunately, because one thing we’re trying to do is be very careful about tone, and make sure we get students to think about the incentives around information production without allowing them the easy shortcut of cynicism. We also are quite aware that the biggest worry we face is not that students will consume disinformation, but that they may consume no civically-oriented news at all. So in other sections we use the follow-up to make the case for considered and intentional news consumption (and again, news consumption that is less focused on political hobbyism).

In any case, I think it’s a solid sequence, and I hope you’ll try going through it. It uses a password for the “solutions” as a way to rebuff search engines and slow students down. The password is “searchb4share”. Try it out!

 


People Are Not Talking About Machine Learning Clickbait and Misinformation Nearly Enough

The way that machine learning works is basically this: you feed in some examples, let’s say of what tables look like, and then the code generates some things it thinks are tables. You click yes on the things that look like tables and the code reinforces the processes that made those and makes some more attempts. You rate again, and with each rating the elements of the process that produce table-like things are strengthened and the ones that produce non-table-like things are weakened.
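That loop can be sketched in a few lines of code. This is a toy illustration only: the feature names, the update rule, and the scripted `looks_like_table` rater standing in for a human clicking yes/no are all my own assumptions, not any real system.

```python
import random

def train_with_feedback(rate, steps=300, lr=0.2, seed=0):
    """Toy version of the loop above: generate a candidate, get a
    yes/no rating, then strengthen or weaken the features it used."""
    rng = random.Random(seed)
    features = ["flat_top", "four_legs", "round", "has_wheels"]
    weights = {f: 0.5 for f in features}  # start indifferent
    for _ in range(steps):
        # generate: include each feature with probability = its weight
        candidate = {f for f in features if rng.random() < weights[f]}
        liked = rate(candidate)  # stand-in for a human clicking yes/no
        for f in candidate:
            # reinforce features present in liked outputs, punish the rest
            delta = lr if liked else -lr
            # clip to a workable range so no feature gets stuck at zero
            weights[f] = min(1.0, max(0.05, weights[f] + delta))
    return weights

def looks_like_table(candidate):
    # scripted "human": likes flat tops, dislikes wheels
    return "flat_top" in candidate and "has_wheels" not in candidate

weights = train_with_feedback(looks_like_table)
```

After a few hundred ratings the weight on `flat_top` climbs while the weight on `has_wheels` collapses. Nothing in the loop “knows” what a table is; it just amplifies whatever earned clicks, which is the point of the analogy that follows.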

It doesn’t have to be making things — it can be recognition as well. In fact, as long as you have some human feedback in the mix you can train a machine learning process to recognize and rate tables that another machine learning process makes, in something called a generative adversarial network.

People often use machine learning and AI interchangeably (and sometimes I do too). In reality machine learning is one approach to AI, and it works very well for some things and not so well for others. So far, for example, it’s been a bit of a bust in education. It’s had some good results in terms of self-driving cars. It hasn’t done great in medicine.

It will get better in these areas but there’s a bit of a gating factor here — the feedback loops in these areas are both delayed and complex. In medicine we’re interested in survival rates that span from months to decades — not exactly a fast-paced loop — and the information that is currently out there for machines to learn from is messy and inconclusive. In learning, the ability to produce custom content is likely to have some effect, but bigger issues such as motivation, deep understanding, and long-term learning gains are not as simple as recognizing tables. In cars, machine learning has turned out to be more useful, but even there the limits show: you can use it to recognize stop signs, yet it’s a bit harder to test the rarer and more complex instances of “you-go-no-you-go” yielding protocols.

You know what machine learning is really good at learning, though? Like, scary, Skynet-level good?

What you click on.

Think about our tables example, but replace it with headlines. Imagine feeding into a machine learning algorithm the 1,000 most shared headlines and stories, and then having the ML generate over the next hour 10,000 headlines that it publishes through 1,000 bots. The ones that are successful get shared and those parts of the ML net are boosted (produce more like this!). The ones that don’t get shared let the ML know to produce less along those lines.
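A stripped-down sketch of that generate-publish-select cycle, written as a simple evolutionary loop. Everything here is assumed for illustration: the `click_score` function stands in for real share counts, and the vocabulary and scoring rules are invented.

```python
import random

def click_score(headline):
    """Pretend audience: rewards ALL-CAPS words and outrage vocabulary."""
    words = headline.split()
    return (sum(w.isupper() for w in words)
            + sum(w.lower() in {"shocking", "secret", "banned"} for w in words))

def evolve_headlines(seeds, generations=20, pool=30, keep=5, seed=1):
    rng = random.Random(seed)
    vocab = ["SHOCKING", "secret", "banned", "the", "report", "truth"]
    population = list(seeds)
    for _ in range(generations):
        # breed: each new headline swaps one word in a current winner
        while len(population) < pool:
            words = rng.choice(population[:keep]).split()
            words[rng.randrange(len(words))] = rng.choice(vocab)
            population.append(" ".join(words))
        # "publish": rank by simulated shares, keep only the winners
        population.sort(key=click_score, reverse=True)
        del population[keep:]
    return population[0]

best = evolve_headlines(["local man finds a table",
                         "new report from the city"])
```

After twenty generations the surviving headline is saturated with whatever the simulated audience rewarded. Swap in real share counts for `click_score` and real posting bots for the “publish” step, and you have the hourly loop described above.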

That’s hour one of our disinfo Skynet. If the bots have any sizable audience, you’re running maybe 20,000 tests per piece of content — showing it to 20,000 people and seeing how they react. Hour two repeats that with better content. By the next morning you’ve run millions of tests on your various pieces of content, all slowly improving the virality of the material.

At that scale you can start checking valence, targeting, impact. It’s easy enough for a network analysis to show whether certain material is starting fights, for example, and stuff that starts fights can be rated up. You can find what shares well and produces cynicism in rural counties if you want. Facebook’s staff will even help you with some of that.

In short, the social media audience becomes one big training pool for your clickbait or disinfo machine. And since there is enough information from the human training to model what humans click on, that process can be amplified via generative adversarial networks, just like with our tables.

It doesn’t stop there. The actual articles can be written by ML, with their opening grafs adjusted for maximum impact. Videos can be automatically generated off of popular articles and flood YouTube.

Even the bots can get less distinguishable. An article in the New York Times today details the work being done in ML face generation, where believable fake faces are generated. Right now the process is slow, partially because it relies solely on GANs, and because it’s processor intensive. But imagine generating 1,000 fake faces for your bot avatars and tracking which ones get the most shares, then regenerating a thousand more based on that and updating. Or even easier, autogenerating and re-generating user bios.

You don’t even need to hand-grow the faces, as with the NYT article. You could generate 1,000 morphs, or combos of existing faces.

Just as with the last wave of disinformation, the first adopters of this stuff will be the clickbait farms, finding new and more effective ways to get us to sites selling dietary supplements or to watch weird autogenerated YouTube videos. There will be a flood of low-information ML-based content. But from there it will be weaponized, and used to suppress speech and manipulate public opinion.

These different elements of ML-based gaming of the system have different ETAs, and I’m not saying all of this is imminent. Some of it is quite far off. But I am saying it is unavoidable. You have machine learning — which loves short and simple feedback loops — and you have social media, which has a business model and interface built around those loops. The two things fit together like a lock and a key. And once these two things come together it is likely to have a profoundly detrimental effect on online culture, and make our current mess seem quite primitive by comparison.

 

 

 

Some 2018 Predictions

I wrote a prediction a couple weeks ago for Nieman Lab, and it was general and media-literacy focused. But here are some more mundane, somewhat U.S.-centric predictions:

  • Social media overrun by AI. AI’s main influence in the coming year will not be in driving cars but in driving state-sponsored social bots and corporate astroturfing. The ability of AI to manufacture real-looking discourse and news will render “representativeness” heuristics for misinformation obsolete.
  • Revenge porn in politics. Revenge porn — using both real and fake photos — has been used in multiple international elections. We saw a hint of what can happen here with the Joe Barton incident, and media failed that test. But prepare for much more strategic use, along the lines of what we saw recently in Rwanda. Prepare for at least one U.S. 2018 race to have a revenge porn “scandal”, maybe more. Also, look to AI — with its ability to modify video and images — to play a possible role. (Also, can we come up with a different name for this trend? Revenge porn is a specific form of malinformation/misinformation, but “porn” is wrong, since titillation is merely the mechanism; the intent is personal destruction and silencing of opponents.)
  • Hacked voter registration rolls. Some state voter registration systems will be hacked in an effort to create chaos and long lines at the polls, depressing the vote and calling the results of elections into question. Social media bots and armies will play both sides of this issue — with the mismatches portrayed as voter fraud on one side and suppression on the other.
  • Rise of the civil servant email dump. The press were willingly complicit in the Wikileaks email dump, covering and sifting through stolen personal, non-state emails to amplify a Russian disinfo campaign. Until we come up with a “right to privacy” for state officials and workers this will be a profitable vein. Imagine dumping the personal emails of a state-level elections official where they talk mean about one of the candidates, or share an anti-Trump meme with their spouse. Or an anti-Democratic meme. Apply this to a judge faced with adjudicating a case where the government is a plaintiff, or anyone who threatens your agenda.
  • Big Oil trolling. A major petrochemical company will be revealed to be using troll armies to create confusion around climate change, perhaps with some state help.
  • Creation of pro-government social media army focused domestically. My most out-there prediction. President Trump will announce the creation of a “Fake News Commission” to investigate both journalists and social media. One finding of the committee will be that the U.S. needs to emulate other countries and create an army of social media users to seek out anti-government “fake news” and “correct” it. (There is some precedent for the U.S. doing this in other countries, but we have never done it to our own population at any serious scale).
  • Weaponization of #MeToo. The #MeToo movement is an important and long overdue movement. But because of its disparate impact on the left, it will be weaponized and used to try to fracture the left. This will be one of the big uses of bots and trolling in the election, second maybe to voter fraud and voter suppression flame-fanning. Look for the rise of the #MeToo concern troll, the supposed “lifelong Democrat” who was “ready to vote for X” but these rumors “have disgusted” them. They’ll have names ending with eight random digits. Expect AI and microtargeting to play a role here, making it difficult for candidates to respond, especially to rumors they don’t know exist.

I don’t know what to do about this last problem. It’s knotty. I’m not the person to write about how we prepare as a society for this, but I wish someone would.

Larger Trends

Looking at these predictions, which I typed just now, stream-of-consciousness style, I’m struck by the two social vulnerabilities they exploit.

Industrialization of conversation. We have not come to terms with how the digitalization of conversation allows for its industrialization. And how its industrialization allows for manipulation that is more massive and immediate than what we’ve previously seen in the conversational space. We need to develop tools and norms to protect conversation from industrialization. And we desperately need to stop conceptualizing the discourse space on the web as a bunch of individual actors expressing an emergent will.

The erosion of the right to privacy. The modern expectation of privacy, if I recall correctly, was a result of both urbanization and literate culture. In a small tribe you don’t have much privacy, but on the other hand everyone knows you and has the context to evaluate new information about you. In a rural setting, most of what you do is relatively undiscoverable by others, unless someone involved talks. But in towns and cities people’s actions are both discoverable and shorn of context, and written communication is similarly stripped of its setting and intended interlocutors, so new norms and laws were needed.

(Economics played a part here of course as well — consider the way that works of Rembrandt and others portray homes of the rising middle class as quiet and meticulously organized private spaces apart from the bustle of the street.)

Internet giants such as Google and Facebook have labelled privacy as a historical anomaly. And it’s true that the modern conception of privacy seems to emerge with the development of the modern middle class. But there are some things to note here.

The first is that privacy is necessitated by a move to literate culture. Verbal communication is by its very nature private, available only to those at a specific time and place. Written communication makes possible the broad dissemination of messages intended for different audiences and contexts, and so a notion of informational privacy has to develop. The mailman doesn’t get to read my mail, and never has had that right, from the moment mail delivery was invented. The notion of mail gives rise to new notions of mail privacy. Mail doesn’t make privacy obsolete — the norms and the tech co-develop.

Looked at this way the move from literate to digital culture should not reduce the amount of privacy available to people, but increase the realms where the concept is applied. Our literate and productively bureaucratic culture could not have developed without the expansion of privacy norms around written communication. My Dad worked for Digital Equipment Corporation for many years, an IBM-like mid-20th century creation that functioned on memos and notes and written analysis. Had it been legal and acceptable for any person to go out and sell internal memos to other companies, or to publish employee assessments in the local paper, very little work could have been done on paper, and organizations like IBM and Digital simply would have been impossible to run. They would have collapsed. The invention of the written memo occasions new norms about the privacy of the written memo.

The move to digital communication should, likewise, prompt new and more restrictive norms around privacy. Otherwise our digitally enabled culture will collapse, completely unworkable. But what’s different this time is that the business model of our modern system involves the mailman reading our mail, so powerful interests have spent a lot of time arguing that rather than prompt new notions of privacy, technology undoes privacy. This is unbelievably stupid and technocentric. The invention of sexting, for example, doesn’t “undo” privacy — it argues for an expansion of the concept. The use of email doesn’t mean that everyone’s annual reviews will now be a matter of public record, or that we now all have a right to read the personal work squabbles of others — it means we need to develop new norms and laws and security expectations about email.

I’m sure that the powers that be in Silicon Valley believe in “the end of privacy”, just like they believe in technocratic meritocracy. The most attractive thing for any programmer to believe is that new technologies will render the messiness of social relations obsolete. But this idea, that privacy is antiquated, will lead to institutional and organizational collapse on a massive scale, which is why a transparency organization like Wikileaks is the favorite tool of dictators.

Additionally, unless privacy concerns are addressed, we will end up reversing the advances of the literate culture which allowed broad participation in discourse and decision-making. Keep in mind that while people become increasingly wary of speaking frankly in email, text, and chat rooms because of the lack of technical security and ephemerality, people with face-to-face access to power will be able to speak freely. It’s easy to mock bureaucratic culture with its emails, and memos, and endless reply-to-alls. But when the only way to influence the direction of the company reverts to being seated in a chair across from the CEO, we will miss it.

That’s a long point one on privacy. But let me add point two. The invention of an intensely internal and personal private life is one of the great gifts of modernity to humanity. I love Shakespeare, but read a soliloquy of Shakespeare’s next to Wordsworth’s Tintern Abbey. Reflecting on seeing a place he has not seen for a long time, Wordsworth writes:

 These beauteous forms,
Through a long absence, have not been to me
As is a landscape to a blind man’s eye:
But oft, in lonely rooms, and ‘mid the din
Of towns and cities, I have owed to them,
In hours of weariness, sensations sweet,
Felt in the blood, and felt along the heart;
And passing even into my purer mind
With tranquil restoration:—feelings too
Of unremembered pleasure: such, perhaps,
As have no slight or trivial influence
On that best portion of a good man’s life,
His little, nameless, unremembered, acts
Of kindness and of love.

People focus on Wordsworth’s treatment of nature, which is remarkable in itself. But the most striking thing to me about the poem has always been how recognizable the psychology is here compared to Shakespeare. And a piece of that is the way in which the narrator’s personal and private life provides sustenance even as the “din of towns and cities” drains him. The way in which even his mental experience of those “lonely rooms” is intensely and unapologetically personal. The obsession with the psychological reality of mental imagery. You see this same development with the novels of the Brontës and Jane Austen, and in portraiture. Matisse’s Woman Reading, for example, sits with her back to us, her small room unkempt, but she is transported into a different world by the book she reads.

I’ve spent too many words here already on this post, so I’ll pursue this another time. It’s difficult to explain. But the notion of privacy is more than just social and organizational lubricant — over the past 500 years or so we’ve built it deep into the notion of what it means to be human, and removing it dehumanizes us.

Warmly,

Mike

Using Google News to Verify Claims

When you’re confronted with a news claim you want to verify, you have a lot of options. Generally, the first move of our four move method is to look for previous work. Find a fact-check or a reliable article from a local or well-resourced publication that’s already done the verification for you.

The easiest way to do that, especially with breaking news, is to use the select and search browser option. Select relevant text, right-click (or command-click) to get a context menu, and then select the “search” option.

When the search opens in a new tab, choose the news tab (available in both Google and Bing). This gives you a curated stream of news to choose from, and provides some markers of credibility as well, showing you what a local source is and marking in-depth treatments. Here’s a screencast to show how it’s done:

There are things to watch here. Google News provides a much higher quality set of news sources than general search, but not everything in Google News is trustworthy. Google News still indexes conspiracy sites such as World Net Daily, for instance. Additionally, when foreign newspapers report American stories (or when American papers report foreign stories) special care should be taken: cultural notions sometimes don’t transfer, and the foreign press often mistake hoax sites for real American news, as in this case, where an Indian newspaper repeats a hoax Snopes debunked months ago:

[image: morgue.PNG]

Users also have to be careful of opinion columns on traditional news sites. There are many reliable reporting sources that have opinion pages with little to no verification process in place. Here’s an example of what to be careful of:

[image: taxplan]

There are two New York Times items here, but they are completely different in type, and for all intents and purposes from two different sources. The first one is an opinion column, and the second is a straight news item. In Google they look exactly the same. If you’re used to this stuff, you can get a good idea from the tone of the snippet which is which, but if you’re new to this you probably have to click through.

[image: opinion]

A good traditional news source — such as the Wall Street Journal or the Washington Post — will make clear which items are opinion and which are reporting. Many non-traditional sources will not (one benefit of using traditional sources for verification is the hard line they draw between editorial and news reporting).

All of these caveats might sound a bit distressing, but even with these caveats, using Google News to check news claims is going to filter out 95% of the junk. For many verification tasks it’s the best first move.

Using Google News to Verify Older Claims

Google News is a good place to verify older claims as well, or claims where you are unsure of the relevant time-frame.

As an example, here’s an item that floated into my Pinterest feed today:

[image: sackhoff]

Is this true? Is it new?

If you go to Google, type in Katee Sackhoff, search, and then hit the news tab, you’ll find out that it is at least true that Sackhoff *said* this in 2013:

[image: sackhoff2]

You can actually scan enough snippets here to get an idea of what happened. Reliable sources say Sackhoff said she lost half her followers; less reliable sources such as WorldNetDaily and Guns.com assert the “half” claim directly in their headlines, without noting that Sackhoff was the source of the claim and that the claim was unverified.

[image: wnd2.PNG]

This is why it’s still important to choose sources from the feed wisely, read the keyword-in-context snippets when available, and if necessary click through to the article.

In this particular case, the precision of words turns out to be important, since, as Reason.com notes in an update to their erroneous story, Sackhoff appears to have been making a joke and did not lose many followers at all:

UPDATE: Looks like Sackhoff was kidding when she said she lost half her followers. Twitter stats show she didn’t take a net hit. She’s actually up a few followers today. A Sackhoff fan emails to say “Katee jokes a lot.”

(As a side note, one indicator of source reliability is whether they issue corrections after claims they made are discovered to be false. Seeing which outlets bothered to correct this story and which didn’t might make a good class activity.)

Using Google News to Verify a Source as “Real”

As noted, Google News contains some dubious sources, and a lot of unverified or weakly verified content in the form of opinion columns and slanted news. In Google News, you will find some conspiracy sites, many opinion columns, and lots of headlines that outright lie.  Google News does not, however, contain a lot of hoax sites; you won’t find a source claiming to be a local paper that isn’t, or publications making up stories out of nothing.

As such, you can use a Google News search for a baseline check on whether a publication is “real” or “fake”.

I show two examples of this in the video below:

Again, this is a quick and dirty check. A more detailed check might involve searching Snopes, following the story to the source, or looking up the publication in Wikipedia.

A Note on the Two Faces of Google News

The desktop version of Google News has two interfaces: an older “News Archive” version and a newer “Reader” interface. Here’s what they look like.

The “News Archive” view:

[image: rupert murdoch1]

The new “Reader” view, rolled out in July 2017:

[image: rupert murdoch2.PNG]

The big differences are the reduction in clutter, the use of a card-based interface, the better highlighting of local and in-depth coverage, and better paths to related content, whether through fact-checks or topical tags. For people browsing the news, the new interface does a better job of flagging expertise and exposing people to diverse perspectives.

While the “reader” interface provides a better reading and browsing experience, it is a poor fit for verification. There are no snippets of keywords in context, there’s no access to date filters, and old content is not available through the interface.

Here, for example, is what we get when we search for the DART officer story in the “News Reader” interface:

[image: DART]

This is because the event, which happened two weeks ago, is already too old for the “reader” view. If you click the Google News Archive link, it will take you to both the older interface and the older news articles.

The reader view is also missing a number of important tools for verification that we’ll talk about using later, like full date-range filtering and keyword-in-context.

If you use the select-and-right-click method we show here, you should end up at the “news archive” view, which is what you want. If you end up in the news reader by mistake, your best move is to go to google.com, make your search, and click the news tab. In general, use the reader view for browsing recent news, but avoid it when using Google News for verification purposes.

The Web Is Abundant. Find Another Source.

 

I do a lot of work that I don’t cover here — in particular, I’m slowly putting together curriculum for the American Democracy Project on what the Stanford History Education Group calls Civic Online Reasoning. (I don’t show a lot of this work here because anything I publish on this blog alters the search results for the exercises and makes them less authentic.)

But as I’ve put together the exercises and tried to refine the UbD-style understandings I’m trying to hit, I keep finding one of the biggest understandings is what I call the “Abundant Web” assumption. Put simply, the web is qualitatively different than most information environments because of its abundance, but our processes still tend to economize as if information on the web was scarce.

What do I mean by this? Look at something that came to me today via a Twitter link.

It shows a unique sort of sun halo that supposedly appeared in Sweden. People are filming it — at least it looks like they are. So it probably isn’t just some sort of fake produced by a weird filter, right?

So how do you check this? Your first thought might be “Who is Massimo?” Or maybe you click the link and trace it up to Facebook where you find it was posted on a page called Severe Weather Europe, and we should look into them. They credit the video to a Twitter user named @vemdalen, which is a design company in Sweden. Who is this vemdalen, and what do they…

But I actually don’t do any of this. I follow the first rule of our Web Literacy for Student Fact-Checkers, and I check for previous work:

And I find an article in the Independent about it. The Independent is not the best newspaper anymore — it’s been severely degraded with clickbait over the years. But it’s still a newspaper, and when we go there we find that they are reporting this as fact and that they have linked to an article from a science website explaining the phenomenon.

If we were reporters, and this was a story we were working on, this probably wouldn’t be enough for us. But for a citizen trying not to retweet lies, it’s enough. And we get there in 90 seconds partially because we assume that on an abundant web, if this thing really happened, someone somewhere probably already looked into it.

There’s also the question of how we choose the Independent as our source. I know the Independent because over time I’ve seen it as a source for things and built up a mental model of what it does well and what it does poorly. Eventually all students should start to know a few resources like this — dry land that they can swim to when looking for a source. (Or, in the case of the Independent, a moderately squishy bog.)

But you don’t have to know the source to do this. One of the prime techniques I use is searching for stories in Google News versus general Google search. Why? Because Google News is curated — sources are selected based on qualifying as real news sites. So what we’re doing when we search on Vemdalen, Sweden and click on Google News is we’re saying “Let’s start over on this, and try to get a news source that has at least a modicum of vetting applied to it.” It’s not going to be perfect: Google News makes mistakes. Sometimes news publications also have non-news content (promos, editorials) which is not held to the same standard as the rest of the source. And some of the sources they include just plain shouldn’t be included. But it filters out 98% of the junk for us on a task like this.

Again, not the level of precision we’d want as a reporter or scholar. But for a citizen, it’s probably good enough, as long as they are taught to use it correctly.

And that’s the other side of “The Web Is Abundant”. In a world with hundreds of possible sources, so much of what you do is less about finding coverage than about limiting it through filters. Here we limit it by use of a curated news site. But because of the principle of abundance, we could be picky in other ways if we were looking at a different sort of story — we could filter by location, looking for local coverage. We could filter by date, either looking for the most recent developments, or the earliest possible reporting.

All of this is markedly different than what we tell students in our world of print scarcity. With print, there are few sources directly available to us, and finding and acquiring new resources takes at least minutes and sometimes weeks. When you have a source in front of you, you don’t throw it out; you interrogate it. The economics of this are different on the web, where lack of commitment to a source is a virtue, and we get to the truth more quickly by always assuming there’s a better source out there. We trade one resource for another without even bothering to read the first one. We need a media literacy that makes a virtue of this lack of commitment to initial resources rather than a fetish of investigative persistence with them.