It Can Take As Little As Thirty Seconds, Seriously

 

When I talk about 90-second fact-checks, I think people sometimes think I’m a bit unhinged. What can students possibly do in that short amount of time that would be meaningful?

A lot, actually.

For example, this press release on some recent research was shared with me today:

[Screenshot: the press release as it appears on EurekAlert]

Now I want to re-share this with people, but I’d like to be a good net citizen as well. Good net citizens:

  • Source-check what they share
  • Share from the best source possible
  • Provide source/claim context to people they share with when necessary

To do that in this case we need to get to the source of the press release, on a site controlled by the American Psychological Association directly, and share that version instead. We also need to check that the American Psychological Association is the credible organization we think it is. How long will this take?

Literally thirty seconds, if you know how to do it:

  • Select the headline and search on it.
  • The first result is from apa.org, which looks promising.
  • Go there and check that it’s the same release.
  • Search Wikipedia for the site address and find the article on the APA.
  • Check that the APA is a real organization.
  • Check that the web address Wikipedia lists for the APA matches.

And you’re done. That may sound like a lot of steps, but each one is simple, fast, and fluid. Here are those steps executed in real time (video intentionally silent). I really encourage you to watch the video to see how ridiculously easy this is for someone with some training.
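For the technically curious, the Wikipedia step can even be scripted. This is purely illustrative, since the whole point is that a browser and thirty seconds is enough; the requests library, the helper name, and the use of apa.org here are just my choices for the sketch, not part of the method.

```python
# Illustrative only: scripting the "search Wikipedia for the site address" step.
# The domain "apa.org" comes from the example above; everything else is assumed.
import requests

def wikipedia_titles_for(domain: str, limit: int = 5) -> list[str]:
    """Return titles of Wikipedia articles matching a site's domain."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "list": "search",
            "srsearch": domain,
            "srlimit": limit,
            "format": "json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return [hit["title"] for hit in resp.json()["query"]["search"]]

if __name__ == "__main__":
    # Expect the American Psychological Association article near the top.
    for title in wikipedia_titles_for("apa.org"):
        print(title)
```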

There’s really no excuse not to do this for things you share. It not only allows you to share from a more authoritative source, which is good for society and the economics of publishing, but it also allows you to provide your readers helpful context. Compare this:

[Screenshot: the story shared from EurekAlert (AAAS)]

To this:

[Screenshot: the same story shared from apa.org, with a context blurb]

I used to focus on having students write longer research pieces on issues, and we still do that in various classes we work with. But just this behavior alone improves the world:

  • Check what you share
  • Share from the better source
  • Provide a context blurb to share your own source verification with others

You don’t need to write an essay. And most any student (or teacher!) can learn the techniques. Think of it as information hygiene, the metaphorical handwashing you engage in to prevent the spread of misinformation.

Learn the skills and make the world a better place. There may be good excuses for not doing this, but time is not one of them.

(Oh, and here’s that APA press release — it’s really interesting!)

 

Instead of letting people vote on news, Facebook should adopt Google’s rater system

A message I sent to a newsgroup about Facebook’s recent proposal that users could rate sites as a solution. To my surprise, I find myself suggesting they should follow Google’s model, which, while often faulty, is infinitely better than what they are proposing.

==============

(Regarding the announcement), I think there’s a better, time-tested way of doing this that doesn’t deal with individual ratings but benefits from expert analysis and insight. Use a modified version of the Google system.

Most people misunderstand what the Google system looks like (misreporting on it is rife), but the way it works is this. Google produces guidance docs for paid search raters, who use them to rate search results (not individual sites). These documents are public, and people can argue about whether Google’s take on what constitutes authoritative sources is right — because they are public.

The raters rate search quality against the documents, and the coders write code to get the score up, but the two pieces are kept separate.

It’s not a perfect system, but it provides so many things this Facebook proposal doesn’t even touch:

  • It defines a common set of standards as to what a “good” result looks like, without going into specific sources.
  • It provides a degree of public transparency that Facebook doesn’t even come close to
  • It provides incentives for publishers to act in ethical ways, e.g. high quality medical or financial advice (“your money or your life” categories) must be sourced to professionals, etc.
  • It separates the target from the method of assessing whether the target is being hit.

I’m not saying it doesn’t have problems — it does. It has taken Google some time to understand the implications of some of their decisions and I’ve been critical of them in the past. But I am able to be critical partially because we can reference a common understanding of what Google is trying to accomplish and see how it was falling short, or see how guidance in the rater docs may be having unintended consequences.

In such a system Facebook would hire raters who would rate feed quality — not individual sites — for a variety of criteria which experts have decided are characteristic of quality news feeds (and which readers by and large agree with). That would probably mean ascertaining whether sites included in the feed had the following desirable attributes:

  • Separation of opinion and analysis and news content, with opinion in particular clearly marked
  • Sponsored content clearly marked, and comprising a small portion of the overall site
  • Syndicated content clearly identified
  • Satire pieces marked unmistakably as satire
  • News stories clear about the process and methods by which they verified news in an article (e.g. “Kawczynski declined to be interviewed Sunday, but in posts on his website and on Gab…”)
  • A retraction policy, and an email address to send noted errors to
  • Headlines that match in tone and meaning the content of the attached article
  • Descriptive blurbs in Facebook that accurately describe the content of the article
  • Pictures which are either related to the event or marked as stock or file footage with descriptive and accurate captions
  • Links where appropriate to supporting coverage from other news outlets
  • A clear and accurate about page which defines who runs the paper
  • A lack of plagiarism — e.g. does the content pass the “Google paste test” for material not marked as syndicated.

Raters would rate the quality of the sources showing up in their feed, and Facebook engineers would work on improving feed quality by getting the ratings up. No one gets banned or demoted by name. Or promoted by name.

The role of experts and the public would be to clarify what they trust in news. In fact, the Trust Project has already done much of the work that would go into feed quality rating docs. I summarize their work and my simplification of it here.

Again, the rater guidance documents get published. We continue to argue over whether the guidance is correct and whether the implementation is meeting the guidance or being gamed. We still raise holy hell about misfires and get them to rethink guidance and code.
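To make that concrete, here is a rough sketch, in Python, of what a feed-quality rubric could look like as a data structure. The criteria are pulled from the list above; the names, weights, and scoring scheme are invented for illustration and are not Facebook’s or Google’s actual system.

```python
# A rough sketch of how a feed-quality rubric might be encoded. The criteria
# come from the bulleted list above; the weights, names, and scoring scheme
# are invented here and are not any platform's real system.
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    weight: float  # how heavily this criterion counts toward feed quality

RUBRIC = [
    Criterion("opinion_clearly_marked", 1.0),
    Criterion("sponsored_content_marked", 1.0),
    Criterion("satire_marked", 1.0),
    Criterion("verification_methods_described", 1.5),
    Criterion("retraction_policy_present", 1.0),
    Criterion("headline_matches_article", 1.5),
    Criterion("accurate_about_page", 1.0),
    Criterion("no_plagiarism", 2.0),
]

def feed_quality(rater_scores: dict[str, float]) -> float:
    """Combine one rater's 0-1 scores per criterion into a single number.
    Raters score the feed as a whole, never named sites."""
    total_weight = sum(c.weight for c in RUBRIC)
    return sum(c.weight * rater_scores.get(c.name, 0.0) for c in RUBRIC) / total_weight

# Example: one rater's judgment of one sampled feed.
print(feed_quality({"opinion_clearly_marked": 0.9,
                    "no_plagiarism": 1.0,
                    "headline_matches_article": 0.6}))
```

Engineers would then tune the feed to push that number up, without ever promoting or demoting a named site.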
The approach Facebook is currently proposing, on the other hand, is essentially nihilistic, and like many nihilistic things it may have current utility (and may even work temporarily), but it provides a lousy foundation for dealing with the problems to come.

Mike.

P.S. By and large I think you will find both that the public would rather trust expert opinion on what constitutes quality than trust their neighbor, and that the public more or less agrees — both right and left — with the practices in the bulleted list above.

Arsonist Birds Activity Sequence

I have a new and complete three-part activity sequence up on the Four Moves blog. It asks students to delve into whether a story about birds intentionally setting fires is an accurate summary of the research it cites. It goes through three steps:

  • Evaluating the reporting source.
  • Evaluating the research source.
  • Checking for distortion of the source material.

I won’t go into the steps because I don’t want to foul up the Google Search results for this activity. But I encourage you to start at the beginning and go through the three activities. (Please note that it is meant to be used in a classroom with a facilitator.)

One of the things I’ve built out a bit more in this set of activities is the “skills up front, social impact on the back” approach. Each activity runs the students through very specific skills, but then asks the students to reflect a bit more deeply on the information environment as a whole. Here are some discussion questions we ask students after they check whether the National Post is a “real” newspaper:

  • Neither the National Post nor the reporter has any core expertise in ethno-biology. So why do we trust the National Post more than a random statement from a random person?
  • Why do you think that newspapers have such a good reputation for truthfulness and care compared to the average online site? What sort of economic incentives has a newspaper historically had to get things right that a clickbait online site might not have had?
  • How do we balance our need for traditional authoritative sources with our desire to include diverse voices and expertise? How do we make sure we are not excluding valuable online-only sources? What are the dangers of a newspaper-only diet of news?

And here is a question we ask after we have the students read the arsonist birds article — which is really about science having ignored indigenous and professional expertise:

  • One question the journal article raises is the way that professional and indigenous expertise is not always valued by science. How can we, as people seeking the best information, value academic research while respecting non-academic expertise when appropriate? What’s a good example of when professional or indigenous expertise on an issue might be preferable to academic expertise?

This stuff takes forever to put together, unfortunately, because one thing we’re trying to do is be very careful about tone, and make sure we get students to think about the incentives around information production without allowing them the easy shortcut of cynicism. We also are quite aware that the biggest worry we face is not that students will consume disinformation, but that they may consume no civically-oriented news at all. So in other sections we use the follow-up to make the case for considered and intentional news consumption (and again, news consumption that is less focused on political hobbyism).

In any case, I think it’s a solid sequence, and I hope you’ll try going through it. It uses a password for the “solutions” as a way to rebuff search engines and slow students down. The password is “searchb4share”. Try it out!

 

People Are Not Talking About Machine Learning Clickbait and Misinformation Nearly Enough

The way that machine learning works is basically this: you input some models, let’s say of what tables look like, and then the code generates some things it thinks are tables. You click yes on the things that look like tables and the code reinforces the processes that made those and makes some more attempts. You rate again, and with each rating the elements of the process that produce table-like things are strengthened and the ones that produce non-table-like things are weakened.
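Here is a deliberately tiny sketch of that loop in Python. A scoring function stands in for the human clicking “yes, that looks like a table,” and simple hill-climbing stands in for the real gradient updates; the numbers are meaningless, the shape of the loop is the point.

```python
# A toy stand-in for the feedback loop described above. The scoring function
# plays the role of human clicks, and hill-climbing plays the role of the
# real training updates; all numbers here are invented.
import random

TARGET = [1.0, 0.5, 0.25]  # stand-in for "what a table looks like"

def human_feedback(candidate):
    """Pretend human rating: the closer to the target, the higher the score."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))

def mutate(parent, spread=0.1):
    """Generate a new attempt near something that was previously rated well."""
    return [p + random.gauss(0, spread) for p in parent]

best = [random.uniform(-1, 1) for _ in TARGET]  # first blind attempt
for _ in range(200):
    attempts = [mutate(best) for _ in range(20)]      # the code makes more attempts
    rated = max(attempts, key=human_feedback)         # "click yes" on the best one
    if human_feedback(rated) > human_feedback(best):  # reinforce what was liked
        best = rated

print(best)  # after many rounds, the attempts drift toward the target
```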

It doesn’t have to be making things — it can be recognition as well. In fact, as long as you have some human feedback in the mix you can train a machine learning process to recognize and rate tables that another machine learning process makes, in something called a generative adversarial network.

People often use machine learning and AI interchangeably (and sometimes I do too). In reality machine learning is one approach to AI, and it works very well for some things and not so well for others. So far, for example, it’s been a bit of a bust in education. It’s had some good results in terms of self-driving cars. It hasn’t done great in medicine.

It will get better in these areas, but there’s a bit of a gating factor here — the feedback loops in these areas are both delayed and complex. In medicine we’re interested in survival rates that span from months to decades — not exactly a fast-paced loop — and the information that is currently out there for machines to learn from is messy and inconclusive. In learning, the ability to produce custom content is likely to have some effect, but bigger issues such as motivation, deep understanding, and long-term learning gains are not as simple as recognizing tables. In cars machine learning has turned out to be more useful, but even there, while you can use it to recognize stop signs, it’s a bit harder to handle the rarer and more complex instances of “you-go-no-you-go” yielding protocols.

You know what machine learning is really good at learning, though? Like, scary, Skynet-level good?

What you click on.

Think about our tables example, but replace it with headlines. Imagine feeding the 1,000 most shared headlines and stories into a machine learning algorithm, then having the ML generate 10,000 headlines over the next hour and publish them through 1,000 bots. The ones that are successful get shared, and those parts of the ML net are boosted (produce more like this!). The ones that don’t get shared tell the ML to produce less along those lines.

That’s hour one of our disinfo Skynet. If the bots have any sizable audience, you’re running maybe 20,000 tests per piece of content — showing it to 20,000 people and seeing how they react. Hour two repeats that with better content. By the next morning you’ve run millions of tests on your various pieces of content, all slowly improving the virality of the material.
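A toy version of that loop, with every headline and number invented: treat each candidate headline like a slot-machine arm, show it to users, and shift traffic toward whatever gets shared. Real systems would also generate new candidates from the winners; this sketch only shows the measurement half.

```python
# A toy of the hour-by-hour loop above. Headlines, share rates, and the
# epsilon-greedy policy are all invented for illustration.
import random

headlines = {  # candidate -> [times shown, times shared]
    "Headline A": [0, 0],
    "Headline B": [0, 0],
    "Headline C": [0, 0],
}
true_share_rate = {"Headline A": 0.02, "Headline B": 0.05, "Headline C": 0.11}

def pick_headline(epsilon=0.1):
    """Mostly show the current best performer, occasionally explore."""
    if random.random() < epsilon:
        return random.choice(list(headlines))
    return max(headlines, key=lambda h: headlines[h][1] / max(headlines[h][0], 1))

for _ in range(20_000):  # "20,000 tests" of the candidates
    h = pick_headline()
    shown, shared = headlines[h]
    headlines[h] = [shown + 1, shared + (random.random() < true_share_rate[h])]

for h, (shown, shared) in headlines.items():
    print(h, shown, round(shared / max(shown, 1), 3))
```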

At that scale you can start checking valence, targeting, impact. It’s easy enough for a network analysis to show whether certain material is starting fights, for example, and stuff that starts fights can be rated up. You can find what shares well and produces cynicism in rural counties if you want. Facebook’s staff will even help you with some of that.

In short, the social media audience becomes one big training pool for your clickbait or disinfo machine. And since there is enough information from the human training to model what humans click on, that process can be amplified via generative adversarial networks, just like with our tables.

It doesn’t stop there. The actual articles can be written by ML, with their opening grafs adjusted for maximum impact. Videos can be automatically generated off of popular articles and flood YouTube.

Even the bots can get less distinguishable. An article in the New York Times today details the work being done in ML face generation, where believable fake faces are generated. Right now the process is slow, partly because it relies solely on GANs and because it’s processor-intensive. But imagine generating 1,000 fake faces for your bot avatars and tracking which ones get the most shares, then regenerating a thousand more based on that and updating. Or, even easier, autogenerating and regenerating user bios.

You don’t even need to hand-grow the faces, as with the NYT article. You could generate 1,000 morphs, or combos of existing faces.
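The morph idea is almost trivially simple in code. Below is a crude pixel-average version (real morphing tools also warp facial landmarks, but the principle is the same); the random arrays stand in for two loaded face photos, and numpy is my choice for the sketch, not anything from the NYT piece.

```python
# A minimal sketch of "morphs": a combo of two existing faces is just a
# weighted blend of their pixel arrays. The random arrays below stand in
# for two loaded, same-sized face photos.
import numpy as np

rng = np.random.default_rng(0)
face_a = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # placeholder photo A
face_b = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # placeholder photo B

def morph(a: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Blend two same-sized images: alpha=0 gives A, alpha=1 gives B."""
    blended = (1 - alpha) * a.astype(np.float32) + alpha * b.astype(np.float32)
    return blended.astype(np.uint8)

# Sweep alpha to turn one pair of "faces" into a whole family of avatars.
avatars = [morph(face_a, face_b, alpha) for alpha in np.linspace(0.1, 0.9, 9)]
print(len(avatars), avatars[0].shape)
```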

Just as with the last wave of disinformation, the first adopters of this stuff will be the clickbait farms, finding new and more effective means to get us to sites selling dietary supplements or to watch weird autogenerated YouTube videos. There will be a flood of low-information ML-based content. But from there it will be weaponized, and used to suppress speech and manipulate public opinion.

These different elements of ML-based gaming of the system have different ETAs, and I’m not saying all of this is imminent. Some of it is quite far off. But I am saying it is unavoidable. You have machine learning — which loves short and simple feedback loops — and you have social media, which has a business model and interface built around those loops. The two things fit together like a lock and a key. And once these two things come together it is likely to have a profoundly detrimental effect on online culture, and make our current mess seem quite primitive by comparison.