Data Voids and the Google This Ploy: Kalergi Plan

If you want to see how data voids are utilized by extremists, here’s a good example. Last night a prominent conservative organization tweeted this image:

Picture of a group of conservative activists. One holds a beach ball that says “Google Kalergi Plan.”

You see the beach ball, right? It asks you to Google the “Kalergi Plan”. What’s that? It’s an anti-Semitic conspiracy theory that has its roots in “the ‘white genocide’ and ‘great replacement’ conspiracy rhetoric in far-right circles, which allege that a secret ruling class of Jewish elites is using immigration policy to remove European white people from the population.”

It’s garbage, and it’s dangerous garbage. Specifically, it’s the sort of garbage that motivated both the shooter in the Tree of Life massacre and the shooter in the Christchurch shootings.

But what happens when you Google this term?

Results of Kalergi Plan search on Google. At the top of the page are three white supremacist videos pushing conspiracy theories with a history of promoting violence and murder.

What you see immediately are the videos. These videos, like a lot of conspiracy videos, take a little known footnote in history and place it center stage. Kalergi, of course, is historically real. But he is also a historical figure of little note and no current influence. As such, there isn’t much writing on him, except (and here’s the main thing) by those who have put him at the center of a fake conspiracy.

So what you get is what researchers call a “data void”: people who know anything about the history of Europe, immigration, etc. don’t talk about Kalergi, because he is insignificant, a figure most notable for the conspiracy theories built around him. But people using the conspiracy theory talk about Kalergi quite a lot. So when you search Kalergi Plan, almost all the information you get will be by white supremacist conspiracy theorists.

These bad actors then use the language of critical thinking to tell you to look at the evidence and “make up your own mind.”

Screenshot of YouTube video saying “Is the Kalergi Plan Real? Make up your own mind.”

But of course if you’re searching “Kalergi plan”, most of the “evidence” you are getting comes from white supremacist conspiracy theorists. Making up your own mind under such a scenario is meaningless at best.

Things used to be much worse up until a few months ago, because if you watched one of these videos, YouTube would keep playing you conspiracy videos on the “Kalergi Plan” via a combination of autoplay, recommended videos, and personalization. It would start connecting you to other videos on other neo-Nazi theories, “race science”, and the like. People would Google a term once and suddenly find themselves permanently occupying a racist, conspiracy driven corner of the internet. Fun stuff.

Due to some recent actions by YouTube this follow-on effect has been substantially mitigated (though their delay in taking action has led to the development of a racist-conspiracist bro culture on YouTube that continues to radicalize youth). The tamping down of the recommended video conspiracy vector isn’t perfect, but it is already having good effects. However, it’s worth noting that reducing the influence of this vector has probably increased the importance of Google This ploys on the net, since people are less likely to hit these videos without direct encouragement.

What can we do as educators? What should we encourage our students to do?

1. Choose your search terms well

First, let students know that all search terms should be carefully chosen, and ideally formed using terms associated with the quality and objectivity of the information you want back. Googling “9/11 hoax” is unlikely to provide you reliable information on the 9/11 attacks, as people who study 9/11 don’t refer to it as a hoax. In a similar vein, “black on white crime”, the term that began the radicalization of the Charleston church shooter, is a term used by many neo-Nazis but does not feature prominently in academic analysis of crime patterns. Medical misinformation is similar: if you search for information on “aborted fetuses” in vaccines when there are no aborted fetuses in vaccines, the people you’re going to end up reading are irresponsible, uneducated kooks.

Selected Google results for “Aborted fetuses in vaccines”. It’s a tire fire.

This isn’t to say that a better search term gets you great results, especially around issues that are conspiracy-adjacent. But a better search term may at least return a set of results with some good pages listed. Here are the top results for a bad search (“aborted fetuses in vaccines”, on the left) and a better one (“stem cells vaccines”, on the right).

Search result screenshots.

Note the differences (reliable sources are highlighted). With the loaded terms on the left, the top two results are from unreliable sources. However, the less loaded search returns better results. In addition to seeing some scholarly articles with the better terms (a possible-though-not-foolproof indicator you are using better language) the second item here is not only a reliable resource on this issue, but one of the best comprehensive explanations of the issue written for the general public, from an organization that specializes in the history of medicine. Search on the loaded terms, however, and you will not see this, even in the first fifty results.

2. Search for yourself

Conspiracy theorists are fond of asking people to “think for themselves” — after those people use the suggested conspiracy-inflected search terms to immerse themselves in a hall of mirrored bullshit. A better idea might be to do less thinking for yourself and more searching for yourself. When you see signs or memes asking you to search specific terms, realize that the person asking you to do that may be part of a community that has worked to flood the search results for that term with misinformation.

When we say “search for yourself” we do not mean you should use terms that return information that matches your beliefs. We mean that you should think carefully about the sort of material you want to find. If you wish to find scholarly articles or popular presentations of scholarly work, choose terms scholars use. If you are interested in the history of Europe’s current immigration policies, search for “history of europe’s immigration policies”, not “Kalergi Plan”. Don’t be an easy mark. There’s nothing more ridiculous than a person talking about thinking for themselves while searching on terms they were told to search on by random white supremacists or flat-earthers on the internet.

A final note: for the moment, avoid auto-complete in searches unless it truly is what you were just about to type. Auto-complete often amplifies social bias, and for niche terms it can be gamed in ways that send folks down the wrong path. It’s not so bad when searching for very basic how-to information or the location of the nearest coffee shop, but for research on issues of social controversy or conflict it should be avoided.

3. Anticipate what sorts of sources might be in a good search — and notice if they don’t show up

Before going down the search result rabbit hole, ask yourself what sort of sources would be the most authoritative to you on those issues and look to see if those sorts of pages show up in the results. There’s a set of resources I’ve grown used to seeing when I type in a well-formed medical query: the Mayo Clinic, WebMD, the American Cancer Society, the National Institutes of Health. (As an example, look at “melatonin for sleep” as a search.) When looking for coverage of national events I’ve grown used to seeing recognizable newspapers: the Washington Post, the Los Angeles Times, the Wall Street Journal.

Students don’t necessarily have the ability to recognize these sorts of sources off the bat, but they should cultivate the habit of noticing the results that turn up in a well tuned query, and the sources that turn up in a data void, such as the “death foods” term you may occasionally see in website sponsored ad chumbuckets. Initially this understanding may be more about genre than specific touchstones — expecting newspapers to show up for events, hospitals and .gov sites for medical searches, magazine or journal treatments of policy issues.

The important thing, however, is anticipation. Does the student have at least a vague expectation of the sorts of sources they expect to see before they hit the search button? If they develop the habit of forming these informal expectations, they are more likely to reanalyze search terms when those expectations are violated.
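To make the "anticipate, then notice" habit concrete, here is a minimal sketch in Python. It assumes you already have a list of result URLs from a search (how you get them is up to you), and the `EXPECTED_SOURCES` table and function names are hypothetical, my own illustration rather than any real tool. The domains are just the examples named above.

```python
from urllib.parse import urlparse

# Hypothetical expectations per query genre; these lists are illustrative only.
EXPECTED_SOURCES = {
    "medical": {"mayoclinic.org", "webmd.com", "cancer.org", "nih.gov"},
    "news": {"washingtonpost.com", "latimes.com", "wsj.com"},
}


def host_of(url):
    """Return the URL's host, lowercased and without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def expected_hits(result_urls, genre):
    """Return the result URLs whose host is an expected source or a subdomain of one."""
    expected = EXPECTED_SOURCES[genre]
    hits = []
    for url in result_urls:
        host = host_of(url)
        if any(host == d or host.endswith("." + d) for d in expected):
            hits.append(url)
    return hits


def expectations_violated(result_urls, genre, minimum=1):
    """True when fewer than `minimum` anticipated sources show up in the results."""
    return len(expected_hits(result_urls, genre)) < minimum
```

A violated expectation is not proof of a data void. It is only a prompt to go back and reconsider the search terms, which is the habit we are after.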

Network Heuristics

There’s a story going around right now about a “reporter” who was following people shorting Tesla stock and allegedly approaching them for information. I won’t go into the whole Elon vs. the Short Sellers history, you don’t need it. Let’s just say that posing as a reporter can be used for ill in a variety of ways and maybe this was a case of that.

Snapshot of Maisy Kinsley’s profile

The way a lot of people judge reputation is through signals, the information a person chooses to project about themselves. Signals can be honest or dishonest, but if a person is new to you, you may not be able to assess the honesty of the signal. What you can do, however, is assess the costliness of the signal. In the case of a faker, certain things take relatively little time and effort, but others take quite a lot.

Let’s list the signals, and then we’ll talk about their worth.

First there’s the Twitter bio and the headshot. The headshot appears to be an original photo: a reverse image search doesn’t turn up Maisy, but it doesn’t turn up anyone else either, so it’s less likely to be a stolen photo. The Twitter bio says she’s written for Bloomberg.

This isn’t that impressive as verification, but wait! Maisy also has a website, and it looks professionally done!

Maisy’s website

From the website we learn that she’s a freelancer. Again, user supplied, but she links to her LinkedIn page. She’s got 194 connections, and is only 3 degrees of separation from me! (I’m getting a bit sick of this photo, but still).

LinkedIn profile.

Oh, and she went to Stanford! Talk about costly, right? You don’t do that on a whim!

Screenshot of education panel in LinkedIn

The Usual Signals Are Increasingly Garbage

Here’s the thing about all the signals here: they are increasingly garbage, because the web drives down the cost of these sorts of things. And as signals become less costly they are less useful.

Your blurb on Twitter is produced directly by you — it’s not a description on a company website or in a professionals directory. So, essentially worthless.

That photo that’s unique? In this case, it was generated by machine learning, which can now generate unique pictures of people that never existed. It’s a process you can replicate in a few seconds at this site here, as I did to generate a fake representative and fake tweet below.

The website? Domains are cheap, about $12 a year. Website space is even cheaper. The layout? Good layout used to be a half-decent signal that you’d spent money on a designer — fifteen years ago. Nowadays, templates like this are available for free.

LinkedIn, though, right? All those connections? The education? I mean, Stanford, right?

First, the education field in LinkedIn is no more authoritative than the bio field in Twitter. It’s user supplied, even though it looks official. Hey, look, I just got into Stanford too! My new degree in astrophysics is going to rock.

Screenshot of a fake degree I just gave myself on LinkedIn. I deleted it immediately after; fake-attending Stanford was messing up my work-life balance.

The connections are a bit more interesting. One person called one of Maisy’s endorsements to see if they actually knew this person. Nope, they didn’t. Just doing that thing where you don’t refuse connections or mutual endorsements. “Maisy” just had to request a lot of connections and make enough endorsements and figure that enough of a percentage would follow or endorse her back. Done and done.

“JB is real…talked on the phone…just taking advantage of reciprocal nature of people” Twitter, @plainsite

I’ll tell you a funny story, completely true. I once friended someone on LinkedIn that I knew, Sara Wickham or somesuch. And we went back and forth talking about our friends in college in 1993 — “Remember Russ?” “Guy with the guitar, always playing the Dead and Camper Van Beethoven?” “Oh you mean Chris?” “Right, Chris.” “Absolutely. Whatever happened to Chris, anyway?”

A week or so into our back and forth I noticed we had never attended the college at the same time, and as I dug into it I remembered the person I was thinking of didn’t have that last name at all. I had never met this person I was talking with, and in fact we had no common friends.

That’s LinkedIn in a nutshell. Connections are a worthless indicator.

Stop with the “Aha, I Spotted Something” You’re Firing Up In Your Head

So now maybe you’re channeling your inner Columbo and just dying to tell us all about all the things you’ve noticed that “gave this away”. You would not have been fooled, right?

I mean, there’s a five year work gap between Stanford and reporting. She graduated in 2013 and then started just *now*. Weird, right? There’s a sort of bulge in the photo that’s the tell-tale sign of AI processing! And it’s the same photo everywhere! The bio on the website sucks. The name of her freelancing outfit is Unbiased Storytellers, which feels as made up a name as Honest Auto Service.

Here’s the thing — you’re less smart if you’re doing this stuff than if you’re not. You know the person is fake, and so what you’re doing is noticing all the little inconsistencies.

But the problem is that life is frustratingly inconsistent once put under a microscope. The work gap? People have kids, man. It’s not unusual at all — especially for women — to have a work gap between college and their first job. If that’s your sign of fakery, you’re going to be labeling a lot of good female reporters fake.

That photo? Sometimes photos just have weirdness about them. Here’s the photo of a Joshua Benton on Twitter, who tweets a lot of stuff.

Joshua Benton

Joshua’s Twitter bio claims that he’s a person running a journalism project at Harvard, so it’s a bit weird he’s obscuring what he looks like. Definitely fishy!

Except, of course, Benton does work at Harvard, and in fact runs a world famous lab there.

What about Maisy’s sucky bio? Well, have you ever written a sucky bio and thought, I’ll go back and fix that? I have. (A lot of them made it all the way into conference programs).

And finally the name of her freelance shop: Unbiased Freelancing and Storytellers. Surely a fake, right?

Funny story about that. A bunch of Twitter users were investigating this story and looking at her LinkedIn connections/endorsements. And one of them found the clearly fake Mr. Shepard, a “Dog Photographer and Maker of Paper Hats”:

Do I have to spell this out for you? His name is Shepard and he photographs dogs. Look at the hats, which are CLEARLY photoshopped on (can you see the stitching?) His bio begins “Walking the line between storyteller and animal handler…” Come ON, right?

Except then the same person called the “Puptrait” studio. And JB is real. And his last name is really Shepard.

This puppy is real and is really wearing that hat. They are also super adorable, and if you want a picture like this of your own pet (or just want to browse cuteness for a while in a world that has gotten you down) you should check out Shepard’s Puptrait Studio. Picture used with kind permission of JB Shepard.

And the hats aren’t photoshopped; he really does make these custom paper hats that fit on dogs’ heads.

If you’d think this was fake, don’t blame yourself — when reading weak and cheap signals you are at the mercy of your own biases as to what is real and what is not, what is plausible and what is not. You’ll make assumptions about what a normal work gap looks like based on being a man, or what a normal job looks like based on being something that isn’t a dog photographer and maker of paper hats.

I actually used to do this thing where I would tell faculty or students that a site was fake, and ask them how do we know? And they would immediately generate *dozens* of *obvious* tells — too many ads, weird layout, odd domain name, no photos of the reporters, clickbaity headlines, no clear about page. And then I would reveal that it was actually a real site. And not only a real site, but a world renowned medical journal or Australia’s paper of national record.

I had to stop doing this for a couple reasons. First, people got really mad at me. Which, fair point. It was a bit of a dick move.

But the main reason I had to stop is after having talked themselves into it by all these things they noticed, a certain number of the students and faculty could not be talked out of it, even after the reveal. Each thing they had “noticed” had pulled them deeper into the belief that the site was faked and being told that it was actually a well-respected source created a cognitive dissonance that couldn’t be overcome. People would argue with me. That can’t really be a prestigious medical journal — I don’t believe you! You’ve probably got it wrong somehow! Double-check!

It ended up taking up too much class time and I moved on to other ways to teach it. But the experience actually frightened me a bit.

Avoid Cheap Signals, Look For Well-Chosen Signs.

By looking at a lot of poor quality cheap signals you don’t get a good sense of what a person’s reputation is. Mostly, you just get confused. And the more attributes of a thing you have to weigh against one another in your head the more confused you’re going to get.

This situation is only going to get worse, of course. Right now AI-generated pictures do have some standard tells, but in a couple years they won’t. Right now you still have to write your own marketing blurb on a website, but in a couple years machine learning will pump out marketing prose pretty reliably, and maybe at a level that looks more “real” than stuff hand-crafted. The signals are not only cheap, they are in a massive deflationary spiral.

What we are trying to do in our digital literacy work is to get teachers to stop teaching this “gather as much information as you can and weigh it in a complex calculus heavily influenced by your own presuppositions” approach and instead use the network properties of the web to apply quick heuristics.

Let’s go back to this “reporter”. She claims to write for Bloomberg.

Snapshot of Maisy Kinsley’s profile

Does she? Has she written anywhere? Here’s my check:

Screenshot of Google News

I plug “Maisy Kinsley” into Google News. There’s no Maisy Kinsley mentioned at all. Not in a byline, not in a story. You can search too and there’s nothing there at all.

Let’s do the same with a reporter from the BBC who just contacted me. Here’s a Google News Search. First a bunch of Forbes stuff:

A search for Frey Lindsay turns up many stories from Forbes in Google News

Downpage some other stuff including a BBC reference:

If we click through to BBC News and do a search, we find a bunch more stories:

We’re not looking at hair, or photos, or personal websites or LinkedIn pages or figuring out if a company name is plausible or a work gap explainable. All those are cheap signals that Frey can easily fake (if a bad actor) and we can misread (if he is not). Instead we figure out what traces we should find on the web if Frey really is a journalist. Not what does Frey say about himself, but what does the web say about Frey. The truth is indeed “out there”: it’s on the network, and what students need is to understand how to apply network heuristics to get to it. That involves knowing what is a cheap signal (LinkedIn followers, about pages, photographs), and what is a solid sign that is hard to counterfeit (stories on an authoritative and recognizable controlled domain).
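If it helps to see the check written down, here is a small Python sketch that builds the two searches. The function names are hypothetical. The `site:` search is my stand-in for clicking through to the outlet and searching there, and the URL patterns are the ones the public search pages use as far as I know, so treat them as assumptions.

```python
from urllib.parse import urlencode


def google_news_search_url(name):
    """URL for an exact-phrase Google News search on a person's name."""
    return "https://news.google.com/search?" + urlencode({"q": f'"{name}"'})


def site_search_url(name, domain):
    """URL for a Google search for the name restricted to one outlet's domain."""
    query = f'"{name}" site:{domain}'
    return "https://www.google.com/search?" + urlencode({"q": query})


# Example: the reporter's name, then the outlet they claim to write for.
print(google_news_search_url("Frey Lindsay"))
print(site_search_url("Frey Lindsay", "bbc.com"))
```

The code only builds the query. Reading the results (are there bylines, and on which domains?) is still the human part of the heuristic.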

Advancing this digital literacy work is hard because many of the heuristics people rely on in the physical world are at best absent from the digital world and at worst easily counterfeited. And knowing what is trustworthy as a sign on the web and what is not is, unfortunately, uniquely digital knowledge. You need to know how Google News is curated and what inclusion in those results means and doesn’t mean. You need to know followers can be bought, and that blue checkmarks mean you are who you say you are but not that you tell the truth. You need to know that it is usually harder to forge a long history than it is to forge a large social footprint, and that bad actors can fool you into using search terms that bring their stuff to the top of search results.

We’ve often convinced ourselves in higher education that there is something called “critical thinking” which is some magical mental ingredient that travels, frictionless, into any domain. There are mental patterns that are generally applicable, true. But so much of what we actually do is read signs, and those signs are domain specific. They need to be taught. Years into this digital literacy adventure, that’s still my radical proposal: that we should teach students how to read the web explicitly, using the affordances of the network.

If you want to see how badly we are failing to teach students these things, check out A Brief History of CRAAP and Recognition is Futile.

Update on Check, Please!

Short update on the Check, Please project.

We’re about halfway into the coding hours on this which is a bit scary. We still have some expert hours from TS Waterman at the end to solve the hard problems but right now we’re solving the easy ones.

A couple weeks ago we put out a prototype. The prototype was for one of the three moves we wanted to showcase, and it was functional, and used the original concept of a headless Chrome instance in the background to make these things. The prototype did what good prototypes do and showed that the project was possible, but there were three weak spots:

  • First, the Chrome screenshots could usually be manipulated to capture the right part of the screen (e.g. scroll down to a headline or get the correct Google result scrolled into view). But this was a bit more fragile than hoped as we tested it on a wide array of prompts.
  • Second, headless chrome was really slow on some sites. Even on speedy sites, like Google, the fire-up and retrieval would normally be a couple seconds but could stretch to much much more. We were headless chroming three sites and on the occasional call where all three went slow we’d sometimes get over 30 seconds. This didn’t happen a lot (timings were usually about 10 – 15 seconds for the entire process) but it happened enough.
  • Finally, because headless chrome is headless a lot of things needed to make the animation instructive (mouse pointers, cursors, omnibars) have to be added anyway via image manipulation.

I played with the settings, with asynchrony, with using a single persistent instance of Chrome Driver, and things got better, but it became clear that we should offload at least some problems to a caching mechanism, and where possible use PIL to draw mockups off of an HTML request rather than doing everything through screenshots. So I’m in the middle of that rebuild now, with caching and some imaging library rework. Hoping to get it reliably under 10 seconds max.
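For readers wondering what "offload to a caching mechanism" might look like, here is a minimal sketch in Python. It is not the Check, Please code. The `TTLCache` class, its parameters, and the one-hour default are all hypothetical, and the `fetch` callable stands in for whatever slow step (a headless Chrome capture, say) you want to avoid repeating.

```python
import time


class TTLCache:
    """Keep the result of a slow call for a limited time, keyed by URL."""

    def __init__(self, fetch, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch      # slow function: url -> bytes
        self._ttl = ttl_seconds  # how long a stored result stays fresh
        self._clock = clock      # injectable so the cache can be tested
        self._store = {}         # url -> (time stored, data)

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # fresh enough: skip the slow call
        data = self._fetch(url)  # slow path, e.g. a browser capture
        self._store[url] = (now, data)
        return data
```

The trade-off is staleness: a cached capture of a search page can lag behind the live results, so the time-to-live would need tuning per kind of page.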

Web Literacy Across the Curriculum

We’re still teaching history using only print texts even as kids are being historicized online by Holocaust deniers and Lost-Causers. We’re teaching science in an era when online anti-vaxxers gain traction by using scientific language to deceive and intimidate. 

Sam Wineburg, The internet is sowing mass confusion. We must rethink how we teach kids every subject.

Couple good pieces out — one by Sam Wineburg, and an interesting response (expansion?) by Larry Cuban. The point, at least as I read it? Misinformation on the web is not really a subject — or, in any case, not only a subject. The web, after all, is an environment, a domain in which most professional, scholarly, and civic skills are practiced. Yet the structure of how we teach most subjects treats the web as either an afterthought, or worse, as a forbidden land.

If you know me and know this blog, this issue has been my obsession since before this blog was launched in 2007. Back in 2009 I dubbed the practice of ignoring the web as a target domain “Abstinence-only Web Education”:

…what [the term] expresses [is] my utter shock that when talking to some otherwise intelligent adults about the fact that we are not educating our students to be critical consumers of web content, or to use networks to solve problems, etc — my utter shock that often as not the response to this problem is “Well, if students would just stop getting information from the web and go back to books, this whole problem would go away.”

Now I do believe that reading more books and less web is usually a good decision as part of a broader strategy. But most of what students will do in their professional and civic lives will involve the web.

My younger daughter, for example, is presenting to the school board tonight about how the integrated arts and academics magnet program she is in supports various educational objectives. When trying to understand what those objectives mean — from critical thinking to collaboration — she is not reading a textbook or going to a library. She is consulting the web.

And I am writing this at work as part of being in an informal professional development community, and you are reading it to maybe help you with your job.

These issues seem a million miles away from Pizzagate and blogs that tell you that sea ice is increasing and climate change is really a hoax. But they turn out to be adjacent. What happens if my daughter’s search for critical thinking lands on one of the recently politicized redefinitions of that term, which she ends up presenting to the school board? And you’re here at this blog, trusting me — but there are of course other blogs and articles that are written by people in the employ of ed tech firms, and those by people that have zero experience in the domain on which they write. Giving your attention to those sites may actually make you worse at what you do, or lead to your manipulation by corporate forces of which you are unaware.

Or maybe not! Maybe you’re good at all this.

Still, I keep coming back to that part of Dewey’s School and Society where he talks about the problem of transmission of knowledge in a post-agrarian society. In the first lecture in that work, Dewey talks about the way in which industrialization has rendered the processes of production opaque. In an agrarian society, he notes, “the entire industrial process stood revealed, from the production on the farm of the raw materials, till the finished article was actually put to use.” In such a world a youngster could simply observe, and see what competent practice looked like. To understand where things came from was to understand one’s household, and not much pedagogical artifice was required. With the introduction of complex, specialized and opaque systems, however, there was no opportunity to learn by looking over a parent’s shoulder, and so a more designed approach was required.

Two things occur to me re-reading that. The first is not necessarily a new media literacy insight. But that networked opacity we deal with — the complex network of actors and algorithms that lead to a piece of information or propaganda being displayed on your screen — is a very similar problem. There’s a part of that lecture where Dewey talks about how students that investigate the production of clothing walk through domains of physics, history, geography, engineering, and economics due to the complex set of historical, geographical, and other factors that have determined the way in which clothing gets made. The point he makes is that you can organize the curriculum around clothing, and the disciplines become meaningful.

I’m not proposing to do a complete retread of Dewey’s progressive education in 2019. We’ve learned a lot since Dewey about how people learn; that’s good and we should use that. But narrowly, what Dewey saw in clothing in 1899 I see in web literacy today. Here is a going social concern that combines sociology, psychology, history, engineering, algorithms, math, political science and so on. You don’t have to adopt unmodified Deweyism to see the opportunities there for integrative education. Elucidate the circumstances of production for this thing students are using most of their waking life. If you’re a high school or an integrative first-year program put together a year on it, and try it out.

The second point is on skills. Dewey noted that when professional knowledge moved out of farms and into factories and offices children lost the ability to observe competence in action. Work — and the skills associated with it — became hidden.

That’s still true today, but there’s another angle on this. Even in offices our skills are quite hidden because of the ways that this work evades third-party observation. Where there is an artifact of work — equations, code, writing, etc., a co-worker can ask “hey, why are you doing that in that way?” And where more ephemeral processes are public — soft skills exercised in a meeting for example — they can also be learned.

But web skills have the double whammy of leaving very little trace, and of being intensely private. And this makes transmission and improvement of these skills much more difficult, and creates a situation where there is a lot of hidden need. More on that in a later post.

Educating the Influencers: The “Check, Please!” Prototype

OK, maybe you’re just here for the video. I would be. Watch the demo of Check Please, and then continue downpage for the theory of change behind it.

Watched it? OK, here’s the backstory.

Last November we won an award from RTI International and the Rita Allen Foundation to work on a “fact-checking tutorial generator” that would generate hyper-specific tutorials that could be shared with “friends, family, and the occasional celebrity.” The idea was this — we talk a lot about media literacy, but the people spreading the most misinformation (and the people reaching the most people with that misinformation) are some of the least likely people to currently be in school. How do we reach them?

I proposed a model: we teach the students that we have, and then give them web tools to teach the influencers. As an example, we have a technique we show students called “Just add Wikipedia”: when confronted with a site of unknown provenance, go up to the omnibar, add “wikipedia” after the domain to trigger a Google search that floats relevant Wikipedia pages to the top, select the most relevant Wikipedia page, and get a bit of background on the site before sharing.
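Here is the “Just add Wikipedia” move as a short Python sketch, in case seeing it as a function makes the steps clearer. The function name is hypothetical, and the Google search URL pattern is an assumption about how the omnibar search resolves. It does only the first half of the move; picking the most relevant Wikipedia page is still up to the reader.

```python
from urllib.parse import urlencode, urlparse


def just_add_wikipedia(page_url):
    """Build the search you get by typing '<domain> wikipedia' in the omnibar.

    Expects a full URL including the scheme (https://...).
    """
    host = (urlparse(page_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if not host:
        raise ValueError("expected a full URL such as https://example.com/story")
    return "https://www.google.com/search?" + urlencode({"q": f"{host} wikipedia"})


print(just_add_wikipedia("https://www.example.com/some/story?id=1"))
```

Everything after the domain is dropped on purpose: the check is about the site's reputation, not the individual story.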

When teaching students how to do this, I record little demos using Camtasia on a wide variety of examples. Students have to see the steps and, as importantly, see how easy the steps really are, on a variety of examples. And in particular, they have to see the steps on the particular problem they just tried to solve: even though the steps are very standard, general instruction videos don’t have half the impact of specific ones. When you see the exact problem you just struggled with solved in a couple clicks, it sinks in in a way that no generic video ever will.

Unfortunately, this leaves us in a bit of a quandary relative to our “have students teach the influencers” plan. I have a $200 copy of Camtasia and a decade’s worth of experience creating screencasts, and still, for me to demo a move (from firing up the screen recorder to uploading to YouTube or exporting a GIF) is a half-hour process. I doubt we’re going to change the world on that ROI. As someone once said, a lie can make it halfway around the world while the truth is still lacing up its Camtasia dependencies.

But what if we could give our students a website that took some basic information about decisions they made in their own fact-checking process and generated a custom, shareable tutorial for them, as long as they were following one of our standard techniques?

I came up with this idea last year: use Selenium to drive an invisible (headless) Chrome browser that you can run on the server, and have it walk through the steps of a claim or reputation check while taking screenshots that form the basis of an automatic tutorial on fact-checking a claim. I ran it by TS Waterman, and after walking through it a bit we decided that, maybe to our surprise (!!), it seemed rather straightforward. We proposed it to the forum, won the runner-up prize in November, and on January 15 I began work on it. (TS is still involved and will help optimize the program and advise direction as we move forward, as soon as I clean up my embarrassing prototype spaghetti code.)
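To make the idea concrete, here is a minimal sketch of the screenshot loop in Python. This is not the prototype's code; the `Step` structure, the function names, and the file names are all hypothetical. The only Selenium calls it relies on are `driver.get()` and `driver.save_screenshot()`, and Selenium itself (a third-party package) is imported lazily so the planning part runs without it.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import quote_plus


@dataclass
class Step:
    caption: str     # text shown alongside the screenshot in the tutorial
    url: str         # page the browser should load for this step
    screenshot: str  # file the screenshot is saved to


def plan_reputation_check(domain: str) -> List[Step]:
    """Plan the first steps of a 'Just add Wikipedia' check (hypothetical)."""
    search = "https://www.google.com/search?q=" + quote_plus(f"{domain} wikipedia")
    return [
        Step(f"Start at the site you want to check: {domain}",
             f"https://{domain}/", "step1.png"),
        Step("Add 'wikipedia' after the domain and search", search, "step2.png"),
    ]


def capture(steps: List[Step], driver) -> List[str]:
    """Load each step's page and save a screenshot; returns the file names."""
    files = []
    for step in steps:
        driver.get(step.url)                     # navigate the browser
        driver.save_screenshot(step.screenshot)  # write a PNG of the page
        files.append(step.screenshot)
    return files


def make_headless_chrome():
    """Create an invisible Chrome; assumes Selenium and Chrome are installed."""
    from selenium import webdriver  # imported here so the rest runs without it
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)
```

On a machine with Selenium and Chrome installed, `capture(plan_reputation_check("example.com"), make_headless_chrome())` would produce the two PNGs (call `quit()` on the driver afterward). Stitching the captions and screenshots into a shareable tutorial is a separate step not shown here.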

But here’s the thing: it works! The prototype is so, so far from finished, and the plan is to launch a public site in April after adding a couple more types of checks and massively refactoring code. But it works. And it may provide a new way to think about stopping the spread of misinformation, not by generic tools for readers, but by empowering those that enforce social norms with better, more educational tools.

The result.

Attention Is the Scarcity

There’s a lot of things that set our approach at the Digital Polarization Initiative apart from most previous initiatives. But the biggest thing is this: we start from the environment in which students are most likely to practice online literacy skills, and in that environment attention is the scarcity.

The idea that scarce attention forms the basis of modern information environments is not new. Herbert Simon, years ago, noted that abundances consume — an abundance of goats makes a scarcity of grass. And information? It consumes attention. So while we name this the information age, information is actually less and less valuable. The paradox of the information age is that control of information means less and less, because information becomes commodified. Instead, the powerful in the information age control the scarcity: they control attention.

Slide from my presentation at StratCom last year

Again, this is not an observation that is unique to me. Zeynep Tufekci, Penny Andrews, An Xiao Mina, Claire Wardle, Whitney Phillips, and so many more have drilled down on various aspects of this phenomenon. And years ago, Howard Rheingold put attention as a crucial literacy of the networked age, next to others like critical consumption. It’s not, at this point, a very contentious assertion.

And yet the implications of this, for media literacy at least, have yet to be fully explored. When information is scarce, we must deeply interrogate the limited information that is provided us, trying to find the internal inconsistencies, the flaws, the contradictions. But in a world where information is abundant, these skills are not primary. The primary skill of a person in an attention-scarce environment is making relatively quick decisions about what to turn their attention toward, and making longer term decisions about how to construct their media environment to provide trustworthy information.

People know my four moves approach that tries to provide a quick guide for sorting through information, the 30 second fact-checks, and the work from Sam Wineburg and others that it builds on. These are media literacy, but they are focused not on deeply analyzing a piece of information but on making a decision of whether an article, author, website, organization, or Facebook page is worthy of your attention (and if so, with what caveats).

But there are other things to consider as well. When you know how attention is captured by hacking algorithms and human behavior, extra care in deciding who to follow, what to click on, and what to share is warranted. I’ve talked before about PewDiePie’s recommendation of an anti-Semitic YouTube account based on some anime analysis he had enjoyed. Many subscribed based on the recommendation. But of course, the subscription doesn’t just result in that account’s anime analysis videos being shared with you; it pushes the political stuff to you as well. And since algorithms weight subscriptions highly in what to recommend to you, it begins a process of pushing more and more dubious and potentially hateful content in front of you.

How do you focus your attention? How do you protect it? How do you apply it productively and strategically, and avoid giving it to bad actors or dubious sources? And how do you do that in a world where decisions about what to engage with are made in seconds, not minutes or hours?

These are the questions our age of attention requires we answer, and the associated skills and understandings are where we need to focus our pedagogical efforts.

The Fyre Festival and the Trumpet of Amplification

Unless you’ve been living under a rock, you’re probably aware that there are two documentaries out on the doomed Fyre Festival. You should watch both: the event — both its dynamics and the personalities associated with it — will give you disturbing insights into our current moment. And if you teach students about disinformation I’d go so far as to assign one or both of the documentaries.

Here is one connection between the events depicted in the film and disinfo. There are many others. (This post is not intended for researchers of disinfo, but for teachers looking to help students understand some of the mechanisms).

The Orange Square

Key to the Fyre Festival story is the orange square, a bit of paid coordinated posting by a set of supermodels and other influencers. The models and influencers, including such folks as Kendall Jenner, were paid hundreds of thousands of dollars to post the same message with a mysterious orange square on the same day. And thus an event was born.


People new to disinformation and influencer marketing might think the primary idea here is to reach all the influencer followers. And that’s part of it. But of course, if that were the case you wouldn’t need to have people all post at the same time. You wouldn’t need the “visual disruption” of the orange square.

The point here is not to reach followers, but to catalyze a much larger reaction. That reaction, in part, is media stories like this by the Los Angeles Times.

And of course it wasn’t just the LA Times: it was dozens (hundreds?) of blogs and publications. It was YouTubers talking about it. Music bloggers. Mid-level elites. Other influencers wanting in on the buzz. The coordinated event also gave credibility required to book bands, the booking of the bands created more credibility, more news pegs, and so on.

You can think of this as a sort of nuclear reaction. In the middle of the event sits some fissile material — the media, conspiracy thought leaders, dispossessed or bitter political influencers. Around it are laid synchronized charges that, should they go off right, catalyze a larger, more enduring reaction. If you do it right, a small amount of social media TNT can create an impact several orders of magnitude larger than its input.

Enter the Trumpet

Central to understanding this is that the fissile material is not the general public, at least at first. As a marketer or disinfo agent you often work your way upward to get downward effects. Claire Wardle, drawing on the work of Whitney Phillips and others, expresses one version of this in the “trumpet of amplification”:

Diagram: Claire Wardle’s “trumpet of amplification.”

Here the trumpet reflects a less direct strategy than Fyre, starting by influencing smaller, less influential communities, refining messages then pushing them up the influence ladder. But many of the principles are the same. With a relatively small number of resources applied in a focused, time-compressed pattern you can jump start a larger and more enduring reaction that gives the appearance of legitimacy — and may even be self-sustaining once manipulation stops. Maybe that appearance of legitimacy is applied to getting investors and festival attendees to part with their money. Or maybe it’s to create the appearance that there’s a “debate” about whether the humanitarian White Helmets are actually secret CIA assets:

Maybe the goal is disorientation. Maybe it’s buzz. Maybe it’s information — these techniques, of course, are also often used ethically by activists looking to call attention to a certain issue.

Why does this work? Well, part of it is the nature of the network. In theory the network aggregates the likes, dislikes and interests of billions of individuals and if some of those interests begin to align — shock at a recent news story for example — then that story breaks through the noise and gets noticed. When this happens without coordination it’s often referred to as “organic” activity.

The dream of many early on was that such organic activity would help us discover things we might otherwise not. And it has absolutely done that — from Charlie Bit My Finger to tsunami live feeds this sort of setup proved good at pushing certain types of content in front of us. And it worked in roughly this same sort of way — organic activity catches the eyes of influencers who then spread it more broadly. People get the perfect viral dance video, learn of a recent earthquake, discover a new opinion piece that everyone is talking about.

But there are plenty of ways that marketers, activists, and propagandists can game this. Fyre used paid coordinated activity, but of course activists often use unpaid coordinated activity to push issues in front of people. They try to catch the attention of mid-level elites that get it in front of reporters and so on. Marketers often just pay the influencers. Bad actors seed hyperpartisan or conspiracy-minded content in smaller communities, ping it around with bots and loyal foot soldiers, and build enough momentum around it that it escapes that community, giving the appearance to reporters and others of an emerging trend or critique.

We tend to think of the activists as different from the marketers and the marketers as different from the bad actors, but there’s really no clear line. The disturbing fact is it takes frightfully little coordinated action to catalyze these larger social reactions. And while it’s comforting to think that the flaw here is with the masses, collectively producing bizarre and delusional results, the weakness of the system more likely lies with a much smaller set of influencers, who can be specifically targeted, infiltrated, duped, or just plain bought.

Thinking about disinfo, attention, and influence in this way — not as mass delusion but as the hacking of specific parts of an attention and influence system — can give us better insight into how realities are spun up from nothing and ultimately help us find better, more targeted solutions. And for influencers — even those mid-level folks with ten to fifty thousand followers — it can help them come to terms with their crucial impact on the system, and understand the responsibilities that come with that.

Smoking out the Washington Post imposter in a dozen seconds or less

So today a group known for pranks circulated an imposter site that posed as the Washington Post, announcing President Trump’s resignation on a post-dated paper. It’s not that hard for hoaxers to do this: anyone can come up with a confusingly similar URL to a popular site, grab some HTML, and make a fake site. These sites often have a short lifespan once they go viral, because the media properties they are posing as lean on the hosters, who pull the plug. But once it goes viral the damage is done, right?

It’s worth noting that you don’t need a deep understanding of the press or communications theory to avoid being duped here. You don’t even need to be a careful reader. Our two methods for dealing with this are dirt simple:

  • Just add Wikipedia (our omnibar hack to investigate a source)
  • Google News Search & Scan (our technique we apply to stories that should have significant coverage).

You can use either of these for this issue. The way we look for an imposter using Wikipedia is this:

  1. Go up to the “omnibar” and turn the url into a search by adding space + wikipedia
  2. Click through to the article on the publication you are supposedly looking at.
  3. Scroll to the part of the sidebar with a link to the site, click it.
  4. See if the site it brings you to is the same site
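The last step above is really just a comparison of two addresses: the one you were sent and the one Wikipedia lists as the official site. A minimal Python sketch of that comparison follows. The function names are mine, and the lookalike address in the example is invented, not the actual hoax URL.

```python
from urllib.parse import urlparse


def site_host(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def same_site(suspect_url: str, official_url: str) -> bool:
    """True only if both URLs point at exactly the same host."""
    return site_host(suspect_url) == site_host(official_url)


# The official address as it might appear in a Wikipedia infobox.
official = "https://www.washingtonpost.com"

# An invented lookalike address, for illustration only.
print(same_site("https://washingtonpost.example.net/story", official))  # False
print(same_site("https://www.washingtonpost.com/politics/", official))  # True
```

This sketch is deliberately strict: it treats any other host, including a legitimate subdomain, as a mismatch, which means a "no" is a prompt to look closer rather than a verdict.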

Here’s what that looks like in GIF form (sorry for the big download).

I haven’t sped that up, btw. That’s your answer in 12 seconds.

Now some people might say, well if you read the date of the paper you’d know. Or if you knew the fonts associated with the Washington Post you’d realize the fonts were off. But none of these are broadly applicable habits. Every time you look at a paper like this there will be a multitude of signals that argue for the authenticity of the paper and a bunch that argue against it. And hopefully you pick up on the former for things that are real and the latter for things that aren’t, but if you want to be quick, decisive, and habitual about it you should use broadly applicable measures that give you clear answers (when clear answers are available) and mixed signals only when the question is actually complex.

When I present these problems to students or faculty I find that people can *always* find what they “should have” noticed after the fact. But of course it’s different every time and it’s never conclusive. What if the fonts had been accurate? Does that mean it’s really the Post? What if the date was right? Trustworthy then?

The key isn’t figuring out the things that don’t match after the fact. The key is knowing the most reliable way to solve the whole class of problem, no matter what the imposter got right or wrong. And ideally you ask questions where a positive answer has a chance of being as meaningful as a negative one.

Anyway, the other route to checking this is just as easy — our check other coverage method, using a Google News Search:

  1. Go to the omnibar, search [trump resigns]
  2. When you get to the Google results, don’t stop. Click into Google News for a more curated search
  3. Note that in this case there are zero stories about Trump resigning and quite a lot about the hoax.
  4. There is no step four — you’re done
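For anyone who wants to script the search-and-scan steps above, here is a rough Python sketch. The `tbm=nws` parameter is, as far as I know, what Google uses to select the News tab, but treat that as an assumption. The keyword list and the sample headlines are made up for illustration; a human skimming the page does this far better than a word match.

```python
from urllib.parse import urlencode

# Crude, hypothetical list of words that signal a story is about a hoax.
HOAX_WORDS = ("fake", "hoax", "spoof", "imposter", "bogus")


def google_news_search(query: str) -> str:
    """Build a Google search URL restricted to news results."""
    return "https://www.google.com/search?" + urlencode({"q": query, "tbm": "nws"})


def scan_headlines(headlines):
    """Split headlines into (debunks, others) using the keyword list."""
    debunks = [h for h in headlines if any(w in h.lower() for w in HOAX_WORDS)]
    others = [h for h in headlines if h not in debunks]
    return debunks, others


# Made-up headlines standing in for a page of results.
sample = [
    "Fake newspaper claims president has resigned",
    "Hoax edition of newspaper handed out downtown",
    "Senate schedules budget vote",
]

print(google_news_search("trump resigns"))
debunks, others = scan_headlines(sample)
print(len(debunks), "headlines about a hoax;", len(others), "other headline(s)")
```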

Again, here it is in all its GIF majesty:

You’ll notice that you do need to practice a bit of care here — some publishers try to clickbait the headline by putting the resignation first, hoping that the fact it was fake gets trimmed off and gets a click. (If I were king of the world I’d have a three strikes policy for this sort of stuff and push repeat offenders out of the cluster feature spots, but that’s just me). Still, scanning over these headlines even in the most careless way possible it would be very hard not to pick up this was a fake story.

Note that in this case we don’t even need these fact-checks to exist. If we get to this page and there are no stories about Trump resigning, then it didn’t happen — for two reasons. First, if it happened there would be broad coverage. Second, even if the WaPo was the first story on this, we would see their story in the search results.

There’s lots of things we can teach students, and we should teach them. But I’m always amazed that two years into this we haven’t even taught them techniques as simple as this.

Why Reputation?

I was reading An Xiao Mina’s recent (and excellent) piece for Nieman Lab, and it reminded me that I had not yet written here about why I’ve increasingly been talking about reputation as a core part of online digital literacy. Trust, yes, consensus, yes. But I keep coming back to this idea of reputation.

Why? Well, the short answer is Gloria Origgi. Her book, Reputation, is too techno-optimist in parts, but is still easily the most influential book I’ve read in the past year. Core to Origgi’s work is the idea that reputation is both a social relation and a social heuristic, and these two aspects of reputation have a dynamic relationship. I have a reputation, which is the trace of past events and current relationships in a social system. But that reputation isn’t really separate from the techniques others use to decode and utilize my reputation for decision-making.

This relationship is synergistic. As an example, reputation is subject to the Matthew Effect, where a person who is initially perceived as smart can gain additional reputation for brilliance at a fraction of the cost of someone initially perceived as mediocre. This is because quick assessments of intelligence will have to weight past assessments of others — as a person expands their social circle initial judgments are often carried forward, even if those initial judgments are flawed.

Reputation as a social heuristic maps well onto our methods, of course: both Origgi and the Digital Polarization Initiative look to models from Simon and Gigerenzer for inspiration. But it also suggests a theory of change.

Compare the idea of “trust” to that of “reputation”. Trust is an end result. You want to measure it. You want to look for and address the things that are reducing trust. And, as I’ve argued, media literacy programs should be assessing shifts in trust, seeing if students move out of “trust compression” (where everything is moderately untrustworthy) to a place where they make bigger and more accurate distinctions.

But trust is not what is read, and when we look at low-trust populations it can often seem like there is not much for media literacy to do. People don’t trust others because they’ve been wronged. Etc. What exactly does that have to do with literacy?

But that’s not the whole story, obviously. In between past experience, tribalism, culture, and the maintenance of trust is a process of reading reputation and making use of it. And what we find is that, time and time again, bad heuristics accelerate and amplify bad underlying issues.

I’ve used PewDiePie and his inadvertent promotion of a Nazi-friendly site as an example of this before. PewDiePie certainly has issues, and seems to share a cultural space that has more in common with /pol/ than #resist. But one imagines that he did not want to risk millions of dollars to promote a random analysis of Death Note by a person posting Hitler speeches. And yet, through an error in reading reputation, he did. Just as the Matthew Effect compounds initial errors in judgment when heuristics are injudiciously applied, errors in applying reputation heuristics tend to make bad situations worse: his judgment about an alt-right YouTuber flows to his followers, who then attach some of PewDiePie’s reputation to the ideas presented therein, based mostly on his mistake.

I could write all day on this, but maybe one more example. There’s an old heuristic about the reputation of positions on issues — “in matters indifferent, side with the majority.” This can be modified in a number of ways — you might want to side with the qualified majority when it comes to treating your prostate cancer. You might side with the majority of people who share your values on an issue around justice. You might side with a majority of people like you on an issue that has some personal aspects — say, what laptop to get or job to take. Or you might choose a hybrid approach — if you are a woman considering a mastectomy you might do well to consider what the majority of qualified women say about the necessity of the procedure.

The problem, however, from a heuristic standpoint, is that it is far easier to signal (and read the signal) of attributes like values or culture or identity than it is to read qualifications — and one underremarked aspect of polarization is that — relative to other signals — partisan identity has become far easier to read than it was 20 years ago, and expertise has become more difficult in some ways.

One reaction to this is to say — well people have become more partisan. And that’s true! But a compounding factor is that as reputational signals around partisan identity have become more salient and reputational signals around expertise have become more muddled (by astroturfing, CNN punditocracy, etc) people have gravitated to weighting the salient signals more heavily. Stuff that is easier to read is quicker to use. And so you have something like the Matthew Effect — people become more partisan, which makes those signals more salient, which pushes more people to use those signals, which makes people more partisan about an expanding array of issues. What’s the Republican position on cat litter? In 2019, we’ll probably find out. And so on.

If you want to break that cycle, you need to make expertise more salient relative to partisan signals, and show people techniques to read expertise as quickly as partisan identity. Better heuristics and an information environment that empowers quick assessment of things like expertise and agenda can help people to build better, fuller, and more self aware models of reputation, and this, in turn, can have meaningful impact on the underlying issues.

Well, this has not turned into the short post I had hoped, and to do it right I’d probably want to talk ten more pages. But one New Year’s resolution was to publish more WordPress drafts, so here you go. 🙂

Some Notes On Installing Federated Wiki On Windows

It’s 2018, and I’ve still not found anything that helps me think as clearly as federated wiki. At the same time, running a web server of your own is still, in 2018, a royal pain. Case in point: recently a series of credit card breaches forced a series of changes in my credit card number (two breaches in one year, hooray). And that ended up wiping out my Digital Ocean account as it silently failed the monthly renewals. Personal cyberinfrastructure is a drag, man.

But such is life. So I recently started looking at whether I could do federated wiki just on my laptop and not deal with a remote server. It doesn’t get me into the federation, per se, but it allows all the other benefits of federated wiki — drag-and-drop refactoring, quick idea linking, iterative note-taking, true hypertext thinking.

It turns out to be really easy (I mean as things go with this stuff). I’ll go into detail more below, but here are the steps:

  1. Download Node.js for Windows. Install.
  2. Open a command window and type: npm install -g wiki
  3. Launch via command window: wiki -p 80 --security_type friends --cookieSecret 'REPLACE-THIS-SECRET'
  4. Navigate to localhost in a browser
  5. Click the lock to “claim” the wiki as owner
  6. Click the “wiki” link to take it out of read-only mode.
  7. Go forth and wiki…..


Step one: Download Node.js for Windows. Install.

Step two: Open a command window and type: npm install -g wiki

It’s installed!

Initial Startup

To start your wiki go to a command prompt and type:

wiki -p 80 --security_type friends --cookieSecret 'REPLACE-THIS-SECRET' 

You may need to give node some permissions. I won’t advise you on that. But you definitely don’t need to give public networks access to your server if you don’t want.

Go to your localhost. You’ll get the start page.

Claiming your wiki

When you first visit your wiki it will be in unclaimed, read-only mode, and the bottom of the interface will look like this (though probably not have 47 pages):

When you click that lock icon, it will create a random username and go to unlocked position.

Once you do that you can click on the word “wiki” and now it will move out of read-only into edit mode:

You’ll know it’s in edit mode because you’ll see the edit history icons (sometimes colloquially referred to as ‘chiclets’) at the bottom.

And — that’s it. You’re done. Wiki away.

You’ll need to launch the server from a command window each time you want to use it, but if you’re familiar with Windows you can write a bat file and put it in your startup folder.
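If you’d rather not type the bat file by hand, here is a small Python sketch that writes one containing the launch command from above. The file name `start-wiki.bat` and the function names are my own choices, and you should swap in your own secret.

```python
from pathlib import Path


def bat_contents(port: int = 80, secret: str = "REPLACE-THIS-SECRET") -> str:
    """Return the text of a bat file that launches the wiki server."""
    return (
        "@echo off\n"
        # Same launch command as in the Initial Startup section.
        f"wiki -p {port} --security_type friends --cookieSecret '{secret}'\n"
    )


def write_startup_bat(folder, name: str = "start-wiki.bat", **kwargs) -> Path:
    """Write the bat file into the given folder and return its path."""
    path = Path(folder) / name
    path.write_text(bat_contents(**kwargs))
    return path
```

Calling `write_startup_bat(folder)` with the path of your startup folder does the rest. On Windows 10 you can find that folder by typing shell:startup into the Run dialog (Win+R).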

(Incidentally, this isn’t a tutorial on how to use federated wiki. I’m tired, frankly, of trying to sell it to people who want to know why it takes more than fifteen minutes to learn. I don’t teach people it anymore because people have weird expectations and it wastes too much of my time trying to get past them. But if you’re one of the people who has made the jump, you know this — I just want to help you do it locally on your laptop.)

Optional stuff: Changing your name, importing or backing up files

You don’t need to know where files live on your computer, but sometimes it is useful. For instance, you might want to back up your pages, or reset a username. Here’s how you can do that.

In the single user mode we used above, wiki pages will be in a .wiki directory under your user directory. For instance, my directory is C:\Users\mcaulfield\.wiki\pages. They are simple json files, and can be backed up and zipped. You can also drop json files from other wiki instances here, though you’ll have to delete the sitemap.json file to reindex (more on that below).
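Since the pages are just files in one directory, a backup can be a few lines of Python. This is a minimal sketch assuming the default single-user layout described above (a pages directory under .wiki in your user directory); the function name and zip layout are my own.

```python
import zipfile
from pathlib import Path


def backup_pages(wiki_dir=None, zip_path="wiki-pages-backup.zip"):
    """Zip every file in <wiki_dir>/pages; returns the page names saved."""
    # Default to the .wiki directory under the current user's home directory.
    wiki_dir = Path(wiki_dir) if wiki_dir else Path.home() / ".wiki"
    pages = wiki_dir / "pages"
    saved = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for page in sorted(pages.iterdir()):
            if page.is_file():
                # Store each page under pages/ inside the archive.
                zf.write(page, arcname=f"pages/{page.name}")
                saved.append(page.name)
    return saved
```

Restoring is the reverse: unzip into the pages directory, then delete the sitemap so the wiki reindexes.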

For ownership and indexing issues there is a status directory under the .wiki directory (e.g. C:\Users\mcaulfield\.wiki\status). This has two important files in it. One is owner.json, which maintains login information (initially this will not be there — it’s written when you claim it). The other is your sitemap, which has a list of all pages and recent updates on them. Deleting the sitemap is useful when you want to regenerate it after manually uploading new files.

To change your username, you can edit the owner.json file. Change the name property.

If something goes wrong and you want to reinitiate the claim process, you can delete the owner.json file.

If you clear your cookies and hence lose your claim (i.e. are logged out), you can pull the secret from the json and enter it when prompted. It’s OK to change it to something simple and more password-like that you can remember.
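The username edit can be scripted too. Here is a minimal Python sketch that changes only the name property of owner.json and leaves everything else in the file alone. It assumes name is a top-level key, which is how I read the file, so check yours before running anything like this.

```python
import json
from pathlib import Path


def set_owner_name(wiki_dir, new_name: str) -> dict:
    """Change the 'name' property in <wiki_dir>/status/owner.json."""
    owner_file = Path(wiki_dir) / "status" / "owner.json"
    owner = json.loads(owner_file.read_text(encoding="utf-8"))
    owner["name"] = new_name  # assumes 'name' is a top-level key
    owner_file.write_text(json.dumps(owner, indent=2), encoding="utf-8")
    return owner
```

For example, `set_owner_name(Path.home() / ".wiki", "Mike")` would rename the owner of the default single-user wiki.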

The node files of your wiki installation will be in your AppData roaming directory under npm, e.g. C:\Users\mcaulfield\AppData\Roaming\npm\node_modules\wiki. There’s no real reason to touch these files.

Running a personal desktop farm

This is only for federated wiki geeks, but it is completely possible to run a small personal desktop farm, where you can run multiple wiki sites in what are essentially separate notebooks. Just go into your hosts file (C:\Windows\System32\drivers\etc\hosts) and add localhost aliases:

# localhost name resolution is handled within DNS itself.
#	127.0.0.1       localhost
#	::1             localhost
127.0.0.1       localhost       disinfo       journal       papersonwiki	sandbox	teachersguide	wikipediawomen	opioidcrisis	raceinamerica

Launch in farm mode (-f) and type these words into your browser omnibar. Each will maintain a separate wiki instance. If you want to be able to search across all instances, use the --autoseed flag. Note that you’ll have to go through the minimal claim process with each one (two clicks, shown above).

Pushing to a larger federation

If you want to push to a remote server, you can. There’s a couple ways to do this.

First, there’s a flag in wiki that allows you to point to a different directory for pages. So you can point that to a mapped drive or Dropbox or whatever on your laptop, and then point a remote server to that same directory.

Alternatively you could do a periodic rsync to the server. Windows 10 has bash native to it, so you can install that, reach your files through Bash for Windows’s /mnt/c/ mapping, and push them up that way.

In each case, you probably want to delete the sitemap.json and sitemap.xml to trigger a regeneration.
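Deleting the sitemap files is easy to forget, so it may be worth scripting. A minimal Python sketch follows, assuming both sitemap.json and sitemap.xml live in the status directory (I’m sure about the first; the location of the second is my assumption).

```python
from pathlib import Path


def clear_sitemaps(wiki_dir):
    """Delete sitemap files under <wiki_dir>/status; returns names removed."""
    status = Path(wiki_dir) / "status"
    removed = []
    for name in ("sitemap.json", "sitemap.xml"):
        target = status / name
        if target.exists():  # skip quietly if the file was never generated
            target.unlink()
            removed.append(name)
    return removed
```

Run it after each sync or manual file drop, and the wiki should rebuild its index the next time it needs the sitemap.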

Interestingly, you could also use this scheme (I think) for joint generation of a public wiki.

IIRC, there is also a way to drag and drop json export files into wiki instances.

Finally, you can share files with people by zipping them up and emailing them or providing them as a zipped download. They in turn can drop them into their own federated wiki instance to work with. I’ve been thinking a lot about this model, which is very memex like — I make a notebook of my notes on something, post it up, you pull it into your machine. The provenance gets messy at scale, but among a group of people in a subfield that are being more descriptive in their practice than rhetorical this might work out fine.

It’s Good To Be Back

Using federated wiki again reminds me once again of what wiki means, in an etymological sense. It means quick.

What all other non-federated wiki systems lack is not just federation. They lack quickness, largely because they are designed towards novices and trade away possibilities for fluid authorship in exchange for making the first 15 minutes of use easier.

So while it may seem weird to run a federated wiki server on a laptop in a way that makes federation less available, if you’ve learned the method of multi-pane wiki it’s not really weird at all, because every note-taking system you’ve used besides federated wiki is unbearably slow, clunky, and burdensome. Federated wiki, in the hands of someone that has mastered it, works at the speed of thought. And it does that whether you’re in the federation or not. So here’s to a very wiki New Year.