There’s a video going around that purportedly shows Nancy Pelosi drunk or unwell, answering a question about Trump in a slow and slurred way. It turns out that it is slowed down, and that the original video shows her quite engaged and articulate.
Two things about this. The first is that our four moves (SIFT) apply well to this incident. Specifically, the “T” in SIFT is “Trace quotes, claims, and media to the original context.” In this case you can watch the original video on C-SPAN and see the difference immediately.
But what if you can’t trace it? In general, if the provenance of a video is hidden but the video clearly has an unlinked original source, wait a bit. Even decent news sources can be godawful at linking original sources, but for a big video like this people will usually point you to the original within a day or two, which is what happened here.
The second thing to watch is how the media ecosystem works as, well, a system. When people look at the impact of false news they often measure how much of it makes it to mainstream broadcasts. But very often the way networked lies and mainstream news interact is synergistic. So as this false Facebook video is being circulated to millions of viewers, the Fox News show Lou Dobbs Tonight airs a different video of Pelosi with some instances of her stammering edited together and asks “What’s going on?” Age? Illness? The video pushes beyond the bounds of acceptable journalism, but stays within the bounds of what is currently permissible on air. The guest commentator is very muted but pointed in replies: she’s getting old, probably pushing herself too hard, maybe needs to step aside.
In music production there is a technique called double-tracking, and it’s not a perfect metaphor for what’s going on here but it’s instructive. In double-tracking you record one part (a vocal or solo) and then you record that part again, with slight variations in timing and tone. Because the two tracks are close, they are perceived as a single track. Because they are different, though, the track is “widened,” feeling deeper and richer. The trick is for them to be different enough that it widens the track but similar enough that they blend.
Anyway, that’s what you see with a lot of disinfo campaigns. On the wild west of social media, outright lies are spread. And usually the outright lies don’t make it to the mainstream outlets exactly as spread, but a very similar and dishonestly spun story is spread at the same time through broadcast. The two blend into one, able to use the freedom of the web to build shock and the amplification of traditional media to build a sense of veracity and extend the reach. You saw this with the caravan in 2018 and with Clinton’s “sickness” in 2016. And so on. Two tracks, one through viral spread and the other through official channels, blended into something more damaging than either track alone.
I have so much writing backlogged I need to get a few quick hits out to clear the logjam.
Here’s a good example of a statistical false frame that’s visual enough for a slide.
It says “Washington Post” on the bottom there, but of course the Washington Post version lacks the “presidential term” markers.
When you see that a framing has been added like that, it’s wise to think through what has been added and whether it’s accurate. And of course with a little thought you’ll hopefully ask why, if it covers one year of the Trump presidency and eight years of Obama, the boxes of their terms are nearly equal in size. (Weird, right?) If you’re particularly adept you might ask why Obama’s term begins in 2007, which I seem to remember as the Bush presidency, though honestly I was drinking more back then, so who knows.
It’s worth asking if our “T” move (Trace claims, quotes, and media to the original context) works here, since the original graphic doesn’t settle the questions of what economy Obama inherited in 2009 or what economy he left us with. I think you could point to the context the article adds around the charts as useful (2017 figures are before Trump tax cuts and before his first budget). Still, it doesn’t give the answer to you outright; you’re going to have to think it through, and you *could* come to the same conclusions without going to the original context first.
But what the trace does in this case is show students where to look. By calling attention to what’s been added, removed, or altered, it focuses their thought in the right area. Show a student the initial graphic and say “Hey, what’s the problem with this graph?” and you’ll get a flood of answers: Is it inflation? How do we know they each caused this? It starts at $50k, it’s a bad axis! (Students love this one.) Going to the original context and looking at what has been altered solves the students’ biggest issue: where to focus their thinking first, given a bewildering array of options.
Since people asked, here’s the modified image with the real terms of office:
Note that even this is a bit unfair; most economists would say that the influence of the President on the economy (to the extent there is one) is felt through the mechanism of the budget and associated tax policy, and that does not get passed until the fall of the first year, with the tax policy going into effect for the following year. If you shift that, of course, then there is no part of this graph that is Trump budget, and the graph looks like this:
It’s also worth noting that if you go to the article there is plenty there to critique the Obama economy over — there’s pretty broad agreement that it’s surprising wages have not increased given the strength of the economy, and economists point out that the effects seen here are probably not pay raises at all but due to increased employment (e.g. if one spouse got cut to part time in the recession and can now get full-time work, household income increases, but rate of pay does not).
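The term-shifting argument above is simple enough to check with a few lines of code. What follows is a rough sketch, not anything from the Post article: the 2007 to 2017 chart range is my assumption about what the graphic covers, and `president_for` and `years_by_president` are names made up for illustration. The inauguration years themselves are just the historical ones.

```python
# Inauguration years (each term begins in January of the year listed).
INAUGURATIONS = [(2001, "Bush"), (2009, "Obama"), (2017, "Trump")]

# Assumed range of the chart; adjust if the graphic covers other years.
CHART_YEARS = range(2007, 2018)


def president_for(year: int, budget_lag: int = 0) -> str:
    """Attribute a year to a president.

    budget_lag=0 counts from the inauguration year itself.
    budget_lag=1 assumes a president's budget and tax policy only
    take effect in the year after inauguration.
    """
    owner = None
    for start, name in INAUGURATIONS:
        if year >= start + budget_lag:
            owner = name
    if owner is None:
        raise ValueError(f"{year} is before the terms listed")
    return owner


def years_by_president(years, budget_lag: int = 0) -> dict:
    """Count how many of the given years are attributed to each president."""
    counts = {}
    for year in years:
        name = president_for(year, budget_lag)
        counts[name] = counts.get(name, 0) + 1
    return counts


print(years_by_president(CHART_YEARS))                # {'Bush': 2, 'Obama': 8, 'Trump': 1}
print(years_by_president(CHART_YEARS, budget_lag=1))  # {'Bush': 3, 'Obama': 8}
```

With no lag, the chart holds one Trump year against eight Obama years, which is why equal-sized boxes should look odd. With a one-year budget lag, Trump drops out of the chart entirely.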
The Four Moves have undergone some tweaking since I first introduced them in early 2017. The language has shifted, been refined. We’ve come to see that lateral reading is more of a principle underlying at least two of the moves (maybe three). We’ve removed a reference to “go upstream” which was a bit geeky. All in all, though, the moves have remained constant, partially because so many people have found them useful.
Today, we’re introducing an acronym that can be used to remember the moves: SIFT.
(S)top
(I)nvestigate the Source
(F)ind better coverage
(T)race claims, quotes, and media back to the original context
If you’ve followed the moves as they have developed over the past two years, these won’t surprise you, but there are a couple changes to the wording and the order.
The most notable is we’ve combined our habit (originally “check your emotions”) with the move (“circle back”) because these turn out to be the same thing. Basically — stop reading, stop reacting, figure out what you need to know and reapproach. In the beginning, this means to not read before you orient yourself. When researching, this means if you are getting sucked into an increasingly confusing maze of pages, STOP AND BACK UP.
The other moves are the same as the most recent iteration, with the change that “Find better coverage” replaces “other coverage” to emphasize the idea you are looking for other coverage, but ideally coverage that is slightly better on at least one dimension. What those dimensions are may be contextual, but often students have some half-decent intuitions here that can be refined over time.
We’ve also broadened out “Find the original” to its replacement which stresses that the point is not just finding the original for its own sake, but finding the original context. The original may be better — original reporting from the NYT or a fact-checked Atlantic article. But it could be worse — a claim that is sourced to a junk journal, or simply began as an unsubstantiated tweet. In the case of photos or videos, the original context is often mitigating, where media or quotes are presented with a false, inflammatory frame.
But the main introduction here is the acronym, a direct answer to CRAAP. (“Don’t CRAAP, SIFT?”).
Final note — some people might look at the acronym and think — “Isn’t this just more CRAAP? Another checklist?”
I deal with this extensively on this blog and in the textbook, but the problem with CRAAP has never been the acronym. In fact, the history of CRAAP as a web infolit device begins eight years (at least) before the acronym. The difference has always been the difference between a narrow list of things to do (SIFT) and a broad list of things to consider and rate (CRAAP). I’ve detailed at length why that makes such a difference in terms of cognitive load and other factors, so I won’t repeat it here. But my point is that a bad methodology got a lot of lift with a clever acronym that served as a convenient shorthand and a student mnemonic — it’s probably time the better methodology gets an acronym as well.
Sam prides himself on questioning conventional wisdom and subjecting claims to intellectual scrutiny. For kids today, that means Googling stuff. One might think these searches would turn up a variety of perspectives, including at least a few compelling counterarguments. One would be wrong. The Google searches flooded his developing brain with endless bias-confirming “proof” to back up whichever specious alt-right standard was being hoisted that week. Each set of results acted like fertilizer sprinkled on weeds: A forest of distortion flourished.
I have one or two quibbles with the recent article in the Washingtonian about a 13-year-old’s slide into the alt-right by way of meme-world, but the article as a whole is quite useful and, for parents at least, very moving. I recommend everyone read it, but in particular parents.
Let’s get the quibbles out of the way first. I think the article is a bit too enamored with Nagle’s Kill All Normies, and that maybe leaks into the narrative as well, with the inciting incident (wrongly accused of sexual harassment) perhaps playing too dominant a role. I would say it’s a bit too sympathetic, except of course it’s the woman’s son and the kid is thirteen. So I think we can let it slide. (Don’t read Nagle, though. Read Becca Lewis and Joan Donovan instead).
Where the article does excel, though, is in the way it gets across the process of grooming that these communities use. People tend to think of grooming in the context of sexual predators or spies — the slow process of finding disaffected people and using their disaffection to warp their mind bit by bit. But we’ve long known that this is how online radicalization works as well, from ISIS to neo-Nazis.
The quote I’ve chosen at the top of this article talks about confirmation bias, and I’ll come to the ways that is right in a second. But let me first say what “confirmation bias” gets wrong about our radicalization problem. (Trigger warning: I will be drawing a short parallel that touches on sexual predation).
No foreign power looking to recruit a spy goes up and says, hey, will you spy for us? And sexual predators do not begin grooming by asking for sex. Instead, in each case, there is a slow process of getting the target acclimated, bit by bit, to ideas thought repulsive. The grooming is achieved by hiding the destination of the grooming until the target is already deep in the alternate reality.
This is an important point, because it’s actually working against confirmation bias. Confirmation bias would take a non-Nazi, and work to keep them a non-Nazi. Confirmation bias, were all the cards on the table at the beginning of the grooming, would be protective. You’d Google, find out you were reading Nazi literature and think um, maybe I’ll read something else.
So what’s going on with these Google searches?
The Google searches flooded his developing brain with endless bias-confirming “proof” to back up whichever specious alt-right standard was being hoisted that week.
A few things are likely happening. The first is curation. The Reddit group was likely feeding her son a constant stream of outrages about men being ill-treated by feminists. An ad that denigrates male aggressiveness in sports. The story of a woman falsely accusing a man of rape. Statistics showing the wage gap is a myth. A feminist saying outrageous things. Probably some fake stuff, à la #EndFathersDay, thrown in for good measure. When these things are put all together in a stream, it can seem like there is a vast conspiracy to suppress the real truth. How come they never taught you this stuff, right?
Now, this is where we’d think being inquisitive would help. Get out and Google it, right? And for someone skilled at finding the right information on the web that strategy might work. But the curation and the language used produce loaded searches that just pull one deeper into the narrative that the curation scaffolded.
What do I mean? Well, take the infamous Dylann Roof search “black on white crime” which he indicated was his first step into the radicalization that led to him slaughtering black worshipers in a church basement in an attempt to incite a “race war”. In the beginning, he put “black on white crime” into Google, and this is what happened next:
But more importantly this prompted me to type in the words “black on White crime” into Google, and I have never been the same since that day. The first website I came to was the Council of Conservative Citizens. There were pages upon pages of these brutal black on White murders. I was in disbelief. At this moment I realized that something was very wrong. How could the news be blowing up the Trayvon Martin case while hundreds of these black on White murders got ignored?
As I’ve talked about previously, “black on white crime” is a data void. It is not a term used by social scientists or reputable news organizations, which is why the white nationalist site Council of Conservative Citizens came up in those results. That site has since gone away, but what it was was a running catalog of cases where black men had murdered (usually) white women. In other words, it’s yet another curation, even more radical and toxic than the one that got you there. And then the process begins again.
So this is what the spiral looks like:
You can read Roof describe the process here:
From this point I researched deeper and found out what was happening in Europe. I saw that the same things were happening in England and France, and in all the other Western European countries. Again I found myself in disbelief. As an American we are taught to accept living in the melting pot, and black and other minorities have just as much right to be here as we do, since we are all immigrants. But Europe is the homeland of White people, and in many ways the situation is even worse there. From here I found out about the Jewish problem and other issues facing our race, and I can say today that I am completely racially aware.
From Roof’s “manifesto”.
The thing to remember about this algorithmic-human grooming hybrid is that the gradualness of it, the step-by-step nature of it, is a feature for the groomers, not a bug. I imagine if the first page Roof had encountered on this (the CCC page) had sported a Nazi flag and a big banner saying “Kill All Jews” he’d have hit the back button, and maybe the world might be different. (Maybe.) But the curation/search spiral brings you to that point step by step. In the center of the spiral you probably still have enough good sense to not read stuff by Nazis, at least knowingly. By the time you get to the edges, not so much.
Digital Literacy Interventions
There is so much that needs to be addressed here, in terms of platforms, schooling, awareness of the danger of various ideologies. In terms of underlying patriarchal and white supremacist culture, and the systems that serve to replicate and enlarge its influence. When I talk about digital literacy interventions, I do not mean to minimize this work. It’s massive.
But digital literacy is my piece of it. What do digital literacy interventions look like here?
There are multiple entry points here, corresponding to the parts of the spiral:
Students need a basic understanding of how curations can warp reality. I don’t think the “filter bubble” is the frame for this, since it implies that curations confirm existing beliefs and that stepping outside the curation is a net good. In reality, curations don’t protect us from opposing views, but often bring us to more radical views. Thinking about what you want from a curation in a way bigger than “both sides” is important. (Spoiler: what you want is context, and the people best suited to bring context are people in a position to know, via expertise, professional skill, or lived experience). What applies to human curation applies to algorithmic curation and recommendation as well. Students should be able to look at a YouTube recommendation list and articulate what the underlying principle of curation seems to be.
Students need to be aware of how search terms shape results. I talked about this a bit in my textbook a few years back: how searching something like “9/11 hoax” presupposes a certain type of result. If I were rewriting this book now, I’d massively expand that chapter and the examples around it. Like much of digital infolit, the key here is that the students know how to “zoom out” to a broader, more neutral term, using diction likely associated with the things they would want to read.
Even the most loaded search term usually delivers a page with at least one good result. Teaching students to scan search engine result pages with an eye toward what sort of information is behind each of those links can help students, who often zero in too much on issues of result relevance when clicking and not enough on result genre and quality. Students can also be taught how to use somewhat curated searches: News-only searches, Scholar, Images.
Here, lateral reading is key. Before engaging with a new site, students should find out what the site they are reading is. What’s its agenda? Record of accuracy? Again, remember that grooming happens bit by bit, and one of its main mechanisms is hiding its true nature from the target. If students realize early in the process that they are drifting into some radical and toxic territory, they can choose to proceed with the right frame of reference, or maybe avoid those sources altogether.
Digital Infolit Can Help
Digital literacy, source-checking, and lateral reading are not replacements for action that needs to happen elsewhere. Sites like Reddit must consider what cultures they are supporting, and how their platform’s affordances may be exacerbating ill effects. The roots of white supremacy must be addressed. Full digital literacy should address issues of how economics, platform incentives, tribalism, and supremacist/sexist/colonial structures shape online discourse and production.
But the incremental nature of grooming on the internet does not just rely on ill-feeling or latent racism; it makes use of a series of misconceptions most people have about how to find and think about information on the web. The machinery of radicalization is massive, but small mistakes in search and site selection behavior help grease its wheels. Addressing those mistakes directly with students can help increase the difficulty of such radicalization for groomers, from neo-Nazis to ISIS, and given the relatively small cost of providing such training, it is an intervention we should be pursuing.
If you want to see how data voids are utilized by extremists, here’s a good example. Last night a prominent conservative organization tweeted this image:
You see the beach ball, right? It asks you to Google the “Kalergi Plan”. What’s that? It’s an anti-Semitic conspiracy theory that has its roots in “the ‘white genocide’ and ‘great replacement’ conspiracy rhetoric in far-right circles, which allege that a secret ruling class of Jewish elites is using immigration policy to remove European white people from the population.”
It’s garbage, and it’s dangerous garbage. Specifically, it’s the sort of garbage that motivated both the shooter in the Tree of Life massacre and the shooter in the Christchurch shootings.
But what happens when you Google this term?
What you see immediately are the videos. These videos, like a lot of conspiracy videos, take a little known footnote in history and place it center stage. Kalergi, of course, is historically real. But he is also a historical figure of little note and no current influence. As such, there isn’t much writing on him, except (and here’s the main thing) by those who have put him at the center of a fake conspiracy.
So what you get is what researchers call a “data void”: people who know anything about the history of Europe, immigration, etc. don’t talk about Kalergi, because he is insignificant, a figure most notable for the conspiracy theories built around him. But people using the conspiracy theory talk about Kalergi quite a lot. So when you search Kalergi Plan, almost all the information you get will be by white supremacist conspiracy theorists.
These bad actors then use the language of critical thinking to tell you to look at the evidence and “make up your own mind.”
But of course if you’re searching “Kalergi plan”, most of the “evidence” you are getting comes from white supremacist conspiracy theorists. Making up your own mind under such a scenario is meaningless at best.
Things used to be much worse up until a few months ago, because if you watched one of these videos, YouTube would keep playing you conspiracy videos on the “Kalergi Plan” via a combination of autoplay, recommended videos, and personalization. It would start connecting you to other videos on other neo-Nazi theories, “race science”, and the like. People would Google a term once and suddenly find themselves permanently occupying a racist, conspiracy-driven corner of the internet. Fun stuff.
Due to some recent actions by YouTube this follow-on effect has been substantially mitigated (though their delay in taking action has led to the development of a racist-conspiracist bro culture on YouTube that continues to radicalize youth). The tamping down of the recommended video conspiracy vector isn’t perfect, but it is already having good effects. However, it’s worth noting that reducing the influence of this vector has probably increased the importance of “Google this” ploys on the net, since people are less likely to hit these videos without direct encouragement.
What can we do as educators? What should we encourage our students to do?
1. Choose your search terms well
First, let students know that all search terms should be carefully chosen, and ideally formed using terms associated with the quality and objectivity of the information you want back. Googling “9/11 hoax” is unlikely to provide you reliable information on the 9/11 attacks, as people who study 9/11 don’t refer to it as a hoax. In a similar vein, “black on white crime”, the term that began the radicalization of the Charleston church shooter, is a term used by many neo-Nazis but does not feature prominently in academic analysis of crime patterns. Medical misinformation is similar: if you search for information on “aborted fetuses” in vaccines when there are no aborted fetuses in vaccines, the people you’re going to end up reading are irresponsible, uneducated kooks.
This isn’t to say that a better search term gets you great results, especially around issues that are conspiracy-adjacent. But a better search term may at least return a set of results with some good pages listed. Here are the top results for a bad search ([[“aborted fetuses in vaccines”]], on the left) and a better one ([[stem cells vaccines]], on the right).
Note the differences (reliable sources are highlighted). With the loaded terms on the left, the top two results are from unreliable sources. However, the less loaded search returns better results. In addition to seeing some scholarly articles with the better terms (a possible-though-not-foolproof indicator you are using better language) the second item here is not only a reliable resource on this issue, but one of the best comprehensive explanations of the issue written for the general public, from an organization that specializes in the history of medicine. Search on the loaded terms, however, and you will not see this, even in the first fifty results.
2. Search for yourself
Conspiracy theorists are fond of asking people to “think for themselves” — after those people use the suggested conspiracy-inflected search terms to immerse themselves in a hall of mirrored bullshit. A better idea might be to do less thinking for yourself and more searching for yourself. When you see signs or memes asking you to search specific terms, realize that the person asking you to do that may be part of a community that has worked to flood the search results for that term with misinformation.
When we say “search for yourself” we do not mean you should use terms that return information that matches your beliefs. We mean that you should think carefully about the sort of material you want to find. If you wish to find scholarly articles or popular presentations of scholarly work, choose terms scholars use. If you are interested in the history of Europe’s current immigration policies, search for “history of europe’s immigration policies”, not “Kalergi Plan”. Don’t be an easy mark. There’s nothing more ridiculous than a person talking about thinking for themselves while searching on terms they were told to search on by random white supremacists or flat-earthers on the internet.
A final note: for the moment, avoid auto-complete in searches unless it truly is what you were just about to type. Auto-complete often amplifies social bias, and for niche terms it can be gamed in ways that send folks down the wrong path. It’s not so bad when searching for very basic how-to information or the location of the nearest coffee shop, but for research on issues of social controversy or conflict it should be avoided.
3. Anticipate what sorts of sources might be in a good search — and notice if they don’t show up
Before going down the search result rabbit hole, ask yourself what sort of sources would be the most authoritative to you on those issues and look to see if those sorts of pages show up in the results. There’s a set of resources I’ve grown used to seeing when I type in a well-formed medical query: the Mayo Clinic, WebMD, the American Cancer Society, the National Institutes of Health. (As an example, look at “melatonin for sleep” as a search.) When looking for coverage of national events I’ve grown used to seeing recognizable newspapers: the Washington Post, the Los Angeles Times, the Wall Street Journal.
Students don’t necessarily have the ability to recognize these sorts of sources off the bat, but they should cultivate the habit of noticing the results that turn up in a well-tuned query, and the sources that turn up in a data void, such as the “death foods” term you may occasionally see in website sponsored ad chumbuckets. Initially this understanding may be more about genre than specific touchstones: expecting newspapers to show up for events, hospitals and .gov sites for medical searches, magazine or journal treatments of policy issues.
The important thing, however, is anticipation. Does the student have at least a vague expectation of the sorts of sources they expect to see before they hit the search button? If they develop the habit of forming these informal expectations, they are more likely to reanalyze search terms when those expectations are violated.
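For anyone who wants to make the anticipation habit concrete in a classroom exercise, here is a minimal sketch: write down the sources you expect before searching, then see what share of the top results match. Everything specific in it is an assumption for illustration, including the helper names, the sample URLs, and the 0.3 cutoff. It is a prompt to rethink your search terms, not a measure of result quality.

```python
from urllib.parse import urlparse

# Sources written down BEFORE searching (illustrative list for a medical query).
EXPECTED_MEDICAL = {"mayoclinic.org", "webmd.com", "cancer.org", "nih.gov"}

# Hypothetical result URLs, standing in for what a student copies off a results page.
GOOD_RESULTS = [
    "https://www.mayoclinic.org/melatonin",
    "https://www.nccih.nih.gov/health/melatonin",
    "https://example.com/sleep-tips",
    "https://www.webmd.com/sleep/melatonin",
]
VOID_RESULTS = [
    "https://example.com/death-foods",
    "https://example.org/what-doctors-hide",
]


def domain_of(url: str) -> str:
    """Return the host of a URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def expected_share(result_urls, expected_domains) -> float:
    """Fraction of results hosted on (or on a subdomain of) an expected domain."""
    if not result_urls:
        return 0.0
    hits = 0
    for url in result_urls:
        host = domain_of(url)
        if any(host == d or host.endswith("." + d) for d in expected_domains):
            hits += 1
    return hits / len(result_urls)


def expectations_violated(result_urls, expected_domains, threshold=0.3) -> bool:
    """True when too few anticipated sources appear; a cue to rethink the search terms."""
    return expected_share(result_urls, expected_domains) < threshold


print(expected_share(GOOD_RESULTS, EXPECTED_MEDICAL))         # 0.75
print(expectations_violated(VOID_RESULTS, EXPECTED_MEDICAL))  # True
```

The code is beside the point. What matters is that the expected list gets written first, so a results page with none of those sources on it registers as a surprise.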
There’s a story going around right now about a “reporter” who was following people shorting Tesla stock and allegedly approaching them for information. I won’t go into the whole Elon vs. the Short Sellers history; you don’t need it. Let’s just say that posing as a reporter can be used for ill in a variety of ways and maybe this was a case of that.
The way a lot of people judge reputation is signals, the information a person chooses to project about themselves. Signals can be honest or dishonest, but if a person is new to you, you may not be able to assess the honesty of the signal. What you can do, however, is assess the costliness of the signal. In the case of a faker, certain things take relatively little time and effort, but others take quite a lot.
Let’s list the signals, and then we’ll talk about their worth.
First there’s the Twitter bio and the headshot. The headshot is an original photo: a reverse image search here doesn’t turn up Maisy, but it doesn’t turn up anyone else either, so it’s less likely to be a stolen photo. The Twitter bio says she’s written for Bloomberg.
This isn’t that impressive as verification, but wait! Maisy also has a website, and it looks professionally done!
From the website we learn that she’s a freelancer. Again, user supplied, but she links to her LinkedIn page. She’s got 194 connections, and is only 3 degrees of separation from me! (I’m getting a bit sick of this photo, but still).
Oh, and she went to Stanford! Talk about costly, right? You don’t do that on a whim!
The Usual Signals Are Increasingly Garbage
Here’s the thing about all the signals here: they are increasingly garbage, because the web drives down the cost of these sorts of things. And as signals become less costly they are less useful.
Your blurb on Twitter is produced directly by you — it’s not a description on a company website or in a professionals directory. So, essentially worthless.
That photo that’s unique? In this case, it was generated by machine learning, which can now generate unique pictures of people that never existed. It’s a process you can replicate in a few seconds at this site here, as I did to generate a fake representative and fake tweet below.
The website? Domains are cheap, about $12 a year. Website space is even cheaper. The layout? Good layout used to be a half-decent signal that you’d spent money on a designer — fifteen years ago. Nowadays, templates like this are available for free.
LinkedIn, though, right? All those connections? The education? I mean, Stanford, right?
First, the education field in LinkedIn is no more authoritative than the bio field in Twitter. It’s user supplied, even though it looks official. Hey, look, I just got into Stanford too! My new degree in astrophysics is going to rock.
The connections are a bit more interesting. One person called one of Maisy’s endorsements to see if they actually knew this person. Nope, they didn’t. Just doing that thing where you don’t refuse connections or mutual endorsements. “Maisy” just had to request a lot of connections and make enough endorsements and figure that enough of a percentage would follow or endorse her back. Done and done.
I’ll tell you a funny story, completely true. I once friended someone on LinkedIn that I knew, Sara Wickham or somesuch. And we went back and forth talking about our friends in college in 1993 — “Remember Russ?” “Guy with the guitar, always playing the Dead and Camper Van Beethoven?” “Oh you mean Chris?” “Right, Chris.” “Absolutely. Whatever happened to Chris, anyway?”
A week or so into our back and forth I noticed we had never attended the college at the same time, and as I dug into it I remembered the person I was thinking of didn’t have that last name at all. I had never met this person I was talking with, and in fact we had no common friends.
That’s LinkedIn in a nutshell. Connections are a worthless indicator.
Stop with the “Aha, I Spotted Something” You’re Firing Up In Your Head
So now maybe you’re channeling your inner Columbo and just dying to tell us all about all the things you’ve noticed that “gave this away”. You would not have been fooled, right?
I mean, there’s a five year work gap between Stanford and reporting. She graduated in 2013 and then started just *now*. Weird, right? There’s a sort of bulge in the photo that’s the tell-tale sign of AI processing! And it’s the same photo everywhere! The bio on the website sucks. The name of her freelancing outfit is Unbiased Storytellers, which feels as made up a name as Honest Auto Service.
Here’s the thing — you’re less smart if you’re doing this stuff than if you’re not. You know the person is fake, and so what you’re doing is noticing all the little inconsistencies.
But the problem is that life is frustratingly inconsistent once put under a microscope. The work gap? People have kids, man. It’s not unusual at all — especially for women — to have a work gap between college and their first job. If that’s your sign of fakery, you’re going to be labeling a lot of good female reporters fake.
That photo? Sometimes photos just have weirdness about them. Here’s the photo of a Joshua Benton on Twitter, who tweets a lot of stuff.
Joshua’s Twitter bio claims that he’s a person running a journalism project at Harvard, so it’s a bit weird he’s obscuring what he looks like. Definitely fishy!
Except, of course, Benton does work at Harvard, and in fact runs a world famous lab there.
What about Maisy’s sucky bio? Well, have you ever written a sucky bio and thought, I’ll go back and fix that? I have. (A lot of them made it all the way into conference programs).
And finally the name of her freelance shop: Unbiased Freelancing and Storytellers. Surely a fake, right?
Funny story about that. A bunch of Twitter users were investigating this story and looking at her LinkedIn connections/endorsements. And one of them found the clearly fake Mr. Shepard, a “Dog Photographer and Maker of Paper Hats”:
Do I have to spell this out for you? His name is Shepard and he photographs dogs. Look at the hats, which are CLEARLY photoshopped on (can you see the stitching?) His bio begins “Walking the line between storyteller and animal handler…” Come ON, right?
Except then the same person called the “Puptrait” studio. And JB is real. And his last name is really Shepard.
And the hats aren’t photoshopped, he really does make these custom paper hats that fit on dogs’ heads.
If you’d think this was fake, don’t blame yourself — when reading weak and cheap signals you are at the mercy of your own biases as to what is real and what is not, what is plausible and what is not. You’ll make assumptions about what a normal work gap looks like based on being a man, or what a normal job looks like based on being something that isn’t a dog photographer and maker of paper hats.
I actually used to do this thing where I would tell faculty or students that a site was fake, and ask them, how do we know? And they would immediately generate *dozens* of *obvious* tells: too many ads, weird layout, odd domain name, no photos of the reporters, clickbaity headlines, no clear about page. And then I would reveal that it was actually a real site. And not only a real site, but a world-renowned medical journal or Australia’s paper of national record.
I had to stop doing this for a couple reasons. First, people got really mad at me. Which, fair point. It was a bit of a dick move.
But the main reason I had to stop is that, having talked themselves into it with all the things they had noticed, a certain number of the students and faculty could not be talked out of it, even after the reveal. Each thing they had “noticed” had pulled them deeper into the belief that the site was faked, and being told that it was actually a well-respected source created a cognitive dissonance that couldn’t be overcome. People would argue with me. That can’t really be a prestigious medical journal. I don’t believe you! You’ve probably got it wrong somehow! Double-check!
It ended up taking up too much class time and I moved on to other ways to teach it. But the experience actually frightened me a bit.
Avoid Cheap Signals, Look For Well-Chosen Signs.
By looking at a lot of poor-quality cheap signals you don’t get a good sense of what a person’s reputation is. Mostly, you just get confused. And the more attributes of a thing you have to weigh against one another in your head, the more confused you’re going to get.
This situation is only going to get worse, of course. Right now AI-generated pictures do have some standard tells, but in a couple years they won’t. Right now you still have to write your own marketing blurb on a website, but in a couple years machine learning will pump out marketing prose pretty reliably, and maybe at a level that looks more “real” than stuff hand-crafted. The signals are not only cheap, they are in a massive deflationary spiral.
What we are trying to do in our digital literacy work is to get teachers to stop teaching this “gather as much information as you can and weigh it in a complex calculus heavily influenced by your own presuppositions” approach and instead use the network properties of the web to apply quick heuristics.
Let’s go back to this “reporter”. She claims to write for Bloomberg.
Does she? Has she written anywhere? Here’s my check:
I plug “Maisy Kinsley” into Google News. There’s no Maisy Kinsley mentioned at all. Not in a byline, not in a story. You can search Bloomberg.com too and there’s nothing there at all.
Let’s do the same with a reporter from the BBC who just contacted me. Here’s a Google News Search. First a bunch of Forbes stuff:
Downpage some other stuff including a BBC reference:
If we click through to BBC News and do a search, we find a bunch more stories:
We’re not looking at hair, or photos, or personal websites or LinkedIn pages or figuring out if a company name is plausible or a work gap explainable. All those are cheap signals that Frey can easily fake (if a bad actor) and we can misread (if he is not). Instead we figure out what traces we should find on the web if Frey is really a journalist. Not what does Frey say about himself, but what does the web say about Frey. The truth is indeed “out there”: it’s on the network, and what students need is to understand how to apply network heuristics to get to it. That involves knowing what is a cheap signal (LinkedIn followers, about pages, photographs), and what is a solid sign that is hard to counterfeit (stories on an authoritative and recognizable controlled domain).
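If it helps to see that distinction as a procedure, here is a rough sketch in Python. It is an illustration, not a tool we actually use: the list of controlled domains, the result rows, the reporter name, and the function names are all invented for the example, and in practice you would just eyeball the search results. The point is only that the filter ignores everything the person publishes about themselves and keeps bylines on domains they do not control.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of outlets whose bylines are hard to counterfeit.
CONTROLLED_DOMAINS = {"bbc.com", "bbc.co.uk", "bloomberg.com", "forbes.com"}


def on_controlled_domain(url, domains=CONTROLLED_DOMAINS):
    """True if the URL's host is one of `domains` or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def solid_signs(name, results, domains=CONTROLLED_DOMAINS):
    """Keep only results that credit `name` on a controlled domain.

    `results` is a list of dicts with "url" and "byline" keys, for example
    rows copied by hand from a news search. Self-published pages drop out.
    """
    wanted = name.casefold()
    return [
        r for r in results
        if wanted in r.get("byline", "").casefold()
        and on_controlled_domain(r["url"], domains)
    ]


# Invented sample rows; none of these URLs or names are real.
SAMPLE_RESULTS = [
    {"url": "https://www.bbc.com/news/placeholder-story", "byline": "Pat Example"},
    {"url": "https://pat-example.example.com/about", "byline": "Pat Example"},
    {"url": "https://bbc.com.lookalike.example/story", "byline": "Pat Example"},
]

if __name__ == "__main__":
    for row in solid_signs("Pat Example", SAMPLE_RESULTS):
        print(row["url"])
```

Note that the lookalike host in the third row is rejected because the match is on the end of the hostname, not on a substring. An empty result does not prove someone is fake, but it does tell you the traces you would expect from a working journalist are missing.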
Advancing this digital literacy work is hard because many of the heuristics people rely on in the physical world are at best absent from the digital world and at worst easily counterfeited. And knowing what is trustworthy as a sign on the web and what is not is, unfortunately, uniquely digital knowledge. You need to know how Google News is curated and what inclusion in those results means and doesn’t mean. You need to know followers can be bought, and that blue checkmarks mean you are who you say you are but not that you tell the truth. You need to know that it is usually harder to forge a long history than it is to forge a large social footprint, and that bad actors can fool you into using search terms that bring their stuff to the top of search results.
We’ve often convinced ourselves in higher education that there is something called “critical thinking” which is some magical mental ingredient that travels, frictionless, into any domain. There are mental patterns that are generally applicable, true. But so much of what we actually do is read signs, and those signs are domain specific. They need to be taught. Years into this digital literacy adventure, that’s still my radical proposal: that we should teach students how to read the web explicitly, using the affordances of the network.
We’re about halfway into the coding hours on this, which is a bit scary. We still have some expert hours from TS Waterman at the end to solve the hard problems, but right now we’re solving the easy ones.
A couple weeks ago we put out a prototype. The prototype was for one of the three moves we wanted to showcase, and it was functional, and used the original concept of a headless Chrome instance in the background to make these things. The prototype did what good prototypes do and showed that the project was possible, but there were three weak spots:
First, the Chrome screenshots could usually be manipulated to capture the right part of the screen (e.g. scroll down to a headline or get the correct Google result scrolled into view). But this was a bit more fragile than hoped as we tested it on a wide array of prompts.
Second, headless Chrome was really slow on some sites. Even on speedy sites, like Google, the fire-up and retrieval would normally take a couple of seconds but could stretch to much more. We were headless-Chroming three sites, and on the occasional call where all three went slow we’d sometimes get over 30 seconds. This didn’t happen a lot (timings were usually about 10 to 15 seconds for the entire process) but it happened enough.
Finally, because headless Chrome is headless, a lot of the things needed to make the animation instructive (mouse pointers, cursors, omnibars) have to be added anyway via image manipulation.
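The second weak spot is partly an arithmetic problem: three captures run one after another cost the sum of their latencies, while three run side by side cost roughly the slowest one, and a hard time budget caps even that. Here is a minimal sketch of the idea in Python. It is not the project’s code: `capture` is a stand-in that sleeps instead of driving a browser, and the `capture_all` name, the site labels, and the delays are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


def capture(site, delay):
    """Stand-in for a headless-browser screenshot; sleeps instead of rendering."""
    time.sleep(delay)
    return f"screenshot of {site}"


def capture_all(jobs, budget):
    """Run every capture concurrently and return {site: result or None}.

    `jobs` is a list of (site, simulated_delay) pairs. Any site that has not
    finished within `budget` seconds maps to None so the caller can fall
    back to something cheaper, such as a cached or drawn mockup.
    """
    results = {site: None for site, _ in jobs}
    pool = ThreadPoolExecutor(max_workers=len(jobs))
    futures = {pool.submit(capture, site, delay): site for site, delay in jobs}
    try:
        for fut in as_completed(futures, timeout=budget):
            results[futures[fut]] = fut.result()
    except FuturesTimeout:
        pass  # slow sites simply stay None
    finally:
        pool.shutdown(wait=False)  # do not block on the stragglers
    return results


if __name__ == "__main__":
    # Delays are scaled-down stand-ins for real page loads.
    demo_jobs = [("search", 0.05), ("news", 0.05), ("slow-site", 0.6)]
    print(capture_all(demo_jobs, budget=0.3))
```

The design choice here is to trade completeness for a predictable ceiling: the caller always gets an answer within the budget, and decides separately what to show for the sites that came back empty.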
I played with the settings, with asynchrony, and with using a single persistent instance of ChromeDriver, and things got better. But it became clear that we should offload at least some problems to a caching mechanism and, where possible, use PIL to draw mockups from the result of a plain HTML request rather than doing everything through screenshots. So I’m in the middle of that rebuild now, with caching and some imaging library rework. Hoping to get it reliably under 10 seconds max.
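For the caching half of that, the shape I have in mind is roughly a time-limited cache in front of the slow render step. The sketch below is an assumption about how that could look, not the rebuild itself: the `TTLCache` class, its `get_or_render` method, and the ten-minute lifetime are all hypothetical, and the render function here just returns a string where the real one would produce an image.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested without waiting
        self._store = {}        # key -> (expires_at, value)

    def get_or_render(self, key, render):
        """Return the cached value for `key`, calling `render(key)` on a miss."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = render(key)
        self._store[key] = (now + self.ttl, value)
        return value


def slow_render(url):
    """Hypothetical stand-in for the expensive screenshot or mockup step."""
    return f"rendered {url}"


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=600)  # assumed lifetime; tune to taste
    print(cache.get_or_render("https://example.com/", slow_render))  # miss: renders
    print(cache.get_or_render("https://example.com/", slow_render))  # hit: reuses
```

A short lifetime matters for this use because search results and front pages change; a stale capture of the wrong headline would undercut the whole point of the animation.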