Activity: Evaluate a Website


This is Tim Gunn, the star of the Project Runway series and a gay rights advocate. But is the message he is holding up for real? It looks like it might be another example of sign-faking, where the content of a piece of paper held up by someone is digitally altered.

Now the first step of this project is to find a site where Tim Gunn himself reveals whether the image is real, and whether he actually said that. But that will lead to the next problem — is your source a trustworthy site? So track down Tim Gunn talking about this, then read laterally to determine whether the site you are reading is a legitimate journalistic enterprise or a hoax site. In the comments mention the site you used and the site you used to confirm that site’s legitimacy.

Ready… set… Go!



Activity: Confirming the Nixon Witch-Hunt Headline

A fun one for today. President Trump made a comment about a witch-hunt, and then this showed up in my Twitter feed. So, is this a real headline or a fake?


“Nixon Sees ‘Witch-Hunt’ Insiders Say” by Woodward and Bernstein.

Did this article really run, with this headline? If so, how did you verify it? If not, how did you debunk it?

(This one makes me think we should have a chapter in the book’s field guide about finding old newspaper articles.)

As always, leave your answer in the comments, along with how you got there. For this particular case, if the headline is not fake, write the first line of the newspaper story as proof you found it.

Innovation vs. Invention

As everyone is aware, I delete my tweets on a rolling basis. But over my morning coffee I had a great discussion with Rolin Moe, David Kernohan, and Maha Bali about innovation, which is probably worth snapshotting here.





And, while I’d love to say I just have the greatest instincts, it turns out that Audrey Watters found that graph before in a piece I read three years ago.


It’s sad, really. I suppose I could turn it into a lesson about how Twitter’s quick and unresearched writing is practically a factory for misattribution. But instead I’ll just say All Hail Audrey Watters! and leave it at that. 😉


The Annotation Layer As a Marketplace for Context: A Proposal

A lot of our thinking about giving articles a “fact-checking” context has been about automated, centralized, closed approaches — Facebook algorithms that flag things, plugins that provide context, etc. Some of these things are deep in proprietary plumbing of platforms. Others are service-based real-time overlays of information. All of them require you opt-in to some particular company’s product, approach, extension, or algorithm. This leads to a real problem for a couple reasons:

  1. We’re going to end up putting all the eggs in one solution basket. Do you use AI or SNA or flagging? My 108-variable algorithm or your 52-variable one? And once there’s lock-in to that approach — once browsers or platforms or whatever have selected a solution, competition disappears — it’s winner take all. We’re already seeing everybody jockeying for position here, trying to be *the* solution. Things like this don’t end well.
  2. Centralizing this sort of stuff is attractive, yet problematic. I’m a believer in smart defaults. But without ability to select from multiple context providers the centralization will eliminate broad swaths of valuable perspective. Worse — we won’t notice those perspectives have become invisible.

In the relatively short time since the election, I’ve met so many intelligent people working on these issues. T. S. Waterman wants to look at issues around time and place and credibility, in a way more people should be thinking about. The Hoaxy people are looking at propagation networks. Others are coming up with crowdsourcing approaches, or domain credibility signals.

So much good work. This community feels alive, and vibrant, and energized. These are some of the smartest people I’ve met in my life, and if we keep working together, we can truly make the web better. And yet I can feel the pressure here — who is going to get funded? Whose approach is going to win? The vibrancy and creativity of the community is unfortunately being undermined by the race of funders and platforms to find the solution.

Annotation As a Marketplace for Context

Here’s my thought: before we lock in to this approach or that one, maybe we should think about an approach that allows the sort of creative experimentation we need to foster in this area. And if we do that, annotation is the obvious marketplace we could use.

How does this work? Well, say you have a document — a web page.


Now, different researchers and tool providers may have different insights about this page:

  • Researcher A has compiled a list of all major newspapers that meet a certain standard of legitimacy, and can test “Tulsa World” against that.
  • Researcher B has a kickass social network analysis which shows the networks this article is flowing through, and can identify with reasonable confidence whether this article is favored by conspiracy theorists.
  • Researcher C has a social network analysis that also looks for conspiracy and hoaxes, but with different strengths than Researcher B’s.
  • Researcher D has an NLP tool which can identify whether this is op-ed content, a news story, or somewhere in between.
  • Tool maker A has a program which looks for a “credibility signal” from the domain, and publishes it on a scale of 1-10.
  • Tool maker B has a tool that looks up all the organizations in the article to see if they are front groups, and all the experts cited to see if they are recognized by Google Scholar.
  • Organization A (say, Politifact) has information about certain claims in the article.
  • Organization B has information about certain claims in the article.

And so on. Now we could go through eight different tools to benefit from those different insights. Or we could take these multiple insights, hand them to someone, and say “use these insights to make the best possible tool to help people evaluate this article.” Or the Facebook of the industry could hire Researcher A, use the work of Researchers B & C, ignore Researcher D, and decide whether to partner with organization A or B.

None of these seem right to me. They all lead to the single platform invisible algorithm nonsense that got us into this mess. They all shut down the vibrant conversation and collaborative experimentation that has developed in this space. If a new approach to article contextualization isn’t up-and-running when the big checks are cut, its insights get lost to history.

So what to do? I’d suggest we do what we often do when we want to move more quickly: separate the data layer from the interface layer. And I’d suggest that the best way for everyone to work together is to use the newly W3C-approved annotation layer as that data layer.


How would this work in practice? Let’s say I have some idea for contextualizing articles on the web. I’ll choose here a little thing I’ve been working on — looking at news stories and scanning them for astroturf industry groups, e.g. coalitions of coal industry groups posing as “Scientists for Progressive Energy Policy” or the like. I’ve developed a database of these groups, with links to wiki pages summarizing their funding, governance, and history. I want to get that to the user.

One method — the method many people seem to want to use — is to make an extension that looks at pages and highlights these organizations and links to the research. But of course I’ve already got half a dozen extensions, and the cost of marketing to get people to adopt yet another extension far exceeds the cost of funding the design of the tool (and takes away funds from improving identification of claims, lies, and context). It balkanizes effort, leading to dozens of tools, all of which do a fraction of the job required.

Another method is to go to a big provider like Facebook and say you should provide this front group research as context to your users. Maybe they say yes, and maybe they say no. But in choosing what they want, they centrally determine the approaches that will be used. It’s OK for them to do that, certainly. But if you have a tool or approach that works, do you really want to move it into the sealed vault of Facebook code?

The annotation alternative is more attractive. Instead of creating an extension to show my generated context or playing the zero-sum Facebook game, I run my tool as a bot on web pages, generating machine-and-human readable context in the annotation layer. For example, in this scenario, our bot spots the phrase “Americans for Medical Progress” …


…and tags that as the name of a known industry front group:


You’ll note we have a human readable note here. But we also have some machine readable tags that identify this issue, and a link to information on the front group.

Doing it this way, some other process can come in and tag the page with additional information. A hoax bot can look at a social network analysis of how this page is moving through the net and determine that it’s got a viral hoax rating of zero:


Because annotation is just a data layer, you can add as many of these as you want, stored with reference to the URL and anchored to specific text on the page.

This allows anyone who has a piece of the solution to spin up an annotation bot with a few lines of Python code and push their information out to the reader endpoints. For example, we can imagine that Politifact could not only debunk claims, but send bots out that look for those claims and link to those claim analyses on the Politifact site. Here, we imagine that Politifact has done a treatment of the claim referenced here (they haven’t — this is just an example):
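As a sketch of how little code that takes: the function below builds the kind of payload such a bot might send to an annotation service like Hypothesis. The anchoring idea (a `TextQuoteSelector` tying the note to an exact phrase on the page) comes from the W3C Web Annotation model, but treat the exact field names as an assumption to check against the target service’s API docs; the URL and note text are invented for illustration.

```python
# Minimal sketch of an annotation-bot payload, assuming a Hypothesis-style
# API that accepts W3C Web Annotation-flavored JSON. The page URL, note,
# and tags below are invented examples.

def build_annotation(page_url, quote, note, tags):
    """Build a machine-and-human readable annotation payload,
    anchored to an exact quote on the target page."""
    return {
        "uri": page_url,
        "text": note,        # human-readable note shown to readers
        "tags": tags,        # machine-readable signal for other tools
        "target": [{
            "source": page_url,
            "selector": [{
                "type": "TextQuoteSelector",
                "exact": quote,   # the phrase the bot spotted
            }],
        }],
    }

payload = build_annotation(
    page_url="https://example.com/some-op-ed",
    quote="Americans for Medical Progress",
    note="This organization may be an industry front group. See linked research.",
    tags=["front-group", "contextbot"],
)
```

A real bot would then POST this payload to the service’s annotation endpoint with an API token; everything else is just the loop that feeds it pages.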


As you can see, various amounts of signal and context can be layered on here. And since this is all retrievable by API, front end services can decide what sorts of signal and context they want to look at (and from whom) when they make different display decisions. A front-end extension could pull information from 20 or 30 separate annotation providers in giving context to a page. A service such as Facebook could pull from hundreds if they wanted to. But the data — the listing of what we know about the pages from all different approaches and veins of research — is open to anyone to build and innovate on top of. And crucially, it makes entry into the marketplace of analysis extraordinarily cheap.
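To make that concrete, here is a hedged sketch of how a front end might filter signals pulled from its chosen annotation providers. The provider names, tags, and record shapes are all invented for illustration — the point is only that the consuming side, not the data layer, decides whom to trust.

```python
# Hypothetical front-end aggregation: keep only annotations from
# providers this front end trusts, carrying tags it cares about.

def gather_context(annotations, trusted_providers, wanted_tags):
    """Filter a merged annotation list down to trusted, relevant signal."""
    wanted = set(wanted_tags)
    return [
        a for a in annotations
        if a["provider"] in trusted_providers and wanted & set(a["tags"])
    ]

# Invented sample records from three imagined providers:
annotations = [
    {"provider": "frontgroup-bot", "tags": ["front-group"],
     "text": "Known industry front group."},
    {"provider": "hoax-bot", "tags": ["viral-hoax-score"],
     "text": "Viral hoax rating: 0"},
    {"provider": "spam-bot", "tags": ["ad"],
     "text": "Buy now!"},
]

context = gather_context(
    annotations,
    trusted_providers={"frontgroup-bot", "hoax-bot"},
    wanted_tags=["front-group", "viral-hoax-score"],
)
# context keeps the first two records and drops the untrusted third
```

A different front end could run the same call with a different trusted set, which is exactly the marketplace dynamic being argued for.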


The first concern that people will come up with is whether botting is inefficient compared to centralized management of this process, or the just-in-time approach you can do with an extension. The tagging of hundreds of thousands of pages and the associated management of things stored in the annotation layer just seems exhausting. Won’t you have to bot the whole internet? Doesn’t that make everyone have to run their own Googlebot?

In practice, no. The thing is, if you have an analysis that you want to share, you most likely have identified a set of pages already, and that set is likely relatively small. So, for instance, if I’m the hoax bot above, I’m not looking at every page on the internet — I’m just looking at higher-sharecount pages moving across Facebook. Even if I’m looking at the top 50,000 Facebook pages of the day, tagging those pages is still the sort of thing that could be batched overnight. But even there, if 99% of those pages are not new, you’re tagging 500 pages a day, which is similar to what I did last night from Starbucks on a laptop while they were making my cappuccino. There’s a power law to this stuff, and it favors botting.

The second but more serious question is about transparency. One of the advantages Facebook has is that it hides its signal. You don’t know you’re being discriminated against as a site or a sharer, and you don’t know how it was computed. Facebook’s take on your content is like a credit score that you can never look up, and that’s good for Facebook, because it makes it harder to game or contest.

On the whole, though, I think this is more feature than bug. Yes, being able to look at your page ratings over 100 signals from different providers will make it easy to game the system and also let you know who is ranking you down or highlighting your use of bad sources. But ultimately, while we need to protect the researchers who produce these tools we also need to give people more transparency into why their pages are suddenly not drawing traffic. Not only is that fair, but it might also put some pressure on legitimate entities to clean their act up: if my front group bot keeps finding quotes from front groups in my local paper, maybe they’ll be a bit more diligent about source checking.


So how do we make this distributed, open approach happen? How do we make the annotation layer a marketplace for context? And how do we make sure that market is functional, navigable, and useful both to tool makers and end users? I leave that all to you. But let’s think about this — let’s not put all our eggs in another sealed basket. Let’s keep the energy around this issue and invite all comers to add their piece of insight. The other path — closed, single-platform, invisible solutions? We’ve been through that already. It’s how we got here. Let’s not repeat that mistake.


I apologize that crediting this set of realizations is such a mess, actually: it goes back to conversations at the Hewlett Open Educational Tools conference in January about annotation as the universal bus (which was based on prototypes of citing capability I had built with Jon Udell in November). And then at iAnnotate I ended up having the same conversation with people throughout the three days of the conference, pulling insights from previous conversations into the next conversation until who knows what came from where in the end. It came to a head in a conversation over sliders with Jon, T.S. Waterman, Peg Fowler, and Tom Gillespie on Friday night, but others I talked to will see their conversations in here too, particularly the unconference group on Day One where we came up with a “stub articles” proposal (great unconference group, or greatest unconference group of all time?). Apologies all around about the credit, really.

Auto-Annotating News Articles To Scaffold Media Literacy Skills In Students

I’ve been playing around a bit with auto-annotating news articles to foster better literacy reflexes in students. Here’s the latest work in progress: I’ve made an annotation bot that goes out and finds articles mentioning industry front groups and asks students to do research to confirm or deny the connection.

How does this work? I compiled a list of over 150 known front groups — groups that present as social activism groups, but are thought to be industry-funded astroturf.

Since most people don’t understand bots, I want to be really clear about what I’m about to show you. My code doesn’t annotate in real time. It runs as a “batch” process overnight: it goes out, tries to find new articles mentioning front groups, and has new pages annotated for the general public by the time you wake up. (It’s a surprisingly quick process, actually, so if you wanted to run it during the day or during a coffee break, you could.)

Got it? This goes out and annotates pages based on the fact they mention some potentially dubious organizations. But it does so in a way that anyone can look at the annotations.
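As a sketch of the matching step in that batch run — this is my guess at a minimal version, not the actual code, and a real bot would presumably do smarter name normalization than naive substring search:

```python
# Sketch of the overnight batch step: scan fetched article text for
# mentions of known front groups, recording offsets for later anchoring.
# The list here holds two example names drawn from this post; the real
# database has over 150 entries.

FRONT_GROUPS = [
    "Americans for Medical Progress",
    "Scientists for Progressive Energy Policy",
]

def find_front_groups(article_text):
    """Return (group_name, character_offset) pairs for each known
    front group mentioned in the article text."""
    hits = []
    for group in FRONT_GROUPS:
        offset = article_text.find(group)
        if offset != -1:
            hits.append((group, offset))
    return hits

sample = "... identifies the author as chair of Americans for Medical Progress ..."
hits = find_front_groups(sample)
```

Each hit would then become one annotation on the page, anchored to the matched phrase.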

So let’s show an example. Some test crawls from last night turned up this story (among a couple hundred other stories):


Ignore the “Untitled Document” bit right now — that’s a bug being worked out. Future annotations will display titles.

In any case, we go check this article out. Maybe we’re just interested. Or maybe as a student we were told to check out one of the results of contextbot and further annotate it. Here’s what we find:


It’s an op-ed from a researcher who talks about their lab’s cute little test piglet “Slinky” and how much they adored him. It’s pretty folksy stuff:

Let me tell you about Slinky.

Slinky was sweet and full of personality, an adorable and playful piglet who grew to be a gentle and smiley giant. Everyone who met him was smitten instantly. He was purchased when he was 2 months old to help us develop surgical solutions to congenital heart defects in children.

It goes on. And the point of the article (which I actually don’t disagree with) is that animal testing is necessary to save human lives. The subpoint, which is more debatable, is that scientists in these industries already do their best to minimize the suffering of animals where possible, and more regulation isn’t necessary. That point I don’t actually have an opinion on — so it’s great to see a view from inside the process from a dedicated scientist.

At the end of the article is this byline:


The byline identifies the author as chair of Americans for Medical Progress. It’s the sort of thing a perceptive student would select and search on, to find more about the source of the information. And it’s the kind of thing a majority of students wouldn’t notice or think about at all. What you’ll notice above, however, is that we’ve already pre-annotated this using our bot.

The highlight calls the student’s eye to the name of the organization. Clicking the annotation brings up this:


The annotation displayed informs the student that this may be a front organization — an organization that attempts to appear independent but is really there to do the bidding of others. But it doesn’t solve it for the student — it invites the student to add to the investigation.

Clicking on the investigate link brings up a page that reveals, among other things, that the organization is primarily run and funded by pharmaceutical companies and a large supplier of lab animals:

AMP’s board of directors consists of senior executives and other representatives employed by the pharmaceutical and vivisection industries. They include Charles River, Abbott Laboratories, GlaxoSmithKline, Pfizer, AstraZeneca, Sanofi-Aventis and Merck. [2] Charles River Laboratories is the world’s largest supplier of laboratory animals. It has been described as the “General Motors of the laboratory animal industry”. [3] Board members also represent universities and institutions receiving government grants for vivisection. Many corporations and institutions on AMP’s board have amassed a history of gross animal welfare violations in the United States and Europe and been the focus of animal, health, consumer and human rights advocates. See also sections 2 & 4 & SW articles on individual companies.

It follows with a list of animal welfare violations of the companies of the board members.

Is this believable? Reliable? We don’t know. SourceWatch, the source of the page on this organization, leans left and is a fairly anti-corporate site. But this gives the student enough search terms, context, and momentum to start their own investigation. Armed with these facts, we search Google for connections and find this Google book (from Springer, though students will not recognize publisher quality). The book explains that the group was largely set up by the US Surgical Corporation, which came under fire for the use of live dogs to demonstrate surgical staples:


Again, I don’t necessarily trust this, or think it’s cause to throw out the argument made in the column. But it’s a heck of a qualifier. So we add a link to this reference to the stub the annotation bot created, as a reply to the botted annotation:


In the annotation we link to the passage in the book directly, and add a link to citations of Garner’s work in Google Scholar, to show that he’s respected in this field.

More students can come, do more searches, and add more information. The process is similar to the process Wikipedia uses to generate “stubs” — we use publicly available resources plus automation to find work that needs to be done and provide a scaffold for starting it. Students can be graded on the strength of their contribution — either directly, or through conversation with other students to keep them impartial.

I’m pretty excited about this. It provides the sort of scaffolding and direction that students may need in this area while still allowing students to do authentic, public work.

What do you all think?



Introducing the #CheckPlease Tag

One of the things we have learned as we’ve run the student fact-checking project is that the hardest thing is getting each student unique material to check.

It’s not that there aren’t enough facts out there needing checking — we see them daily. But consider a teacher of history who wants to do a claim checking project in their class. Let’s say they have 60 students, working in pairs. How long does it take them to compile a list of cool claims for those students to check, complete with the use of those claims in context?

What we’ve found is it takes a lot of time. It’s one of those memory things — I’m sure you’ve heard two dozen songs with the number 19 in the lyrics, but can you list them? (Note, if you try this, you’ll start listing off songs with 19 in the title). We don’t have a box in our memory of “false history claims I’ve seen on Twitter” any more than we have a box labeled “Songs containing prime numbers”.

So I’d like to propose we use the power of annotation and tagging to solve this. When you see a claim that might make a good fact-checking mini-project for students, use your favorite annotation or tagging engine to capture it and throw it into a library that faculty can draw on for projects.

Make sure that you choose a mix of true and false claims. The idea is not to use #CheckPlease to claim something is wrong, or even wrongly supported. It really just notes the claim is interesting, and might make a good subject for student research. Use it during your daily reads when you stumble on something that fits. For example, I like Jeet Heer, and I think I remember this claim being true — but wouldn’t it be a great subject for a student wiki article?


Jeet Heer makes the claim that trains in Mussolini’s Italy — contrary to received wisdom — did *not* run on time.

Let’s capture this with the #CheckPlease tag. In this case I’ll do it with Hypothesis. First I make sure that I am on this particular tweet (clicking the time at the bottom gets me there). Then I select the claim and hit the annotate button.


I add the “CheckPlease” tag (no hashtag needed here, as Hypothesis uses a plain-word format). For findability, I add a few more tags.


Now this shows up in a Hypothesis feed:


And students can click the link to see the claim in context and start their investigation. Additionally, they can use replies to indicate they are working on the claim or have published something on the claim:


And the students can work on that document in the format of their choice. In this case we show a link to Digipo, but it could just as well be a WordPress blog:


Simple, right? You could also choose to collect this stuff in Diigo, or Delicious, or Pinboard, or any other platform that supports tagging URLs. Try it. Help us out! The wider the variety of claims we collect, the more authentic projects we have for students and the better we can provision classes with meaningful tasks.
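For the technically inclined: the #CheckPlease library is ultimately just a tag query against your annotation platform. With Hypothesis, that could mean hitting its public search API with a `tag` parameter — a hedged sketch, so verify the endpoint and parameters against the current Hypothesis API documentation before relying on it:

```python
# Sketch: build the URL a class tool would fetch to pull the
# #CheckPlease feed from the Hypothesis search API. The endpoint
# and parameter names are my best understanding of that API —
# check the official docs for the current details.
from urllib.parse import urlencode

API_SEARCH = "https://api.hypothes.is/api/search"

def checkplease_feed_url(tag="CheckPlease", limit=50):
    """Build a search URL listing claims tagged for student
    fact-checking projects."""
    return API_SEARCH + "?" + urlencode({"tag": tag, "limit": limit})

url = checkplease_feed_url()
# A fetcher would then GET this URL and read the list of matching
# annotations in the JSON response (each with its URI, text, and tags).
```

The same idea works for Diigo or Pinboard feeds — only the endpoint and parameter names change.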



The best thinking about media and media literacy this week comes from Linda Holmes, a journalist who generally writes about television. Maybe that gives her a special insight, I don’t know. Or maybe it’s just she’s wicked smart.


She goes on:




Corner-clipping is exhausting to both the people who read it and the people who make it. Sometimes pushback is necessary — when things are headed the wrong way. I have sympathy for folks who looked at the recent Heineken commercial and were worried that the underlying message was that marginalized groups must expend effort to defend their basic humanity. That seems like a big point. Those are not corners; that’s the center of the sheet.

But for every bona fide take that contains big points, there are a hundred small-ball takes reacting to the Hot Take Magnet of the day, breaking all the Updike Rules with abandon. And usually they come down to “The Three Things That Medium Piece Gets Wrong About X.” Which would be great, if those were the three main points of the original piece, but they are usually not. It’s usually more a case of “This piece talks about X and is wrong because it does not talk about Y.”

I do it too. I see your take on the cultural problems around media literacy and say “The thing this gets wrong is it doesn’t talk about web literacy skills!”. Someone else sees my “skills” take and says “The thing this gets wrong is it doesn’t talk about the systemic corruption of neoliberal knowledge production!” But maybe we aren’t getting things wrong. Maybe we’re just writing about the piece of the story we know and not every intellectual argument is zero-sum. Maybe not everything everyone writes or says has to be an encyclopedia on the subject, and if you find some pieces missing in what someone else says you could just supply them in a separate piece without waging a war of attrition.

If we don’t do that — if we don’t see our parts in these debates as additive rather than subtractive — we end up with a society where a man cries about the near death of his child on TV and begs his non-political audience for the first time in his life to take a very political position — and is met with a lecture about partisan control of the legislature. It’s unclear what purpose this serves, except to exhaust the emotional capacity of everyone, and make all involved feel the best route is simply to stay silent, uninvolved, cynical, and standoff-ish.

There are times when ideas as expressed are horribly wrong, and you should certainly fight bad ideas. Stand up to stuff that is 180 degrees wrong, or even 140 degrees wrong. Maybe even 90 degrees. (I am not the sole adjudicator of degrees of wrongness; you make the call.)

But if people make a contribution that is correct but imperfect, take what you get from people, thank them, and build off it. Stop waging wars that don’t exist. Write your own post, plot your own path, add your own story.

That is all. Thank you.