Stop Reacting and Start Doing the Process

Today’s error comes to you from a Tulsa NBC affiliate:

tulsa

 

Of course, this was all the rage on Twitter as well, with many smart people tweeting the USA Today story directly:

markos

It’s a good demonstration of why representativeness heuristics fail. Here’s the story everyone fell for:

usa today

So let’s go through this — good presentation, solid source. Headline actually not curiosity gap or directly emotional. Other news stories look legit. Named author. Recognizable source with a news mission.

Now the supporters of recognition approaches will point out that in the body of the article there is some weird capitalization and a punctuation mistake. That’s the clue, right!

kerosene

When we look back, we can be really smart of course, saying things like “The capitalization of Kerosene and the lack of punctuation are typical mistakes of non-native speakers.” But in the moment as your mind balances these oddities against what is right on the page, what are your chances of giving that proper weight? And what would “proper weight” even mean? How much does solid page design balance out anachronistic spelling choices? Does the lack of clickbaity ads and chumbuckets forgive a missing comma? Does solid punctuation balance out clickbait stories in the sidebar?

Your chances of weighting these things correctly are pretty lousy. Your students’ chances are absolutely dismal. When actual journalists can’t keep these things straight, what chance do your students have?

Take the Tulsa news site. Assuming that USA Today was probably a better authority on whether we still capitalize “kerosene” (which was once a brand name like Kleenex), the Tulsa writer rewrites the story and transcribes the misspelling faithfully while risking their entire career:

kerosene2

We know looking at surface features doesn’t work. Recognition for this stuff is just too prone to bias and first impressions in everyone but an extremely small number of experts. And even most *experts* don’t trust recognition approaches alone. So, again, what chance do your students have?

How do our processes work, on the other hand? Really well. Here’s Check for Other Coverage, which has some debunks now but importantly shows that there is actually no USA Today article with this title (and has shown this since this was published).

And here’s Just Add Wikipedia which confirms there is no such “usatoday-go” URL associated with USA Today.

Both of these take significantly less time than judging the article’s surface features, and, importantly, result in relatively binary findings less prone to bias concerns. The story is not being covered in anything indexed by Google News. The URL is not a known USA Today URL. Match, set, point. Done.

Can they fail? Sure. But here’s the thing: they’ll actually fail less than more complex approaches, and when they do fail (for instance if the paper is not found in Wikipedia or does not have a URL) they still put you in a good position for deeper study if you want it. Or, just maybe, if they don’t work in the first 30 seconds you’ll realize the retweet or news write-up can wait a bit. The web is abundant with viral material; passing on one story that is not quickly verifiable won’t kill you.

Safety Culture and the Associated Press

More journalistic mess-ups in the news today, this time from the Associated Press, which labeled director/producer Costa-Gavras as dead when he is very much alive. Via Alexios Mantzarlis here’s a snapshot of the AP headline on the Washington Post from yesterday (I think?) of a hoax that happened almost a week ago.

DmX2YXkVAAE00j1 (1)

How did it happen? Well a person claiming to be the Greek Minister of Culture posted a tweet saying there was breaking news to this effect — here is the account as it looked at the time of the tweet:

zorba2

And here is the tweet:

zorba tweet

Twenty minutes later this tweet followed:

zorba3.PNG

And soon the handle and picture were changed:

Now in retrospect people always say such things are obvious fakes. Why is this in English, for example? Why was she using this weird badly lit press photo instead of either a personal photo or a government headshot?

But as I say over and over again, this is like changing lanes without checking your mirrors or doing a head check, getting in a collision, and then talking about all the reasons you “should have known the car was there.” Didn’t you think it odd no one had passed you? Didn’t you hear the slight noise of an engine?

None. Of. This. Matters.

I mean, it does matter. But the likely reason that you crashed is not that you didn’t apply your Sherlock powers of deduction. It’s that you didn’t take the 3 seconds to do what you should have.

Stop Reacting and Start Doing the Process

Last week I published an article that takes less than five minutes to read and shows how journalists and citizens can avoid such errors. I don’t mean to be egotistical here, but if you are a working reporter and haven’t read it you should read it.

Here’s an animated gif from that.

Verified

See a tweet. Think about retweeting. Look for verification checkmark. See what they are verified as. Three seconds.

Now what if they are not verified? Or if you’ve never heard of the publication they are verified for?

Well then you escalate. You start to pull out the bigger toolsets: Wikipedia searches, Google News scans on the handle, the kind of thing Bari Weiss missed a while back that led her to publish quotes from a hoax Antifa account in The New York Times. Those ever so slightly more involved procedures look like this:

 

If that doesn’t work, you escalate further. Maybe get into more esoteric stuff: the Wayback Machine, follower analysis. Or if you are a reporter, you pick up the phone.
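The escalation described above can be sketched as a tiny decision ladder. This is only an illustration of the idea, not anyone’s actual newsroom tooling: the function name, its three inputs, and the wording of the returned steps are all made up for the example.

```python
from typing import Optional


def next_check(verified: bool, outlet_recognized: bool,
               search_settled_it: Optional[bool] = None) -> str:
    """Suggest the next step for a tweet you are thinking of sharing.

    All three inputs are hypothetical flags you would fill in by looking:
    verified           - the account carries a verification checkmark
    outlet_recognized  - you know the organization it is verified for
    search_settled_it  - None if you have not searched yet, otherwise
                         whether the quick searches answered the question
    """
    # Tier 1: the three-second check.
    if verified and outlet_recognized:
        return "done: you know who this is"
    # Tier 2: the slightly more involved lookups.
    if search_settled_it is None:
        return "escalate: Wikipedia search, Google News scan on the handle"
    if search_settled_it:
        return "done: identity settled by search"
    # Tier 3: the esoteric stuff, or the telephone.
    return "escalate further: Wayback Machine, follower analysis, or pick up the phone"


print(next_check(verified=True, outlet_recognized=True))
print(next_check(verified=False, outlet_recognized=False))
print(next_check(verified=True, outlet_recognized=False, search_settled_it=False))
```

The point of writing it this way is that the cheap check always runs first, and the expensive ones only run when the cheap one fails to settle things.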

It was Sam Wineburg that did me the biggest service when I was writing my textbook for student fact-checkers. He talked about the need to relentlessly simplify (which I was already trying to do, though he pushed me harder). But he also told me who I should read: Gerd Gigerenzer. And that changed a lot.

Gigerenzer’s work deals with risk and uncertainty, and how different professional cultures deal with these issues. What he finds is that successful cultures figure out what the acceptable level of risk is, then design procedures and teach heuristics that take big chunks out of that risk profile while remaining relatively simple. Airline safety culture is a good example of this. How much fuel do you put in a plane? There’s a set of easily quantifiable variables:

  1. fuel for the trip
  2. plus a five percent contingency
  3. plus fuel for landing at an alternate airport if required
  4. plus 30 minutes holding fuel
  5. plus a crew-determined buffer for extreme circumstances

(via Gigerenzer, 2014)

That last one is a buffer, but you’ll notice something about the other ones: they don’t really achieve precision. If you were to calculate the fuel needs of each flight individually you could probably get closer to the proper fuel amount. I know, for example, that your chances of needing 30 minutes holding fuel for landing at Portland (PDX) are probably dramatically lower than your need for it when heading to Philadelphia (PHL). And the five percent contingency is supposed to make up for miscalculations in things like windspeed, but doesn’t account for the fact that different seasons and different flight paths have different levels of weather uncertainty.

The problem is that you can add those things back in, but now you’re reintroducing complexity back into the mix, and complexity pushes people back into error and bias.

That crew-determined buffer is important too, of course. If, after all the other factors are accounted for, the fuel amount still doesn’t seem sufficient, the crew can step up and add fuel (but, importantly, not subtract). But the rule of thumb doesn’t waste their energy on the basic fuel calculation. They save it for dealing with the exceptions, the weird things that don’t fit the general model, where nuance and expertise matter.
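To make the fuel rule of thumb concrete, here is a minimal sketch of the arithmetic. The function name, the kilogram units, and the sample numbers are invented for illustration, and I am assuming the five percent contingency is taken on the trip fuel, which the list above doesn’t spell out.

```python
def planned_fuel_kg(trip_kg: float, holding_30min_kg: float,
                    alternate_kg: float = 0.0,
                    crew_buffer_kg: float = 0.0) -> float:
    """Add up the fuel items from the rule of thumb (illustrative only)."""
    # The crew-determined buffer can only add fuel, never subtract it.
    if crew_buffer_kg < 0:
        raise ValueError("the crew may add fuel, not subtract it")
    # Assumption: the five percent contingency applies to trip fuel.
    contingency_kg = 0.05 * trip_kg
    return (trip_kg + contingency_kg + alternate_kg
            + holding_30min_kg + crew_buffer_kg)


# Made-up numbers: 10,000 kg for the trip, 1,500 kg to reach an
# alternate airport, 1,200 kg for 30 minutes of holding, no extra buffer.
print(planned_fuel_kg(10_000, holding_30min_kg=1_200, alternate_kg=1_500))
```

With those invented figures the total works out to 13,200 kg. Notice what is missing: nothing in the calculation knows whether you are headed to PDX or PHL, or what season it is. That is the trade the heuristic makes.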

This is a long detour, but the point is that rather than asking people to use individual thinking about dozens of factors around an issue weighted with careful precision, what Gigerenzer calls “positive risk” cultures do is decide the acceptable level of risk for a given endeavor, then work together to design simple procedures and heuristics that if followed encode the best insights of the field when applied within the domain. At the end of the procedures there’s the buffer: you’re free to look at the result and think “the heuristic just doesn’t apply here.” But you have to explain yourself, and what’s different here. You apply the heuristic first and think second.

Does the AP Have a Digital Verification Process?

There’s another important piece about defining set processes: they can be enforced in a way that a general “be careful” can’t. We can start to enforce them as norms.

What is the AP process to source a tweet? Ideally, it would be some sort of short tree process — look for a blue checkmark. If found, do X, if not, do Y. The process would quickly sift out the vast majority of safe calls (positive or negative) in seconds.

That quick sifting is important, but just as important is the accountability it provides. Instead of looking at an error like this and discussing whether it was an acceptable level of error, we can start with the question “Was the process followed?” The nature of risk — as Gigerenzer reminds us — is that if you make no errors your system is broken, because you are sacrificing opportunity. So we shouldn’t be punishing people that just happen to be caught on the wrong end of a desirable fail rate.

But if a reporter risked the reputation of your organization because they didn’t follow a defined 5-second protocol — well that’s different. That should have consequences. These sorts of protocols exist elsewhere in journalism. Journalists aren’t accountable for lies sources tell them, but they are accountable for not following proper procedure around confirming veracity, seeking rebuttals, and pushing on source motivation.

Again, this isn’t meant to treat the heuristics or procedures as hard and fast laws. Occasionally procedures produce results so absurd you have to throw the rule book out for a bit. Experts do develop intuitions that sometimes outperform procedures. But the rule of thumb has to be the starting point, and expertise has to make a strong argument against applying it (or for applying a competing rule of thumb). And such “expert” deviations are clearly not what we are seeing here.

What Zeynep Said and Other Things

This is a wandering post, and it’s kind of meant to be — jotting down ideas I’ve been talking about here a while but haven’t pooled together (see here for one of my earlier attempts to bring Gigerenzer into it).

But there’s so much more and I have to post, and not all of it is about journalism. Some of it, unsurprisingly, is about education.

A couple threads I’ll just tack on here.

First, I have been meaning to write a post on Zeynep Tufekci’s NYT op-ed on Musk’s Thailand cave idiocy. Here’s the lead-up:

The Silicon Valley model for doing things is a mix of can-do optimism, a faith that expertise in one domain can be transferred seamlessly to another and a preference for rapid, flashy, high-profile action. But what got the kids and their coach out of the cave was a different model: a slower, more methodical, more narrowly specialized approach to problems, one that has turned many risky enterprises into safe endeavors — commercial airline travel, for example, or rock climbing, both of which have extensive protocols and safety procedures that have taken years to develop.

And here’s the killer graf:

This “safety culture” model is neither stilted nor uncreative. On the contrary, deep expertise, lengthy training and the ability to learn from experience (and to incorporate the lessons of those experiences into future practices) is a valuable form of ingenuity.

Zeynep is exactly right here, but I’d argue that while academia has escaped some of the worst beliefs of Silicon Valley we still often worship this idea of domain independent critical thinking. And part of it is because we devalue more narrowly contextualized knowledge. And a big part of it is that we devalue relevant *professional* knowledge as insufficiently abstract or insufficiently precise. We laugh at rules of thumb as fine for the proles but not for us with our peer-reviewed level of certainty.

But, as Zeynep argues, the process of looking at what competent people do and then encoding that expert knowledge into processes that novices can use is not only a deeply creative process itself, but it forms the foundation on which our students will practice their own creativity. And if we could get away from the idea of our students as professors in training for a bit, we could see that maybe?

Also, something I’ll talk about later is how rules of thumb relate to getting past the idea that we seek certainty. We don’t. We seek a certain level of risk, and the two things are very different. Thinking in terms of rules of thumb and 10-second protocols not only protects against error, but also prevents the conspiratorial and cynical spiral that asking for academic levels of precision from novices can produce.

And finally — that “norm” thing about journalists? It’s true for your Mom too. Telling your Mom she’s posting falsehoods is probably not nearly as effective as telling her the expectation is she does a 30 second verification process before posting. When norms are clear (don’t cut in line) they are more enforceable than when they are vague (don’t be a dick at the grocery store). Part of the reason for teaching short techniques instead of more fuzzy “thinking about texts” is expecting a source lookup on a tweet is socially enforceable in a way a ten point list of things to think about is not.

OK, sorry for the mess of a post. Usually after I write a trainwreck of a thing like this I find a more focused way to say it later. But thanks for sticking with this until the end. 🙂

 

 

A Roll-Up of Digipo Resources (4 September 2018)

One of the nice things about running a blog-fueled grassroots semi-funded initiative is the agility. The Digipo project has moved far and fast in the past year. But one of the bad things is that all the old blog posts are just a snapshot in time, and often out of date.

I’ve wanted to get everything updated and I will, but for the moment here’s a bunch of resources. Please note that if it is 2019 when you are reading this you should look for a more recent post.

Textbook

People love Web Literacy for Student Fact-checkers, and it continues to be the resource in broadest use.

Prompts for Class: Four Moves Blog

The way I run my classes is to throw up prompts and have the students race to learn more about them in short frames of time. Sometimes we move onto the next one, and sometimes we have deeper discussions about disinformation or structural factors after that (this details the format).

Anyway, key to that class structure is the Four Moves Blog which provides prompts for students to investigate. I just tell them to search for the prompt in the search box up top, and then investigate it. This avoids all the “What was that URL again?” awkwardness while also allowing a certain class fluency in that we can react to what is working in the class rather than structure everything meticulously beforehand.

weed.PNG

 

Slides and Lesson Plans for First Two Classes

While we play the classes after the first week a bit looser, the first two classes are pretty scripted. This is partially because we want to lay the right foundation, but also because we want to introduce the ideas without bumping up too much against the potential identity threat issues this stuff can cause. So the first week deals with some serious stuff mixed with some frivolous stuff.

Here are the slides: Class One, Class Two

Here is the Lesson Plan/Notes: Google Doc

Note that the notes are a little out of sync with the slides in some places. The notes are mainly there, though, so you understand how to use the slides and the stories behind the examples; they’re not really a script.

The Canvas Course and the Blackboard Export

We have two to three weeks of online homework (activities, assessments, readings, videos) in Blackboard/Canvas.

If you look in Canvas Commons for a Citizen Fact-Checking module from me you should be able to import it into your class. I’ve heard some people have had some problems with that import and for others it’s gone well. Get back to me if you have problems and we’ll try to figure out what’s going on.

The Blackboard course export is here: Digipo-04-Sept-18. As usual, this is just my export of my materials. There aren’t any warranties, and you should go through the material after import, prune it and review it for error.

 

 

The “Always Check” Approach to Online Literacy

One of the things I’ve been trying to convince people of for the past year and a half is that the only viable literacy solution to web misinformation involves always checking any information in your stream that you find interesting, emotion-producing, or shareable. It’s not enough to check the stuff that is suspicious: if you apply your investigations selectively, you’ve already lost the battle.

Once you accept that, certain things become clear. Your methods of checking have to be really quick. They have to be habitual, automatic. They can’t be cognitively expensive. And those who teach media literacy have to be conscious of this trade-off between depth and efficacy and act accordingly.

What do I mean by that? Let’s use an analogy: which technique do you think would prevent more car accidents?

  • A three-second check every time you switch lanes
  • A twenty-second check executed every time you think a car might be there

There are some hard problems with misinformation on the web. But for the average user, a lot of what goes wrong comes down to failure to follow simple and quick processes of verification and contextualization. Not after you start thinking, but before you do.

I can’t get these processes down to a two second mirror-and-head-check, but I can get them close. What follows are some of the methods we teach students in our work. It will seem like there is a lot of stuff to learn here, but you’ll notice that it comes down to the same strategies repeated in different contexts. This repetition is a feature, not a bug.

Is This the Right Site?

Today’s news reveals that Russian-connected entities were trying to spoof sites like the Hudson Institute for possible spear-phishing campaigns. How do we know if the Hudson Institute site we are on is really the real site? Here’s our check:

Hudson

The steps:

  • Go up to the “omnibar”
  • Strip off everything after the domain name, type wikipedia and press enter
  • This generates a Google search for that URL with the Wikipedia page at the top
  • Click that link, then check in the sidebar that the URL matches.
  • Forty-nine out of fifty times it will. The fiftieth time you may have some work to do.

In this case, the URL does match. What does this look like if the site is fake? Here’s an example. A while back a site at bloomberg.ma impersonated the Bloomberg News site. Let’s see what that would look like:

Bloomberg

You do the same steps. In this case Bloomberg News is not the top result, but you scroll down and click the Bloomberg News link, and check the URL and find it is different. If you’re lazy (which I am) you might click that link to get to the real site.
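For the programmers in the audience, the check above boils down to something like the following sketch. It uses only Python’s standard `urllib.parse`; the function names are mine, the Google search URL pattern is an assumption about how the omnibar search resolves, and the “official” URL is whatever you read off the Wikipedia sidebar by eye, not something the code looks up for you.

```python
from urllib.parse import quote_plus, urlsplit


def domain_of(url: str) -> str:
    """Strip a full URL (scheme included) down to its bare domain name."""
    host = urlsplit(url).hostname or ""
    # Treat "www.example.org" and "example.org" as the same site.
    return host[4:] if host.startswith("www.") else host


def wikipedia_search_url(url: str) -> str:
    """Build the 'domain + wikipedia' search you would type in the omnibar."""
    return ("https://www.google.com/search?q="
            + quote_plus(domain_of(url) + " wikipedia"))


def matches_official(url: str, official_url: str) -> bool:
    """Compare the site you are on with the URL listed in Wikipedia."""
    return domain_of(url) == domain_of(official_url)


print(wikipedia_search_url("https://www.bloomberg.ma/news/some-story"))
print(matches_official("https://www.bloomberg.ma/news/some-story",
                       "https://www.bloomberg.com"))
print(matches_official("https://www.hudson.org/research/some-report",
                       "https://www.hudson.org"))
```

The comparison is deliberately dumb. It gives you a binary answer, match or no match, and leaves the fiftieth case, where things get weird, to you.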

What Is the Nature of This Site?

Let’s stick with the Wikipedia technique for a moment, because it’s useful for a few other questions. As an example, let’s take one that got past both a Washington Post reporter and the WaPo fact-checkers a month or so ago. Question: Is this article really by the lead singer of Green Day?

greenday

Let’s check:

Clickhole

Again, same process. Now does this mean that you are 100% sure that it’s not Billie Joe that wrote that article? No — there’s a slight slight chance that maybe somehow the lead singer of Green Day wrote a —

Nah, you know what? It’s not him. Or if it is, the chances are so infinitesimal it’s not worth spending any more time on it. Find another source.

How about this site, and its searing commentary on Antifa and journalists?

antifascist.PNG

Maybe you agree with this article. I don’t, but maybe you do. And that’s okay. But do you want to share from this particular site to your friends and family and co-workers? Let’s take a look!

webamren

You can dig into this if you want, and look through the numerous links in that Wikipedia page that support this description. Maybe have a little mini-forum in your head about the differences between white nationalism and white supremacy.

Or maybe — here’s a thought — find a similar article from some other site that hasn’t been called a white supremacist organization by half a dozen mainstream groups. Because no matter what you think of the article, funneling friends and family to a site that has published such sentences as “When blacks are left entirely to their own devices, Western civilization — any kind of civilization — disappears” is not ethical — or likely to put you in the best light.

Is This Breaking News Correct?

Here’s some breaking news.

breaking

More people than you would think believe that the blue checkmark = trustworthy. But all the blue checkmark really does is say that the person is who they say they are, that they are the person of that name and not an imposter.

Your two-second “mirror and head-check” here is going to be to always, always hover, and see what they are verified for. In this case the verification means something: this person works for CNBC.com, a legitimate news site, and she covers a relevant beat here (the White House):

Verified

But maybe you don’t know CNBC, or maybe you see this news from someone not verified, or verified but not as a reporter. How will you know whether to share this? Because you know you’re DYING to share it and you can’t wait much longer.

Use our “check for other coverage” technique:

Manafort

When a story is truly breaking, this is what it looks like. Our technique here is simple.

  • Select some relevant text.
  • Right-click or Cmd-click to search Google
  • When you get to Google don’t stop, click the “News” tab to get a more curated feed
  • Read and scan. Investigate more as necessary.

Scan the stories. If you want to be hypervigilant, scan for sources you recognize, and consider sharing one of the stories featuring original reporting instead of the tweet.
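If you wanted to script the first three steps of that loop, it might look roughly like this. Treat it as a sketch under assumptions: the function name is mine, the sample text is invented, and the `tbm=nws` parameter is the one Google has used to select its News tab, which could change without notice.

```python
from urllib.parse import urlencode


def news_search_url(selected_text: str) -> str:
    """Turn selected text into a Google search restricted to the News tab."""
    # Collapse stray whitespace picked up when selecting text on a page.
    query = " ".join(selected_text.split())
    # Assumption: tbm=nws selects the News tab; not a documented guarantee.
    return "https://www.google.com/search?" + urlencode({"q": query, "tbm": "nws"})


# Hypothetical selection, not taken from the tweet above.
print(news_search_url("  example breaking   news claim "))
```

The reading and scanning still has to be done by a person. The code only gets you to the page faster.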

I’m going to state this again, but if you look at that loop above you’ll see this is about a seven second operation. You can absolutely do this every time before you share. And given it is so easy, it’s irresponsible not to. I’m not going to tell you you are a bad person if you don’t do these checks, but I think in your heart you already know.

Teach This Stuff First Already

Maybe you think you do this, or you can really “recognize” what’s fake by looking at it. I am here to tell you that statistically it’s far more likely you’re fooling yourself.

If you’re a human being reading this on the internet and if you’re not a time traveler from some future, better world, there is less than a one in a hundred chance you do the sort of checks we’re showing regularly. And if you do do this regularly — and not just for the stuff that feels fishy — then my guesstimate is you’re about two to three standard devs out from the mean.

Now imagine a world where checking your mirrors before switching lanes was rare, three standard-deviations-out behavior. What would the roads look like?

Well, it’d probably look like the Mad Max-like smoking heap of collisions, car fires, and carnage that is our modern web.

I get worried sometimes that I am going to become too identified with these “tricks”. I mean, I have a rich history of teaching students digital literacies that predates this work. I’ve been doing the broader work intensively for ten years. (Here’s a short rant of mine from 2009 talking about web literacy pedagogy.) I’ve read voraciously on these subjects and can talk about anything from digital redlining to polarization models to the illusory truth effect. I’m working on a project that looks to document the history of newspapers on Wikipedia. I worked on wiki with Ward Cunningham. I ran my first “students publish on the web” project in 1997.

But I end up coming back to this simple stuff because I can’t shake the feeling that digital literacy needs to start with the mirror and head-checks before it gets to automotive repair or controlled skids. Because it is these simple behaviors, applied as habit and enforced as norms, that have the power to change the web as we know it, to break our cycle of reaction and recognition, and ultimately to get even our deeper investigations off to a better start.

I have underlying principles I can detail, domain knowledge I think is important, issues around identity and intervention we can talk about. Deeper strategies for the advanced. Tips to prevent a fragility of process. Thoughts about the relationship between critical thinking and cynicism.

But for the love of God, let’s start with the head check.

 

 

 

QAnon and Pinterest Is Just the Beginning

I have been talking about Pinterest as a disinformation platform for a long time, so this article on QAnon memes on Pinterest is not surprising at all:

Many of those users also pinned QAnon memes. The net effect is a community of middle-aged women, some with hundreds of followers, pinning style tips and parfait recipes alongside QAnon-inspired photoshops of Clinton aide John Podesta drinking a child’s blood. The Pinterest page for a San Francisco-based jewelry maker sells QAnon earrings alongside “best dad in the galaxy” money clips.

Pinterest’s algorithm automatically suggests tags with “ideas you might love,” based on the board currently being viewed. In a timely clash of Trumpist language and Pinterest-style relatable content, board that hosts the Podesta photoshop suggests viewers check out tags for “fake news” and “so true.”

The story is a bit more complex than that, of course. It’s not clear to me that the users noted here are not spammers (as we’ll see below). It’s quite possible many of these accounts are people mixing memes and merchandise as a marketing amplification strategy. We don’t know anything about real reach, either. There are no good numbers on this.

But the threat is real, because Pinterest’s recommendation engine is particularly prone to sucking users down conspiracy holes. Why? As far as I can tell, it’s a couple of things. The first problem is that Pinterest’s business model is in providing very niche and personalized content. Its algorithm is designed to recognize stuff at the level of “I like pictures of salad in canning jars”, and as Zeynep Tufekci has demonstrated with YouTube, engines of personalization are also engines of radicalization.

But it’s more than that: it’s how it goes about recommendation. The worst piece of this, from a vulnerability perspective, is that it uses “boards” as a way to build its model of related things to push to you, and that spammers have developed ways to game these boards that both amplify radicalizing material and provide a model for other bad actors to emulate.

How Spammers Use Pinterest Boards as Chumbuckets

The best explanation of how this works comes from Amy Collier at Middlebury,  whose post on Pinterest radicalization earlier this year is a must-read for those new to the issue. Drawing on earlier work on Pinterest manipulation, Collier walks through the almost assuredly fake account of  Sandra Whyte, a user who uses boards with extreme political material to catch the attention of users. Here’s her “American Politics” board:

Screen-Shot-2018-03-13-at-10.28.08-AM.png

These pins flow to other users’ home pages with no context, which is why the political incoherence of the board as a whole is not a problem for the user. People are more likely to see the pins through the feed than the board as a whole.

Once other users like that material, they are more likely to see links to TeeSpring T-shirts this user is likely selling:

Screen-Shot-2018-03-13-at-10.24.39-AM.png

The T-Shirts are print-on-demand through a third-party service, so hastily designed that the description can’t even be bothered to spell “Mother” right.

teespring

So two things happen here. When Moms like QAnon content, they get t-shirts, which provides the incentive for spammers to continue to make these boards capitalizing on inflammatory content. Interestingly, when Moms like the T-shirts, they get QAnon content. Fun, right?

How Pinterest’s Aggressive Recommendation Engine Makes This Worse

About a year ago I wrote an article on how Pinterest’s recommendation engine makes this situation far worse.  I showed how after just 14 minutes of browsing, a new user with some questions about vaccines could move from pins on “How to Make the Perfect Egg” to something out of the Infowarverse:

after.png

What was remarkable about this process was that we got from point A to B by only pinning two pins on a board called vaccination.

I sped up the 14 minute process into a two and a half minute explanatory video. I urge you to watch it, because no matter how cynical you are it will shock you.

I haven’t repeated this experiment since then, so I’m unable to comment on whether Pinterest has mitigated this in the past year. It’s something we should be asking them, however.

I should note as well that the UI-driven decontextualization that drove Facebook’s news crisis is actually worse here. Looking at a board, I have no idea why I am seeing these various bits of information at all, or any indication where they come from.

pinterest

Facebook minimized provenance in the UI to disastrous results. Pinterest has completely stripped it. What could go wrong?

Pinterest Is a Major Platform and It’s Time to Talk About It That Way

Pinterest has only 175 million users, but 75 million of those users are in the United States. We can assume a number of spam accounts pad that number, but even accounting for that, this is still a major platform that may be reaching up to a fifth of the U.S. population.

So why don’t we talk about it? My guess is that it’s perceived as a woman’s platform, which means the legions of men in tech reporting ignore it. And the Silicon Valley philosopher-king class doesn’t bring it up either. It just sounds a bit girly, you know? Housewife-ish.

This then filters down to the general public. When I’ve talked about Pinterest’s vulnerability to disinformation,  the most common response is to assume I am  joking. Pinterest? Balsamic lamb chops and state-sponsored disinfo? White supremacy and summer spritzers?

Yup, I say.

I don’t know how compromised Pinterest is at this point. But everything I’ve seen indicates its structure makes it uniquely vulnerable to manipulation. I’d beg journalists to start including it in their beat, and researchers to throw more resources into its study.

A Provocation for the Open Pedagogy Community

Dave Winer has a great post today on the closing of blogs.harvard.edu. These are sites run by Berkman, some dating back to 2003, which are being shut down.

My galaxy brain goes towards the idea of federation, of course. The idea that everything referencing something should store a copy of what it references connected by unique global identifiers (if permissions and author preferences permit), and that we need a web that makes as many copies of things as the print world did, otherwise old copies of the Tuscaloosa News will outlast anything you are reading today on a screen. Profligate copying, as Ward Cunningham has pointed out, is biology’s survival strategy and it should be ours as well.

(I know, nature is not teleological. It’s a metaphor.)

But my smaller provocation, perfectly engineered for Friday Twitter outrage at me and my sellout-ness, is this:

All my former university-hosted sites are gone. We built up a WPMU instance at Keene in 2010, and the lack of broad adoption meant that when I left in 2013 we shut it down. I ran some wikis on university servers here and at Keene, and those are gone too.

All my self-hosted sites are corrupted from hacks or transfer errors in imports. Go back into this blog and you’ll find a sparse posting schedule for some years between 2010 and 2012, and it’s because those posts got nuked in a 2012 hack. I had to go out to the Wayback Machine and reconstruct the important ones by hand.

Transfer errors, let me tell you: go back to 2007 and look at all the images that failed imports and moves on this blog when it was self-hosted. There’s also this weird “” character that pops up in all of them like this:

Hold on, you say, these Metro signs look different! There’s no BRAND!

The entire Blue Hampshire community I co-founded, over 15,000 posts and 100,000 comments, originally self-hosted on SoapBlox and then WordPress? Gone. It’s probably OK, I said a lot of stupid stuff. But of course it was also a historically important site, one of the most successful state political blogging communities, one of the first communities to be syndicated by Newsweek, one of the first to feature news stories that cross-posted — as news stories — to Huffington Post. One of the first sites to get individual statements from all the Democratic presidential candidates in a weekly forum. Gone, gone, gone.

I know, this doesn’t seem to be provocative, but here’s the thing:

My Blogger sites from 2005 forward? They’re up and they are pristine.

meh

I mean, I’m not sure that’s a great thing. It was where I put little experiments too little to be worth setting up another BlueHost domain. But it also did me a solid in Keene Scene, where the 12-year-old images of Keene life have stayed up unmolested and without any maintenance. (I’d quite forgotten about it, really.)

ice

Same holds — as I’ve mentioned before — for projects students put up on Google Sites. The BlueHost server (and later the Rackspace account) was long ago shut down but Google Sites is still up.

I’m not making a specific case here. But I do want to point out that a big reason I moved to self-hosted and institutional solutions was this idea that commercially hosted stuff was too fickle. In 2006, it seemed that every week a new site shut down. For better or worse (mostly worse) monopoly consolidation has changed that dynamic a bit. There are other good reasons for self-hosting or doing institutional hosting, but durability is more a downside than an upside of these options, and we might want to let our students know that if they want something to stay up, self-hosting may not be the best choice.

Newspapers On Wikipedia Update: Initial Wikidata Pass

Thanks to initial work by folks at Wellesley and Wikidata work from 9of99 on Wikipedia, the Newspapers on Wikipedia project has both created an initial Wikidata set of extant U.S. newspapers and mapped that to needs for page and infobox creation.

The full set is here and can be queried in multiple ways:

http://tinyurl.com/yb6sng9e

Visually these maps overstate needs in high-density areas, since the red dots (needs page) take precedence over blue dots (has page) in a conflict, and the data has a geolocation that is only as granular as the town (hence Chicago has one geolocation). And the data will need continued cleanup; I’ve spotted a few issues just screenshotting regions. But this initial data set will be developed alongside the rest of the project, and even when papers don’t make it into Wikipedia, we’ll make sure the Wikidata on them is accurate, and try to match them with other sets of data as we go forward.
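To see why the maps overstate need, here is a rough sketch of the dot logic in Python. The records, paper names, and function names are all made up for illustration; this is not the project's actual query or mapping code, just the rule described above (one dot per town, and red wins if any paper in that town needs a page).

```python
from collections import defaultdict

# Hypothetical records: (paper name, town, has_wikipedia_page).
# These are invented for illustration, not taken from the Wikidata set.
papers = [
    ("Paper A", "Chicago", True),
    ("Paper B", "Chicago", True),
    ("Paper C", "Chicago", False),
    ("Paper D", "Keene", True),
]


def dot_colors(records):
    """Collapse papers to one dot per town; red wins any conflict."""
    by_town = defaultdict(list)
    for _name, town, has_page in records:
        by_town[town].append(has_page)
    # A town is blue only if every paper there already has a page.
    return {
        town: ("blue" if all(flags) else "red")
        for town, flags in by_town.items()
    }


def share_needing_page(records):
    """Fraction of individual papers that still need a page."""
    missing = sum(1 for _name, _town, has_page in records if not has_page)
    return missing / len(records)


colors = dot_colors(papers)
red_share_on_map = sum(1 for c in colors.values() if c == "red") / len(colors)

print(colors)
print(red_share_on_map, share_needing_page(papers))
```

In this toy set only one paper in four needs a page, but half the dots on the map come out red, because a single missing page turns the whole Chicago dot red.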

According to the data here (which again, is imperfect) the current counts are:

  • Has Wikipedia page and Infobox: 957
  • Needs Infobox: 84
  • Needs Page: 3775

(We’ve already put a dent in some of the work before this, so we’ll go back and manually tally up a baseline.)

Anyway, some maps. Keep in mind this is very preliminary.