Plagiarism Derp Reaches Epic Levels

First there was Buzzfeed, which admittedly plagiarized material:

Take that “Faith in Humanity” write-up. Last September, NedHardy (“the self-anointed curator of the Internet,” a kind of poor man’s BuzzFeed) posted an item called, “7 Pictures That Will Restore Your Faith in Humanity.” Then, last month, NedHardy posted another piece, “13 Pictures To Help You Restore Your Faith in Humanity.” Half of the photos in BuzzFeed’s post appear in NedHardy’s two compilations. NedHardy isn’t mentioned anywhere in BuzzFeed’s “21 Pictures” post.

Then the derp began to grow. Rick Perlstein, author of a new Reagan biography, has been accused of plagiarism in what seems to be a political tactic:

In the letters, Shirley [a longtime political operative] claims that Perlstein lifted “without attribution” passages from “Reagan’s Revolution,” and substantially ripped off his work even when attributing. He demands that all copies of “The Invisible Bridge” be destroyed, with an additional request of a public apology and $25 million in damages.

Rick Perlstein would have to be the worst plagiarist in history, by citing his victim 125 times in source notes and thanking him in the acknowledgments.

And then there’s Newsweek editor Fareed Zakaria, who has been accused by bloggers of passage rip-offs like this:

Or take this example, where Zakaria happens to write 11 words in the exact same order as they appeared in a Peter Beinart article.

This is insane. Let’s start with the Buzzfeed example. Certainly Buzzfeed did build off the work of Ned Hardy without attribution. Just as Ned Hardy posted photos he had seen elsewhere without hat-tipping those who had found them. Just as he took Buzzfeed’s famous formula of “X pictures that Y” and put it to use on his site.

The Reagan example is a bit of an odd case, but speaks to the dangers of this road. The Zakaria example borders on parody.

What is it that we’re arguing here? That Zakaria should spend time rewriting a sentence like “In 2009, Senate Republicans filibustered a stunning 80% of major legislation.”? For what purpose? What if that is the most obvious way to say it, and other formulations just subtract from the impact?

What do we expect would happen if Zakaria cited Beinart for this sentence? What damage has occurred to Beinart as a result of Zakaria not citing it? Was there a legion of Zakaria fans who would have said — “Wow, that sentence from Beinart is brilliant — I need to read more Beinart!”

These things seem small, but they are not. Much (if not most) of our daily work involves written descriptions, curation of resources, and other recomposition of texts. Developing a culture that allowed for fluid reuse of the work of others would free up our capacity to solve problems instead of wasting time rearranging clauses. We are held back from fluid reuse by cultural conventions which force us to see wholesale copying of unique insights and pedestrian descriptions of Senate procedure as the same thing. We are held back by technologies that have not moved past cut-and-paste models of reuse. We are held back by the plagiarism police who demand that our attributions be placed in ways that break the flow of reading, or send users to source websites only to find the source was linked for trivial reasons.

Some people need to make a living off of words, and the reputation generated by their words. We need to preserve that. But we also need to radically rethink plagiarism if we are going to take advantage of the ability the web gives us to build off of the work of others. And we seem to be going in the opposite direction.

Open Licences and SFW

David Wiley with a great comment on yesterday’s post:


The answer, more or less, is yes. And initially that seems like a dealbreaker. 

But here’s the history of the web, from me, condensed.

A long time ago very smart people decided that web pages had to all look different, that your stuff would only exist on your site and people had to link to your page as their way of reusing/quoting your stuff, rather than copying it to their own site. And we built a whole web around this idea that everybody would have different looking sites that contained only their content, everything would exist in exactly one place, and copyright would all keep us nice and safe. And every single one of these decisions made reusing and remixing a huge pain in the butt. But it was what we wanted, right?

Today most web activity happens on Facebook, Twitter, Tumblr, and Pinterest, and the way it works is that other people repost your stuff on *their* page, and everybody’s pages look the same, and people more or less like that because it makes resharing and reblogging and giving credit easy. So the web is more or less like Smallest Federated Wiki now, with the exception that instead of you having an open license, Facebook, Tumblr, Twitter, and Pinterest own your stuff, and none of them talk to one another.

So yes, it requires open licensing. But it’s honestly the system we have today, just refactored to account for what people actually ended up wanting. It builds the idea of “reuse, revise, reply, and reshare in your own space” into the core of the system so that you don’t need a third-party site to make that happen.

The Web is Broken and We Should Fix It

Via @roundtrip, this conversation from July:

There’s actually a pretty simple alternative to the current web. In federated wiki, when you find a page you like, you curate it to your own server (which may even be running on your laptop). That forms part of a named-content system, and if later that page disappears at the source, the system can find dozens of curated copies across the web. Your curation of a page guarantees the survival of the page. The named-content scheme guarantees it will be findable. 

It also addresses scalability problems. Instead of linking you to someone’s page (and helping bring down their server) I curate it. You see me curate it and read my copy of that page. The page ripples through the system and the load is automagically dispersed throughout the system.
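The curation idea above can be sketched in a few lines. This is only a toy model under stated assumptions, not how federated wiki is actually implemented: the `Federation` class, its method names, and the `*.example` server names are all hypothetical, and the "named-content system" is reduced to a dictionary from page name to the set of servers holding a copy.

```python
from dataclasses import dataclass, field


@dataclass
class Federation:
    """Toy named-content index: page name -> servers holding a copy (hypothetical)."""
    copies: dict[str, set[str]] = field(default_factory=dict)

    def publish(self, server: str, page: str) -> None:
        """Record that a server hosts the original page."""
        self.copies.setdefault(page, set()).add(server)

    def curate(self, curator: str, page: str) -> None:
        """Copy an existing page to the curator's own server."""
        if not self.copies.get(page):
            raise KeyError(f"no surviving copy of {page!r}")
        self.copies[page].add(curator)

    def take_down(self, server: str) -> None:
        """Simulate a server disappearing from the web."""
        for holders in self.copies.values():
            holders.discard(server)

    def find(self, page: str) -> list[str]:
        """Return every server that still holds the named page."""
        return sorted(self.copies.get(page, set()))


fed = Federation()
fed.publish("origin.example", "tyranny-of-paper")
fed.curate("mike.example", "tyranny-of-paper")
fed.curate("ward.example", "tyranny-of-paper")
fed.take_down("origin.example")  # the source vanishes
surviving = fed.find("tyranny-of-paper")  # curated copies remain findable by name
```

Because readers can fetch the page from any holder returned by `find`, each act of curation also adds a server that can absorb some of the load.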

It’s interesting that Andreessen can’t see the solution, but perhaps expected. Towards the end of a presentation I gave Tuesday with Ward Cunningham about federated content, Ward got into a righteous rant about the “Tyranny of Paper”. And the idea he was digging at was that this model of a web page as a printed publication had caused us to ignore the unique affordances of digital content. We can iteratively publish, for example, and publish very unfinished sorts of things. We can treat content like data, and mash it up in new and exciting ways. We can break documents into smaller bits, and allow multiple paths through them. We can rethink what authorship looks like.

Or we can take the Andreessen path, which as Ted Nelson said in his moving but horribly misunderstood tribute to Doug Engelbart, is “the costume party of fonts that swept aside [Engelbart's] ideas of structure and collaboration.”

The two visions are not compatible, and interestingly it’s Andreessen’s work which locked us into the latter vision. Your web browser requests one page at a time, and the layout features of Mosaic, and later Netscape, guarantee that you will see that page as the server has determined. The model is not one of data (eternally fluid, to be manipulated like Engelbart’s grocery list) but of the printed page, permanently fixed.

And ultimately this gives us the server-centric version of the web that we take for granted, like fish in water. The server containing the data — Facebook or Blogger, but also WordPress — controls the presentation of the data, controls what you can do with it. It’s also the One True Place the page shall live — until it disappears. We’re left with RSS hacks and a bewildering array of API calls to accomplish the simplest mashups. And that’s because we know that the author gets to control the printed page — its fonts, its layout, its delivery, its location, its future uses.

The Tyranny of Print led to us getting pages delivered as Dead Data, which led to the server-centric vision we now have of the web. The server-centric vision led to a world that looked less like BitTorrent and more like Facebook. There’s an easy way out, but I doubt anyone in Silicon Valley wants to take it.


Ward Cunningham’s explanation of federation (scheme on right) — one client can mash together products of many servers. Federation puts the client, not the server, in control. 

The Part of Wiki Culture the Classroom Forgot

If you look at most treatments of wiki in the classroom, people talk about collaboration, group projects, easy publishing, revision control. All of these are important. But one important element of what makes a wiki a wiki has been underutilized.

Wikis introduced users not only to the editable page, but also to the idea of page-creating links. (In fact, this invention predates wiki and even the web, having been pioneered in the HyperCard implementation Ward Cunningham wrote for documenting software patterns.)

Page-creating links are every bit as radical as the user-edited page, perhaps even more so. What page-creating links allow you to do, according to Cunningham, is map out the edges of your knowledge: the places you need to connect or fill in. You write a page (or a card), look at it, and ask: what on this page needs explanation? What connections can we make? Then you link to resources that don’t exist yet. Clicking on those links gives you not an error, but an invitation to create that page. The new page both contains links back to concepts you’ve already documented and generates new links to uncreated resources. In this way the document “pushes out from the center,” with each step both linking back to old knowledge and identifying new gaps.
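To make the mechanics concrete, here is a minimal sketch of page-creating links. It assumes a `[[Page Name]]` link syntax and an in-memory dictionary of pages. Both are illustrative choices, and `missing_links` and `create_page` are hypothetical helpers rather than part of any real wiki engine.

```python
import re

LINK = re.compile(r"\[\[([^\]]+)\]\]")  # assumed [[Page Name]] link syntax


def missing_links(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each not-yet-written page title to the pages that link to it."""
    gaps: dict[str, list[str]] = {}
    for title, body in pages.items():
        for target in LINK.findall(body):
            if target not in pages:
                gaps.setdefault(target, []).append(title)
    return gaps


def create_page(pages: dict[str, str], title: str, body: str) -> None:
    """Following a page-creating link: write the page instead of hitting an error."""
    pages[title] = body


wiki = {"Federation": "Clients combine [[Named Content]] from many [[Servers]]."}
before = missing_links(wiki)  # the current edges of our knowledge
create_page(
    wiki,
    "Named Content",
    "Copies of a page, as in [[Federation]], found via [[Curation]].",
)
after = missing_links(wiki)  # one gap filled, one new gap opened
```

Filling in "Named Content" links back to the existing "Federation" page while exposing "Curation" as a new gap, which is the "pushing out from the center" pattern in miniature.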

In the video below I show this “pushing out from the center” process on a wiki of my own and talk about how this architecture and process relates to integrative learning. For best viewing, hit the HD button and go full screen.

Using Wiki for Connected, Integrative Learning from Mike Caulfield on Vimeo.

Blue Hampshire’s Death Spiral

Blue Hampshire, a political community I gave years of my life to, is in a death spiral. The front page is a ghost town.

It’s so depressing, I won’t even link to it. It’s so depressing, that I haven’t been able to talk about it until now. It actually hurts that much.

This is a site that at the point I left it had 5,000 members, 10,000 posts, and 100,000 comments. And at the point co-founders Laura Clawson and Dean Barker left it circa 2011(?), it had even more than that.

And what comments! Because I say that *I* put sweat into it, or Laura and Dean did, but it was the community on that site that really shone.  Someone would put up a simple post, and the comments would capture history, process, policy, backstory — whatever. Check out these comments on a randomly selected post from 2007.

The post concerns an event where the local paleoconservative paper endorsed John McCain for their Democratic candidate, as a way to slight a strong field of Democrats in 2008.

What happens next is amazing, but it was the sort of thing that happened all the time on Blue Hampshire. Sure, people gripe, but they do so while giving out hidden pieces of history and background that just didn’t exist anywhere else on the web. They relate personal conversations with previous candidates and document the paper’s history of name-calling and concern-trolling.

Honest to God, this is one article, selected at random from December 2007 (admittedly, one of our top months). In December 2007, our members produced 426 articles like this. Not comments, mind you. Articles. And on so many of those articles, the comments read just like this — or better.

That’s the power of the stream, the conversational, news-peg driven way to run a community. Reddit, Daily Kos, TreeHugger, what have you.

But it’s also the tragedy of the stream, not only because sites die, but because this information doesn’t exist in any form of much use to an outsider. We’re left with a 10,000-page transcript of dead conversations that contain incredible information ungrokable to most people who were not there.

And honestly, this is not just a problem that affects sites in the death spiral or sites that were run as communities rather than individual blogs. The group of bloggers formerly known as the edupunks have been carrying on conversations about online learning for a decade now. There’s amazing stuff in there, such as this recent how-to post from Alan Levine, or this post on Networked Study from Jim. But when I teach students this stuff or send links to faculty I’m struck by how surprisingly difficult it is for a new person to jump into that stream and make sense of it. You’re either in the stream or out of it; toe-dipping is not allowed.

And so I’m conflicted. One of the big lessons of the past 10 years is how powerful this stream mode of doing things is. It elicits facts, know-how, and insights that would otherwise remain unstated.

But the same community that produces those effects can often lock out outsiders, and leaves behind indecipherable artifacts.

Does anyone else feel this? That the conversational mode, while powerful, is also lossy over time?

I’m not saying that the stream is bad, mind you — heck, it’s been my way of thinking about every problem since 2006. I’m pushing this thought out to all you via the stream. But working in wiki lately, I’ve started to wonder if we’ve lost a certain balance, and if we pay for that in ways hidden to us. Pay for our lack of recursion through these articles, pay for not doing the work to make all entry points feel scaffolded. If that’s true, then — well, almost EVERYTHING is stream now. So that could be a problem.




The OS-Based Lifestream Will Kill the Mega-Site, Continued

The OS-based Lifestream killed the mega-site. And nobody seemed to notice…

Back in January, one of my predictions was that the “Revenge of the OS” would accelerate.

The idea was that Google and Apple didn’t need to compete with Facebook, because Google and Apple actually owned the one lifestream that mattered — the notifications panel of your smartphone. Facebook’s monopoly was in fact broken and  the age of the mega-site was over, because the point of integration had moved upstream, off the server and onto the device. Teens who jumped off Facebook had replaced it not with another service, but with their notifications panel, and the rest of us were doing the same. I predicted that Google realized this and would abandon their Google+ strategy, and fold their “portal/community” efforts into Android, with the panel as the lifestream.

Four months later Google reorganized, moving its Google+ Hangout team to the Android division.

Last week reports emerged that Google is now moving Google Photos out of Google+ as well.

In June, Apple announced one of the biggest new features in iOS 8 would be interactive notifications, that let users respond to notifications without opening the app.

And that’s not even the I-told-you-so yet.

Here’s the most interesting bit. A new app called WUT, supported by Google Ventures, lets you send anonymous messages to your Facebook friends. And, as a recent post on Medium points out, there is no app in this app. You interact with the app entirely through your notifications panel. Those Android and iOS panels have become the lifestream platform Facebook used to be, and Facebook is now a rather nice app that runs in those platforms.

This trend of identity moving upstream is huge, and I still see very few people grappling with it. We’re still fighting the last war.


All the Versions of the Amelia Bedelia Wikipedia Entry

I should probably stop talking about the federated aspects of Smallest Federated Wiki — as I mentioned before, whether federation works or not for any given use case is speculative. There is no way to come to real agreement about it since it relies on us making predictions about how people will act in a federated content environment, and no one really knows what that looks like at scale. In situations like this it’s probably best to just try to get people to use it rather than talk about models of use.

But Maha Bali makes some great points in the comments of the last post that I want to address. So I suppose I again risk the entire project by dragging people into its most contentious assertion ;)

Here’s Maha:

But then your point that wikipedia still has a place is a valid one: encyclopedias are not meant to provide the space for those diff perspectives, and wikipedia does have a discussion space for them, except not visible to the usual user.

I am just thinking of what it means to the “casual” web surfer (not the deeply versed researcher) to get confused by all the versions? There are already multiple websites for every web search we make, and we’re just usually using one search engine not a metacrawler or whatever.

First, let me agree with Maha that absolutely there is a place for Wikipedia in this equation. SFW is not meant to replace Wikipedia. But I want to address the “all the versions” critique.

Again, I think the issue here is we treat every piece of knowledge as if it were the Wikipedia page on the Civil War or Behavioral Genetics. On those pages there’s a deep need for some consensus version that someone can read. But 99.999% of subjects are not like that. With a hat tip to Heavy Metal Umlaut, here are the 217 edits to Amelia Bedelia (more or less; speeding up the film may have dropped some edits).

Again, I can’t repeat enough that SFW is not a replacement for Wikipedia. But it’s really helpful to look at what *most* subjects look like, because our stress about “all the versions” is a classic case of over-engineering.

Of the 217 edits, most are vandalism and vandalism reversions. You’ll see edits flash by about Asperger’s syndrome, “Booger Dystrophy” and the minimalistic “Dragons are cool”. You don’t see the war below the fold, but some of the edits are people adding fake Amelia Bedelia movies below the book list (sample title: Amelia Bedelia in China (1897)). Other edits involve formatting, italicization, and small wording changes.

Filter out vandalism, style, and formatting edits, and there are fewer than ten main components, really.

  1. The main stub. What Amelia Bedelia is, who wrote it. Some level of plot summary
  2. Some edits about other people involved with the work besides the author
  3. A section that exists for a while on how Amelia Bedelia teaches kids about polysemy, etc.
  4. The book list, with dates
  5. A section that exists for a while on the vaudeville/Irish connection
  6. A section that appears and reappears about a statue of the author
  7. The cover of the book
  8. The link to the I Can Read Series
  9. The Cameroon hoax

What happens if someone gets linked to an old version? Flipping through the edits, it’s hard to see the harm. Someone goes to a page to get a question answered, presumably. Even the first version of the page will answer most questions that people want to ask. The biggest risk is someone might miss the book list, which seems of substantial use to the casual reader.

Flipping through the edits, and omitting the book list addition, it’s often hard to see how a newer version is particularly better than an older version.  In many cases, for many users, the newer version is worse or less useful (not even taking the Cameroon hoax into account).

This is not the story we tell ourselves about Wikipedia. But that’s because we know where Wikipedia really succeeds: it does an amazing job on the big pages, with a large, vibrant community who engage with the page over a period of years, and make sure that items don’t fall between the cracks.

But if there’s one thing I’ve learned recently, again and again, it’s that much of the web is engineered for a scale that it doesn’t operate at. And to a certain extent that is true about Wikipedia. There’s no other Wikipedia, because Wikipedia works best at scale. It tends to not work as well with Amelia Bedelia-sized things.

So what’s the harm? As Maha mentions, it’s all in the edits, right? If you want to dig? Not really. I sped the film up: it took me 12 minutes just to click through all the edits, primarily due to rampant vandalism. That doesn’t include reading them, diffing them, or anything of that nature. No one’s going to do that, short of writing a book on a subject.

And (primarily because of the vandalism issue) past versions are not exposed to Google.


So when that fact is gone, it’s gone. As an author, you give Wikipedia stewardship of your work, but you do in fact also give them the ability to erase it from the web.

Let’s replay history with SFW in the picture.

I’m looking at Amelia Bedelia one day, and I realize — wow, this looks a lot like anti-Irish vaudeville.

I look for an SFW page on Amelia Bedelia, I fork it to my site, and add my insight. If people like that insight they pull it back. The people who care most about scholarship around children’s books fork it back.

A person on Wikipedia finds my idea in a Google search and puts it on Wikipedia, where it eventually gets deleted.

The vandalisms, including the Cameroon vandalism, never happen, because in SFW the only house you can vandalize is your own.

You have a question about Amelia Bedelia. You do a Google search. You get Wikipedia, but you also get links to a number of SFW pages, ranked by a reputation algorithm. If Wikipedia answers your question, great. If not, you have these links.

You click into an SFW page on Amelia. Which one? From Google, the highest ranked one, which is probably the one most linked to, or the one on the site with other articles of high reputation on Children’s Literature, or whatever. You read that one.

If that doesn’t answer your question, you look up top and see there is a list of more newly updated ones. You pick the most recent one. In each case, you always have access to an author page which tells you why the curator of this particular page should be trusted.
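The reading path described above (land on the highest-ranked version, then look at newer ones if needed) can be sketched as follows. This is a rough illustration, not a description of how Google or SFW rank anything: the `Version` record, the use of inbound link counts as the reputation signal, and all of the sample sites and numbers are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    site: str
    inbound_links: int  # hypothetical stand-in for a reputation score
    updated: date


def best_first(versions: list[Version]) -> list[Version]:
    """Order versions by inbound links, breaking ties by recency."""
    return sorted(versions, key=lambda v: (v.inbound_links, v.updated), reverse=True)


def newer_than(current: Version, versions: list[Version]) -> list[Version]:
    """Versions updated after the one being read, most recent first."""
    later = [v for v in versions if v.updated > current.updated]
    return sorted(later, key=lambda v: v.updated, reverse=True)


# Made-up versions of one page, for illustration only.
versions = [
    Version("kidlit.example", inbound_links=40, updated=date(2014, 3, 1)),
    Version("mike.example", inbound_links=12, updated=date(2014, 8, 2)),
    Version("nicole.example", inbound_links=3, updated=date(2014, 6, 15)),
]
landing = best_first(versions)[0]            # where a search would send you
alternatives = newer_than(landing, versions)  # the "more newly updated" list up top
```

The reader only ever sees one version plus a short list of alternatives, so the total number of forks in the network never has to be confronted directly.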

At all stages of this, there are never too many versions. There’s only what you are looking at, and additional versions if you need them.

The “too many versions” problem reminds me a lot of when AltaVista came out as a search engine, and suddenly people were very stressed about all the search results for a term. There’s this stress that information is out there that we are not dealing with, but the reality is that the way we deal with it is to decide at each point whether it’s worth reading more.

My wife Nicole, for instance, would love to see lesson plans on SFW. She could jump to one on Van Gogh.

How would she know that it’s the “best” one? She wouldn’t. Largely because there isn’t a best one. But having the different versions visible allows her to quickly sort through them to get what she needs. Over time, the best one ends up being the one that everybody links to externally (e.g. tweeting “best van gogh lesson plan ever!”) or ends up forking to their own site.

Anyway, I should get off this topic. As I mentioned, I’m trying to steer away from the federated aspect of SFW because it’s the one point of contention in a product that has many other things going for it. But the answer, roughly, to how we deal with multiple versions is that we let networks and network algorithms sort them, rather than using a “last edit/best edit” system. For large, complex articles with active communities this won’t be the best way to deal with things; last-edit/best-edit works great there. But for most things we do it will in fact be better, and will in fact help Wikipedia to be better as well.




