Copying a Whole Site Is Remarkably Easy In Smallest Federated Wiki

Operations in Smallest Federated Wiki tend to be page-level — dashboard style site managers have been avoided for the moment. Still, the speed at which operations can be executed makes site-wide stuff pretty easy. This video shows how to copy a small fifteen page site in about a minute.

If you think about how long it would take you to log into a dashboard interface, export a site, log into another dashboard interface, and upload the file to the import process — Smallest Federated Wiki compares favorably.

How is this speed achieved? First of all, moving the integration to the browser allows us to pull two sites together into a single interface. Importantly, neither site has to have any knowledge of the other before the drag, because to the browser a site is just another data source. It’s the difference between the two models below, with the federated model on the right.




Client-based integration is more amenable to fluid reuse because it can have a single integrated view of multiple sites in a way that server-based systems cannot.

The second reason it’s so quick is the parallel pages structure. The multiple pages on the screen are less impressive looking than your average web page. But you pay a massive tax for that look in the form of the “click-forward, act, click-backward” actions you perform every single day. Here you see how much eliminating that speeds up interaction, as you click on a list that stays in place and then fork the pages without playing the “forward-back” game.

As a side note, having used SFW for a while, I now get frustrated in “normal” web interfaces that use the single-page model. It feels ridiculously kludgy. Forward and back in 2014? Are you kidding?

The third reason the operation flows well is the data-based nature of it. We’re not shipping layout to the new site, we are essentially copying the database record for that page. No formatting surprises to greet you after the copy operation.
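A minimal sketch of what “copying the database record” might look like, assuming a page is a plain JSON-style record with a title, a list of story items, and a journal of actions. The field names, the `fork_page` helper, and the site name are illustrative assumptions, not the actual SFW schema or API.

```python
import copy
import json


def fork_page(page, source_site, when):
    """Return an independent copy of a page record, noting where it came from."""
    # Only data is copied: no layout, theme, or styling travels with the page.
    forked = copy.deepcopy(page)
    forked.setdefault("journal", []).append(
        {"type": "fork", "site": source_site, "date": when}
    )
    return forked


# Hypothetical page record on the source site.
original = {
    "title": "Sample Page",
    "story": [{"type": "paragraph", "text": "Some text on the source site."}],
    "journal": [{"type": "create", "date": 1}],
}

# Fork it, then edit the copy without touching the original.
mine = fork_page(original, "source.example", 2)
mine["story"].append({"type": "paragraph", "text": "My own note."})

print(json.dumps(mine, indent=2))
```

Because the copy is just a record, the destination site renders it with its own presentation, which is why there are no formatting surprises after the copy.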

So fine — this is fifteen pages. What if you wanted to fork a site of a hundred pages? Well, it’d probably take seven times this long, so maybe 10 minutes?

That’s ten minutes to fork a picture perfect copy of any SFW site in the world. I’m not even sure you can do that in GitHub in ten minutes.

(Are people beginning to get the power of these few small interface changes yet?)


Plagiarism Derp Reaches Epic Levels

First there was Buzzfeed, which admittedly plagiarized material:

Take that “Faith in Humanity” write-up. Last September, NedHardy (“the self-anointed curator of the Internet,” a kind of poor man’s BuzzFeed) posted an item called, “7 Pictures That Will Restore Your Faith in Humanity.” Then, last month, NedHardy posted another piece, “13 Pictures To Help You Restore Your Faith in Humanity.” Half of the photos in BuzzFeed’s post appear in NedHardy’s two compilations. NedHardy isn’t mentioned anywhere in BuzzFeed’s “21 Pictures” post.

Then the derp began to grow. Rick Perlstein, author of a new Reagan biography, has been accused of plagiarism in what seems to be a political tactic:

In the letters, Shirley [a longtime political operative] claims that Perlstein lifted “without attribution” passages from “Reagan’s Revolution,” and substantially ripped off his work even when attributing. He demands that all copies of “The Invisible Bridge” be destroyed, with an additional request of a public apology and $25 million in damages.

Rick Perlstein would have to be the worst plagiarist in history, citing his victim 125 times in source notes and thanking him in the acknowledgments.

And then there’s Newsweek editor Fareed Zakaria, who has been accused by bloggers of passage rip-offs like this:

Or take this example, where Zakaria happens to write 11 words in the exact same order as they appeared in a Peter Beinart article.

This is insane. Let’s start with the Buzzfeed example. Certainly Buzzfeed did build off the work of Ned Hardy without attribution. Just as Ned Hardy posted photos he had seen elsewhere without hat-tipping those who had found them. Just as he took Buzzfeed’s famous formula of “X pictures that Y” and put it to use on his site.

The Reagan example is a bit of an odd case, but speaks to the dangers of this road. The Zakaria example borders on parody.

What is it that we’re arguing here? That Zakaria should spend time rewriting a sentence like “In 2009, Senate Republicans filibustered a stunning 80% of major legislation.”? For what purpose? What if that is the most obvious way to say it, and other formulations just subtract from the impact?

What do we expect would happen if Zakaria cited Beinart for this sentence? What damage has occurred to Beinart as a result of Zakaria not citing it? Was there a legion of Zakaria fans who would have said — “Wow, that sentence from Beinart is brilliant — I need to read more Beinart!”

These things seem small, but they are not. Much (if not most) of our daily work involves written descriptions, curation of resources, and other recomposition of texts. Developing a culture that allowed for fluid reuse of the work of others would free up our capacity to solve problems instead of wasting time rearranging clauses. We are held back from fluid reuse by cultural conventions which force us to see wholesale copying of unique insights and pedestrian descriptions of Senate procedure as the same thing. We are held back by technologies that have not moved past cut-and-paste models of reuse. We are held back by the plagiarism police who demand that our attributions be placed in ways that break the flow of reading, or send users to source websites only to find the source was linked for trivial reasons.

Some people need to make a living off of words, and the reputation generated by their words. We need to preserve that. But we also need to radically rethink plagiarism if we are going to take advantage of the ability the web gives us to build off of the work of others. And we seem to be going in the opposite direction.

Open Licences and SFW

David Wiley with a great comment on yesterday’s post:


The answer, more or less, is yes. And initially that seems like a dealbreaker. 

But here’s the history of the web, from me, condensed.

A long time ago very smart people decided that web pages had to all look different, that your stuff would only exist on your site and people had to link to your page as their way of reusing/quoting your stuff, rather than copying it to their own site. And we built a whole web around this idea that everybody would have different looking sites that contained only their content, everything would exist in exactly one place, and copyright would all keep us nice and safe. And every single one of these decisions made reusing and remixing a huge pain in the butt. But it was what we wanted, right?

Today most web activity happens on Facebook, Twitter, Tumblr, and Pinterest, and the way it works is that other people repost your stuff on *their* page, and everybody’s pages look the same, and people more or less like that because it makes resharing and reblogging and giving credit easy. So the web is more or less like Smallest Federated Wiki now, with the exception that instead of you having an open license, Facebook, Tumblr, Twitter, and Pinterest own your stuff, and none of them talk to one another.

So yes, it requires open licensing. But it’s honestly the system we have today, just refactored to account for what people actually ended up wanting. It builds the idea of “reuse, revise, reply, and reshare in your own space” into the core of the system so that you don’t need a third party site to make that happen.

The Web is Broken and We Should Fix It

Via @roundtrip, this conversation from July:

There’s actually a pretty simple alternative to the current web. In federated wiki, when you find a page you like, you curate it to your own server (which may even be running on your laptop). That forms part of a named-content system, and if later that page disappears at the source, the system can find dozens of curated copies across the web. Your curation of a page guarantees the survival of the page. The named-content scheme guarantees it will be findable.
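As a rough illustration of the named-content idea, here is a sketch that assumes every site stores pages under a name derived from the title, so a page can be found by name on any site that curated it. The `slug` and `find_copies` helpers and the site names are hypothetical; the real system's naming and discovery rules may differ.

```python
def slug(title):
    """Turn a page title into a name that every site can agree on."""
    return "-".join(title.lower().split())


def find_copies(title, sites):
    """Return the sites that still hold a copy of the named page."""
    name = slug(title)
    return [site for site, pages in sites.items() if name in pages]


# Hypothetical neighborhood: the origin has lost the page,
# but two curators forked it earlier.
sites = {
    "origin.example": {},
    "alice.example": {"tyranny-of-paper": {"title": "Tyranny of Paper"}},
    "bob.example": {"tyranny-of-paper": {"title": "Tyranny of Paper"}},
}

copies = find_copies("Tyranny of Paper", sites)
print(copies)  # → ['alice.example', 'bob.example']
```

The lookup never asks where the page originally lived, only what it is called, which is what lets curated copies stand in for a vanished source.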

It also addresses scalability problems. Instead of linking you to someone’s page (and helping bring down their server) I curate it. You see me curate it and read my copy of that page. The page ripples through the system and the load is automagically dispersed throughout the system.

It’s interesting that Andreessen can’t see the solution, but perhaps expected. Towards the end of a presentation I gave Tuesday with Ward Cunningham about federated content, Ward got into a righteous rant about the “Tyranny of Paper”. And the idea he was digging at was that this model of a web page as a printed publication had caused us to ignore the unique affordances of digital content. We can iteratively publish, for example, and publish very unfinished sorts of things. We can treat content like data, and mash it up in new and exciting ways. We can break documents into smaller bits, and allow multiple paths through them. We can rethink what authorship looks like.

Or we can take the Andreessen path, which as Ted Nelson said in his moving but horribly misunderstood tribute to Doug Engelbart, is “the costume party of fonts that swept aside [Engelbart’s] ideas of structure and collaboration.”

The two visions are not compatible, and interestingly it’s Andreessen’s work which locked us into the latter vision. Your web browser requests one page at a time, and the layout features of Mosaic, and later Netscape, guarantee that you will see that page as the server has determined. The model is not one of data (eternally fluid, to be manipulated like Engelbart’s grocery list) but of the printed page, permanently fixed.

And ultimately this gives us the server-centric version of the web that we take for granted, like fish in water. The server containing the data — Facebook or Blogger, but also WordPress — controls the presentation of the data, controls what you can do with it. It’s also the One True Place the page shall live — until it disappears. We’re left with RSS hacks and a bewildering array of API calls to accomplish the simplest mashups. And that’s because we know that the author gets to control the printed page — its fonts, its layout, its delivery, its location, its future uses.

The Tyranny of Print led to us getting pages delivered as Dead Data, which led to the server-centric vision we now have of the web. The server-centric vision led to a world that looked less like BitTorrent and more like Facebook. There’s an easy way out, but I doubt anyone in Silicon Valley wants to take it.


Ward Cunningham’s explanation of federation (scheme on right) — one client can mash together products of many servers. Federation puts the client, not the server, in control. 


It seems we got front-paged at Hacker News. So for those that don’t follow the blog I thought I’d add a one minute video to show how Smallest Federated Wiki uses a combination of JSON, NodeJS, and HTML5 to accomplish the above model. This vid is just about forking content between two different servers, really basic. Even neater stuff starts to happen when you play with connecting pages via names and people via edit journals, but leave that to another day.

This and more videos and explanations are available at the SFW tag.

The Part of Wiki Culture the Classroom Forgot

If you look at most treatments of wiki in the classroom, people talk about collaboration, group projects, easy publishing, revision control. All of these are important. But one important element of what makes a wiki a wiki has been underutilized.

Wikis introduced users not only to the editable page, but to the idea of page-creating links. (In fact, this invention pre-dates wiki and even the web, having been pioneered in the HyperCard implementation Ward Cunningham wrote for documenting software patterns.)

Page-creating links are every bit as radical as the user-edited page, perhaps even more so. What page-creating links allow you to do, according to Cunningham, is map out the edges of your knowledge: the places you need to connect or fill in. You write a page (or a card), look at it, and ask: what on this page needs explanation? What connections can we make? Then you link to resources that don’t exist yet. Clicking on those links gives you not an error, but an invitation to create that page. The new page not only contains links back to concepts you’ve already documented, but also generates new links to uncreated resources. In this way the document “pushes out from the center,” with each step both linking back to old knowledge and identifying new gaps.
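The “edges of your knowledge” can be made concrete with a small sketch. It assumes pages are stored as plain text keyed by title and that links are written in double square brackets; both the storage format and the `missing_links` helper are assumptions for illustration, not a description of any particular wiki engine.

```python
import re

# Assumed link syntax: [[Page Name]]
LINK = re.compile(r"\[\[(.+?)\]\]")


def missing_links(pages):
    """Return linked page names that have no page yet, in sorted order."""
    wanted = set()
    for text in pages.values():
        wanted.update(LINK.findall(text))
    # Anything linked to but not yet written is an invitation to create.
    return sorted(wanted - set(pages))


# Hypothetical two-page wiki.
pages = {
    "Integrative Learning": "Builds on [[Transfer]] and [[Concept Maps]].",
    "Transfer": "Applying knowledge in a new context. See [[Integrative Learning]].",
}

print(missing_links(pages))  # → ['Concept Maps']
```

Writing the “Concept Maps” page would close one gap and, through whatever new links it introduces, probably open others. That is the pushing-out process in miniature.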

In the video below I show this “pushing out from the center” process on a wiki of my own and talk about how this architecture and process relate to integrative learning. For best viewing, hit the HD button and go full screen.

Using Wiki for Connected, Integrative Learning from Mike Caulfield on Vimeo.

Blue Hampshire’s Death Spiral

Blue Hampshire, a political community I gave years of my life to, is in a death spiral. The front page is a ghost town.

It’s so depressing, I won’t even link to it. It’s so depressing that I haven’t been able to talk about it until now. It actually hurts that much.

This is a site that at the point I left it had 5,000 members, 10,000 posts, and 100,000 comments. And at the point co-founders Laura Clawson and Dean Barker left it circa 2011(?), it had even more than that.

And what comments! Because I say that *I* put sweat into it, or Laura and Dean did, but it was the community on that site that really shone.  Someone would put up a simple post, and the comments would capture history, process, policy, backstory — whatever. Check out these comments on a randomly selected post from 2007.

The post concerns an event where the local paleoconservative paper endorsed John McCain for their Democratic candidate, as a way to slight a strong field of Democrats in 2008.

What happens next is amazing, but it was the sort of thing that happened all the time on Blue Hampshire. Sure, people gripe, but they do so while giving out hidden pieces of history and background that just didn’t exist anywhere else on the web. They relate personal conversations with previous candidates, document the history the paper has of name-calling and concern-trolling.

Honest to God, this is one article, selected at random from December 2007 (admittedly, one of our top months). In December 2007, our members produced 426 articles like this. Not comments, mind you. Articles. And on so many of those articles, the comments read just like this — or better.

That’s the power of the stream, the conversational, news-peg driven way to run a community. Reddit, Daily Kos, TreeHugger, what have you.

But it’s also the tragedy of the stream, not only because sites die, but because this information doesn’t exist in any form of much use to an outsider. We’re left with the 10,000 page transcript of dead conversations that contain incredible information ungrokable to most people not there.

And honestly, this is not just a problem that affects sites in the death spiral or sites that were run as communities rather than individual blogs. The group of bloggers formerly known as the edupunks have been carrying on conversations about online learning for a decade now. There’s amazing stuff in there, such as this recent how-to post from Alan Levine, or this post on Networked Study from Jim. But when I teach students this stuff or send links to faculty I’m struck by how surprisingly difficult it is for a new person to jump into that stream and make sense of it. You’re either in the stream or out of it; toe-dipping is not allowed.

And so I’m conflicted. One of the big lessons of the past 10 years is how powerful this stream mode of doing things is. It elicits facts, know-how, and insights that would otherwise remain unstated.

But the same community that produces those effects can often lock out outsiders, and leaves behind indecipherable artifacts.

Does anyone else feel this? That the conversational mode, while powerful, is also lossy over time?

I’m not saying that the stream is bad, mind you — heck, it’s been my way of thinking about every problem since 2006. I’m pushing this thought out to all you via the stream. But working in wiki lately, I’ve started to wonder if we’ve lost a certain balance, and if we pay for that in ways hidden to us. Pay for our lack of recursion through these articles, pay for not doing the work to make all entry points feel scaffolded. If that’s true, then — well, almost EVERYTHING is stream now. So that could be a problem.




The OS-Based Lifestream Will Kill the Mega-Site, Continued

The OS-based Lifestream killed the mega-site. And nobody seemed to notice…

Back in January, one of my predictions was that the “Revenge of the OS” would accelerate.

The idea was that Google and Apple didn’t need to compete with Facebook, because Google and Apple actually owned the one lifestream that mattered — the notifications panel of your smartphone. Facebook’s monopoly was in fact broken and  the age of the mega-site was over, because the point of integration had moved upstream, off the server and onto the device. Teens who jumped off Facebook had replaced it not with another service, but with their notifications panel, and the rest of us were doing the same. I predicted that Google realized this and would abandon their Google+ strategy, and fold their “portal/community” efforts into Android, with the panel as the lifestream.

Four months later Google reorganized, moving its Google+ Hangout team to the Android division.

Last week reports emerged that Google is now moving Google Photos out of Google+ as well.

In June, Apple announced that one of the biggest new features in iOS 8 would be interactive notifications that let users respond to notifications without opening the app.

And that’s not even the I-told-you-so yet.

Here’s the most interesting bit. A new app called WUT, supported by Google Ventures, lets you send anonymous messages to your Facebook friends. And, as a recent post on Medium points out, there is no app in this app. You interact with the app entirely through your notifications panel. Those Android and iOS panels have become the lifestream platform Facebook used to be, and Facebook is now a rather nice app that runs in those platforms.

This trend of identity moving upstream is huge, and I still see very few people grappling with it. We’re still fighting the last war.


All the Versions of the Amelia Bedelia Wikipedia Entry

I should probably stop talking about the federated aspects of Smallest Federated Wiki — as I mentioned before, whether federation works or not for any given use case is speculative. There is no way to come to real agreement about it since it relies on us making predictions about how people will act in a federated content environment, and no one really knows what that looks like at scale. In situations like this it’s probably best to just try to get people to use it rather than talk about models of use.

But Maha Bali makes some great points in the comments of the last post that I want to address. So I suppose I again risk the entire project by dragging people into its most contentious assertion 😉

Here’s Maha:

But then your point that wikipedia still has a place is a valid one: encyclopedias are not meant to provide the space for those diff perspectives, and wikipedia does have a discussion space for them, except not visible to the usual user.

I am just thinking of what it means to the “casual” web surfer (not the deeply versed researcher) to get confused by all the versions? There are already multiple websites for every web search we make, and we’re just usually using one search engine not a metacrawler or whatever.

First, let me agree with Maha that absolutely there is a place for Wikipedia in this equation. SFW is not meant to replace Wikipedia. But I want to address the “all the versions” critique.

Again, I think the issue here is we treat every piece of knowledge as if it were the Wikipedia page on the Civil War or Behavioral Genetics. On those pages there’s a deep need for some consensus version that someone can read. But 99.999% of subjects are not like that. With a hat tip to Heavy Metal Umlaut, here are the 217 edits to Amelia Bedelia (more or less, speeding up the film may have dropped some edits).

Again, I can’t repeat enough that SFW is not a replacement for Wikipedia. But it’s really helpful to look at what *most* subjects look like, because our stress about “all the versions” is a classic case of over-engineering.

Of the 217 edits, most are vandalism and vandalism reversions. You’ll see edits flash by about Asperger’s syndrome, “Booger Dystrophy” and the minimalistic “Dragons are cool”. You don’t see the war below the fold, but some of the edits are people adding fake Amelia Bedelia movies below the book list (sample title: Amelia Bedelia in China (1897)). Other edits involve formatting, italicization, and small wording changes.

Filter out vandalism, style, and formatting edits, and there are fewer than ten main components, really.

  1. The main stub. What Amelia Bedelia is, who wrote it. Some level of plot summary
  2. Some edits about other people involved with the work besides the author
  3. A section that exists for a while on how Amelia Bedelia teaches kids about polysemy, etc.
  4. The book list, with dates
  5. A section that exists for a while on the vaudeville/Irish connection
  6. A section that appears and reappears about a statue of the author
  7. The cover of the book
  8. The link to the I Can Read Series
  9. The Cameroon hoax

What happens if someone gets linked to an old version? Flipping through the edits, it’s hard to see the harm. Someone goes to a page to get a question answered, presumably. Even the first version of the page will answer most questions that people want to ask. The biggest risk is someone might miss the book list, which seems of substantial use to the casual reader.

Flipping through the edits, and omitting the book list addition, it’s often hard to see how a newer version is particularly better than an older version.  In many cases, for many users, the newer version is worse or less useful (not even taking the Cameroon hoax into account).

This is not the story we tell ourselves about Wikipedia. But that’s because we know where Wikipedia really succeeds: it does an amazing job on the big pages, with a large vibrant community who engage with the page over a period of years, and make sure that items don’t fall between the cracks.

But if there’s one thing I’ve learned recently, again and again, it’s that much of the web is engineered for a scale that it doesn’t operate at. And to a certain extent that is true about Wikipedia. There’s no other Wikipedia, because Wikipedia works best at scale. It tends to not work as well with Amelia Bedelia-sized things.

So what’s the harm? As Maha mentions, it’s all in the edits, right? If you want to dig? Not really. I sped the film up: it took me 12 minutes just to click through all the edits, primarily due to rampant vandalism. That doesn’t include reading them, diffing them, or anything of that nature. No one’s going to do that, short of writing a book on a subject.

And (primarily because of the vandalism issue) past versions are not exposed to Google.


So when that fact is gone, it’s gone. As an author, you give Wikipedia stewardship of your work, but you do in fact also give them the ability to erase it from the web.

Let’s replay history with SFW in the picture.

I’m looking at Amelia Bedelia one day, and I realize — wow, this looks a lot like anti-Irish vaudeville.

I look for an SFW page on Amelia Bedelia, I fork it to my site, and add my insight. If people like that insight they pull it back. The people who care most about scholarship around children’s books fork it back.

A person on Wikipedia finds my idea in a Google search, puts it on Wikipedia, it gets deleted eventually.

The vandalisms, including the Cameroon vandalism, never happen, because in SFW the only house you can vandalize is your own.

You have a question about Amelia Bedelia. You do a Google search. You get Wikipedia, but you also get links to a number of SFW pages, ranked by a reputation algorithm. If Wikipedia answers your question, great. If not, you have these links.

You click into an SFW page on Amelia. Which one? From Google, the highest ranked one, which is probably the one most linked to, or the one on the site with other articles of high reputation on children’s literature, or whatever. You read that one.

If that doesn’t answer your question, you look up top and see there is a list of more newly updated ones. You pick the most recent one. In each case, you always have access to an author page which tells you why the curator of this particular page should be trusted.

At all stages of this, there are never too many versions. There’s only what you are looking at, and additional versions if you need them.

The “too many versions” problem reminds me a lot of when AltaVista came out as a search engine, and suddenly people were very stressed about all the search results for a term. There’s this stress that information is out there that we are not dealing with, but the reality is that the way we deal with it is to decide at each point whether it’s worth reading more.

My wife Nicole, for instance, would love to see lesson plans on SFW. She could jump to one on Van Gogh.

How would she know that it’s the “best” one? She wouldn’t. Largely because there isn’t a best one. But having the different versions visible allows her to quickly sort through them to get what she needs. Over time, the best one ends up being the one that everybody links to externally (e.g. tweeting “best van gogh lesson plan ever!”) or ends up forking to their own site.

Anyway, I should get off this topic. As I mentioned, I’m trying to steer away from the federated aspect of SFW because it’s the one point of contention in a product that has many other things going for it. But the answer, roughly, to how we deal with multiple versions is that we let networks and network algorithms sort them, rather than use a “last edit/best edit” system. For large complex articles with active communities this won’t be the best way to deal with things; last-edit/best-edit works great there. But for most things we do it will in fact be better, and will in fact help Wikipedia to be better as well.
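To make the contrast concrete, here is a toy sketch of the two ways of choosing among versions. The `Version` fields, the scoring rule (forks plus inbound links, with recency as a tiebreaker), and the site names are all invented for illustration; a real reputation algorithm would be far more involved.

```python
from dataclasses import dataclass


@dataclass
class Version:
    site: str
    updated: int        # larger means more recently edited
    forks: int          # how many other sites forked this version
    inbound_links: int  # how many external links point at it


def last_edit(versions):
    """The 'last edit/best edit' rule: the newest version wins."""
    return max(versions, key=lambda v: v.updated)


def network_ranked(versions):
    """Rank by network signals first, with recency only as a tiebreaker."""
    return sorted(
        versions,
        key=lambda v: (v.forks + v.inbound_links, v.updated),
        reverse=True,
    )


# Hypothetical versions of one lesson plan.
versions = [
    Version("vandal.example", updated=30, forks=0, inbound_links=0),
    Version("teacher.example", updated=10, forks=12, inbound_links=40),
    Version("scholar.example", updated=20, forks=5, inbound_links=8),
]

print(last_edit(versions).site)          # → vandal.example
print(network_ranked(versions)[0].site)  # → teacher.example
```

Under the newest-wins rule, the most recent edit is what readers see, whatever its quality. Under the network rule, a version has to be forked or linked by others before it rises, and the remaining versions stay available further down the list.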




Amelia Bedelia’s Hats Are Not the Problem

So there’s been an Amelia Bedelia Wikipedia hoax. We learn that Amelia was not inspired by a maid from Cameroon who wore sensational hats, a “fact” cited in a vandalism which had survived on the site since 2009.


First, let me say in a world where Elsevier was recently discovered to have published half a dozen fake journals for drug companies, a world where more recently 60 articles were retracted from the Journal of Vibration and Control due to a “citation ring”, and a world where med research postdocs are faking anti-cancer trials and still working, I’m not sure to what we’re comparing this “hoax”.

But, more importantly, the main problem with Wikipedia is not about error.  The problem is more subtle than that, and it’s not something that the Daily Dot is going to cover anytime soon.

Here, for example, is a segment introduced on the Amelia Bedelia Wikipedia page in Spring of 2007, and existing through the end of that year:

The name “Bedelia” is a derivation of the common Irish name Bridget. Irish maids were portrayed as being comically inept in the vaudeville theaters of New England in the late 19th century. A popular joke of the period has a maid instructed to “Serve the tomatoes undressed”; she brings the dish to the dining room, wearing only her underclothes, saying, “I won’t take off another stitch- not if I lose my place, Maam”.

Interesting, right?


In January 2008, that edit disappears.


As far as I can tell, the redaction never is discussed, and the vaudeville connection never returns.

Is the beloved Amelia Bedelia a protracted Irish Maid joke? A sanitized relic of a previous age when the Irish weren’t yet considered “white”?  That seems a rather important question for a scholarly work. And after reading  about the potential vaudeville connection, it’s hard to un-see:


At the same time, that’s a pretty hefty accusation to make on something that is supposed to be the reference page on Amelia Bedelia.

It’s important to note that, unlike uncaught hoaxes, this sort of thing happens on Wikipedia all the time.

And the problem here is not that the Wikipedia community allowed such a paragraph initially, or that it eventually deleted it. People can have honest disagreements about such things.

The problem here is the centralization. Someone removed the edit, no one in the community noticed or defended it, and the information disappeared for good. For all intents and purposes, it vanished from the face of the earth.

So while EJ Dickson runs down the number of places the Cameroon hoax showed up, somewhat harmlessly, in news copy and book reports, it’s more interesting to me to think how many of those people doing web research on Amelia Bedelia may have, in fact, been spared considering some of the more difficult and interesting questions Amelia Bedelia raises.


We need to have authoritative networks to turn to when trying to answer questions. There’s a place for things like Wikipedia, which pushes groups toward consensus. But when those networks are too centralized, or hold a monopoly on truth, minority concerns get lost, deleted, and overruled. Controversy gets papered over. Difficult questions get sanded down smooth. Life gets easier, but at a substantial cost.

This is the issue that federation as an alternative model of networked community is meant to solve, and one of the issues Smallest Federated Wiki is meant to address.  I’ve talked about that before here, but the point I want to make in this post is even more basic. In short, we are not living in a 2008 Jay Leno monologue. It’s time to stop pretending that error and overload are the web’s biggest issues, and time to start looking at the vast variety of voices and bodies of knowledge that still have no home on the web, and no easy entry into the discussions that define our culture.

That’s a less amusing story than stoned kids vandalizing Wikipedia pages, but it’s the one that actually matters.



[Images are linked through to sources. Click for source.]