Connected Copies, Part One

This is a series of posts I’ve finally decided to write on the subject of what I call “connected copies”, an old pattern in software that is solving a lot of current problems.

It’s really a bit of a brain dump. But it’s my first attempt to explain these concepts starting at a point most people can grasp (e.g. people who don’t have git or Named Data Networking or blockchain or federated wiki as a starting point). Hopefully publishing these somewhat disorganized presentations will help me eventually put together something more organized and suitable for non-technical audiences. Or maybe it will just convince me that this is too vast a subject to explain.


The Party

Let’s imagine you are planning a birthday party for an office mate. And for both musical and technical reasons, let’s imagine you were doing this in 1983.

(Insert cassette of David Bowie’s Let’s Dance in 3… 2… 1…)

You’re planning a party for Sheila, and you want everyone to bring something. You talk to everybody and write up a list:

Cliff: Plastic silverware
Norm: Vanilla ice cream
Sam: Chocolate cake with cream cheese frosting
Rita: Card, streamers
Diane: Ice cream fixings

You write out this list, and you photocopy it. Each person gets a copy on Monday so they can prep Friday’s party.

Programmers call this sort of copying passing by value and it has a lot of advantages. Each person has a copy they can carry around. They can change that copy, annotate it. If Cliff loses his copy, Norm doesn’t lose his. And so on.

Unfortunately this party hits a snag. Norm is talking to Sheila and finds out that Sheila is lactose intolerant. On his copy he crosses out the ice cream, and replaces it with lactose-free cookies, making a note of her intolerance.

But because these sheets are copies, no one else ever sees his update. He comes with cookies, but Diane is still bringing ice cream fixings, and frankly we’re not quite sure whether that cream cheese frosting cake is such a good idea either. The party is a disaster.

Here’s another approach we could use for the party. Instead of making copies, we say “sign up on the kitchen whiteboard”.

If we run this with the office whiteboard, we get a different, and better result. When Norm finds out about the sensitivity, he changes his item to the cookies, and writes a note about the issue. Diane, seeing this, skips the ice cream fixings in favor of getting soda, and Sam reconsiders his cream-cheese cake, finding something more palatable. As each make their changes, they note them on the board.

Saying where the definitive list is (e.g. “The kitchen whiteboard”) instead of making copies of its value is what programmers call passing by reference. As you’ll note, it has certain advantages.

The Web Is a Series of Whiteboards

In our kitchen whiteboard example, of course, it has one big disadvantage, which is you can only access it when you’re in the kitchen. Of course, with the dawn of the Internet, this problem was largely solved. A web page is like a whiteboard you can read from anywhere. A calendar feed, a video link — these are all protocols which say, more or less, go find the set of values currently at this location.

The web is in fact built around this structure. It understands locations, not objects or copies.

What, for example, is the URL for “find me a copy of Moby Dick”? There is none. There is only a URL for “get me a copy of whatever is currently at the address that this one instance of Moby Dick is supposed to be at”.

It is important to note that the web doesn’t have to be structured this way. Torrents, for example, use a different approach. When you attempt to retrieve something through torrenting, your computer isn’t concerned with where the one true location is. Instead, your torrenting client asks for an object, and  any locations that have that object can return pieces of it to you.

Programmers have developed ingenious ways to make requests for locations to act sort of like requests for objects (proxy servers are a simple example) but at the heart of the web as we currently use it is this assumption: things are defined by the location from which they are served.

Again, this model is an amazing model, and one that should be embraced in a lot more circumstances. As Jon Udell has noted, people can use a system of passing by reference to maintain tight control over their personal information: if I share a Word Document with you of my calendar, I lose the ability to update that; if I share my calendar feed, I retain central control.

But just as there are many circumstances where we are still passing around copies when we should be passing around references, there are also cases where we are not fully understanding some of the benefits copies provided.

In Which Whiteboards Go White

I used to have a hobby of collecting old reference works. Ancient textbooks, encyclopedias, atlases, books of trivia from the 1800s; I loved them all.


What’s amazing to me (and what’s been amazing for a long time) is how durable these books were. Not as physical books, mind you, but as an information strategy.

Here, for example, is a random page of a used bookseller’s catalog of used works available in 1812 or so:


These are individual issues, available for resale, from what I can tell. But look at the dates. Even at that time you have stuff from 200 years prior being sold in that store. What do we have on the web that will survive in 200 years?

Ah, you say, it’s just survival bias. These are the works that survived; we don’t know what didn’t.

Except, no.We can take this set of books as our little experimental cohort, start the clock in 1812, and then see how many make it through to today.

And when we do that we find the answer is to how many survived is… all of them. In fact, take a look, most of these are now digitized on the web:

The only works I can’t find here are the two works in Latin, and I’m guessing that’s just due to them not being digitized (or perhaps even due to me botching the search terms).

In other words, on a random page of a random bookseller’s sell-sheet from 1812, every English language work identified has survived.

How do we compare today?

Well, here’s a screenshot of part of a page from in one of its reboots around 2001.


If you don’t know what Suck was, you can read its history here, but what it was is kind of irrelevant in this example. What you need to know is that they aren’t linking out to fanzines here. They are linking out to major stories and major websites of the time.

There’s about 30 links on the whole page. Three of them work. Three. One of the links that works goes to another page on, one goes to the Honolulu Advertiser, and one to a article.

Go to any archived page of about the same time and you’ll find a similar pattern.

Random references to books that are 400 years old are stable. But references to web works that are 15 year old? You’ve got a one in ten chance that it’s still around. You’ve literally got a worse chance clicking a link on a fifteen year old article than you do scratching off a scratch ticket.

It’s like you were told the answer you need is on the work kitchen whiteboard, but when you go there, the whiteboard is erased. Or missing. Or filled with porn.

What the heck is going on?

Read Something, Host Something

The biggest reason that all the works referenced by the 1812 bookseller survive is that books are copies.

The original proofs of these books did not survive. The specific physical books that were in that bookseller’s shop that year are also less likely to have survived. But some of each of the maybe 1,000 books printed in these runs did survive.

Books have a weird mode of replication. The more people that read a book, the more people host that book in their personal bookshelves.

It’s not one-to-one, of course. I might lend you a book to read. You might read a book in the local library, or buy one secondhand. But the nature of the system of physical books is that if 10,000 people read a book there will be 1,000 books hosted at various locations. And if more people start reading that more books are produced and distributed to yet more locations to be hosted.

You read something, you host it. When you share with others, you share your own copy.

Compare this to the web model. In the web model, a million people can read a story and yet there is one single site that hosts it. The person that hosts it has to take on the expense of hosting it. When it no longer serves their interest they take the copy down, and that work is gone, forever.

It’s worth noting that the web model is is less scalable than the book model. In the book model, the distribution and hosting function is spread out among current and former readers: as the book becomes popular the weight of hosting it and sharing it is dispersed. In the web system, something suddenly popular is a burden to the publisher, who can’t keep up with spikes in demand as they hit that specific URL.

You can imagine another system for the web fairly easily, where everyone has a small server of their own, the way everyone once had a bookshelf. When you share a link with a friend, you share a link to your copy of something on your bookshelf, hosted on your own server.

That friend reads the page and decides to share it with others. In doing so they make a copy to their server, and they share from there to more friends. As the work propagates out, server load is dispersed, and copies proliferate.

These are some of the ideas underlying recent inventions such as the Interplanetary File System and federated wiki (although each of those is much more than just that idea).

Corporate Copies

These ideas may seem far out, but they’re not at all. In fact, because of the way web use has evolved, it’s inevitable that we’ll move in the direction of copies. It’s just a matter of whether we get the corporate version of this vision or a more radical reader-centered vision.

The corporate version is coming, without a doubt. Consider, for example, the day that Jessica Jones, a ten episode Netflix series was released. Over that weekend I’m guessing that several thousand people in my zipcode binge watched those episodes. All streaming it from Netflix which was how many miles away.

Which, of course, makes no sense.

I’ve downloaded all 10 episodes. But when my neighbor gets the urge to binge, my neighbor’s set-top box doesn’t go 20 feet to get those files. It goes thousands of miles to Netflix’s servers (or hundreds of miles to Netflix’s distributed content network).

The Internet wasn’t really built for this sort of thing. The assumption the Internet was built on was “I want a certain remote machine somewhere to do something, send it a message and get the result.” When we look at content over the web, this translated to “Tell the remote computer to send me the file at this location or with this id”. But central to the protocol is we know the where and the where gives us the what.

This makes sense when we are trying to get specific information from specific computers. It starts to make less sense when we want a thing that could be in any of thousands of places, not a response from a particular machine.

Eventually this will fall away. Thirty-six percent of Internet traffic in the U.S. is Netflix shipping you videos you could more easily grab from your neighbors.

Of course, if corporations implement it, it will suck. Which is why there’s a whole bunch of others thinking through the implications of connected copies. And it’s a reason that you should be thinking about it too, if you want the next iteration of the web to be awesome and not suck.

The current system has us “know the where and get the what”. The next system will have us “know the what and get the where”. But thinking through the implications of that beyond Netflix delivery is what interests me, because it opens up a new way to think about copies entirely.

Next: Connected Copies, Part Two



SoundCloud and Connected Copies

SoundCloud, a music publishing site which holds millions of original works not held elsewhere (and over a hundred million works total), may be in trouble. And if it is in trouble, we’ll lose much of that music, forever.

The situation is, of course, ridiculous.

I know I sound like a broken, um, MP3 file on this, but theres a simple solution to this problem: connected copies.

In such a scheme, I might share the file to SoundCloud (perhaps from my server, perhaps not). As other people share it, copies are made to their servers. These copies are connected by an ID and protocol that allows fail-over: if the copy cannot be found on the SoundCloud server, it tracks it down to the other locations (the “connected” in “connected copies” means that each copy points to the existence of other known copies).

Again, this is the underlying principle of lots of cool things not yet on people’s radar, like the Interplanetary File System, Named Data Networking, federated wiki, and Amber. It’s the future of the web, the next major evolution of it.

It’s worth thinking about.


P. S. While SoundCloud is still around you should check out my awesome playlist of new darkwave and neo-psychedelia tunes from little known artists.

An End to the Den Wars?

As you doubtless know by now, in 2010 I went and gave a plenary at UMW on the Liberal Arts in an era of Just-In-Time Learning, drank more than any human really should at the various parade of after-events, predicted the coming onslaught of xMOOCs in a drunken vision, and ended up going back to Jim Groom’s place at like 3 a.m. to drink some more.

It was at that point, where my liver was already in the process of packing its bags and moving to find a home in a saner individual that Jim Groom asked me a fateful question.

“So,” he said, “What do you think of my den?”

I looked up and noticed we were sitting in a den.

“It’s nice,” I said.

I don’t remember much more about that night, but I did wake up back in the hotel, so that’s good.

It would be the last time I would drink to that level, because, no joke, I was hungover for three days. Welcome to age 40, time to stop acting like a teenager.

I forget the next time I saw Jim in person, but it was at one conference or another. Someone tried to introduce us, not knowing we go way back (2007?) and Jim said something to the effect of “Oh, I know Mike, we go way back. Last time he was over my house he insulted my den.”

I tried to piece together fragments of that night.

“I said it was nice,” I said.

“Yes,” he said, “Nice. You said it was ‘nice’. I worked hard on that den.”

This was the beginning of #denwars, which would come up anew at every conference.

But this week, heading to ELI, I realized there is another side to the story.

On the way back from that 2010 UMW event I took Amtrak, and I started writing an album called Double Phantasm. (The title is a pun involving John Lennon’s Double Phantasm album and the Derridian notion of hauntology. This is why I’m currently outselling Taylor Swift).

The first song on it, Queen of America, was loosely based on an events of that week. And until yesterday I thought that was the only song that bore any relation to that trip. The rest of the album is a science fiction concept album that details the dying romance of a man and a woman during a post-apocalyptic future. (Again, these hot song ideas are the secrets to my stunning success).

But giving it my first relisten in several years on the plane, I was suddenly struck by the last song, “Like a Great Big Meteor“. It’s the final scene of the four song apocalyptic romance.

And there it was, in the setting of the final scene of that album. It was Jim’s den:

If all we have is now,
then what did we have then
as she sat across me nervously
in what used to be a den?

Here, in this verse, looking for a scene which would contrast former suburban opulence with post-apocalyptic decay I could find no better setting than Jim’s den. The world is over in this song, but what we miss is the den.

The song is rough, as all my songs are (I spend hours writing and tinkering with synth settings and textures, and then basically record a demo-level version of the song in thirty minutes, because I like writing and synths but hate production and singing). Sometimes this style of production works, and sometimes it doesn’t. It sort of half-works here.

But if you listen to my rough vocal on that track you can hear the raw and powerful emotions that Jim’s den stirred in me.

So there you go Jim. I did say your den was “nice”.  But that was only because the depths of my feelings for your den could only be expressed through art. Perhaps we can now lay to rest the den wars.