Napster, “All the Rave” (Notes)

I’m on a staycation of sorts, taking a few days off to do nothing. One of the nothing things I’m doing is reading a couple of books. The first, Reinventing Discovery: The New Era of Networked Science, I’ll talk about later. The other, All the Rave (a history of Napster), I’ve barely started, but it has already held some surprises for me. I wanted to jot them down here, partly to process them.

Shawn Fanning was more radical and less radical than he gets credit for. The image of Shawn we were sold back in 1999 by the Valley was of a slacker college student who just wanted to solve the problem of accessing music from anywhere and wrote Napster in a couple of weeks. The image we were sold by the RIAA was of someone ripping off artists for profit. What I didn’t realize was that Shawn was what we’d consider a legitimate gray-hat hacker before Napster, and that he built Napster using the famous/infamous w00w00 group on IRC as an extended learning community and sometime workforce. In fact, since Shawn was completely unfamiliar with Windows programming, he leaned heavily on the group to help him figure out how to build his prototype. It was a hacker IRC project from the start, both idealistic and radical. It wasn’t about the money, but it was meant to shake things up from the start.

Shawn seems to have been aware that this was not about music, but about rethinking the web. He mentions to the author that what struck him about IRC vs. the web was that IRC was “presence aware”. Here’s the author talking about that in a recent interview:

Joseph Menn: Shawn’s great insight was that there was no reason that he could not combine the power of a search engine like Google with what is known as “presence-awareness” of instant messaging and other systems. In this way, only people whose MP3 files were available at any one moment would have those files listed for others to find.

From very early on, Shawn seemed to have a good insight into how a presence-aware web made a peer-to-peer architecture possible. To get the progression, it is useful to think through what existed at the time: people were indexing MP3 sites, Google-style. But by the time you went to most of these sites they were down or gone, or the published files had changed. I may be wrong about this, but my understanding from the book is that when you logged on to Napster you published your index to the server, and while you were connected your files would be part of the searched database. That’s a fundamentally different way of thinking about the problem of search. The Berners-Lee web, the Google web (or, I guess back then, the AltaVista web) is based largely on the conception of permanently available URLs. Google just assumes your server is there because that’s how the client-server web works. And that assumption ends up making a particular type of web run by a particular type of people. Presence awareness subverts that.

In today’s world that may seem old hat. Or maybe not. The more I think about it, the more I think the promise of that vision has never really been delivered on.
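To make the presence-aware idea concrete for myself, here is a minimal sketch in Python. It is not Napster's actual protocol or code; the class and method names (`PresenceIndex`, `connect`, `disconnect`, `search`) are hypothetical, and it only models my understanding from the book: a peer publishes its file list on login, and those files are searchable only while the peer stays connected.

```python
class PresenceIndex:
    """Toy central index that lists files only from currently connected peers.

    Hypothetical illustration, not Napster's real design.
    """

    def __init__(self):
        # Maps peer name -> set of file names that peer is sharing right now.
        self._files_by_peer = {}

    def connect(self, peer, shared_files):
        # On login, the peer publishes its list of shared files.
        self._files_by_peer[peer] = set(shared_files)

    def disconnect(self, peer):
        # When the peer goes offline, its files drop out of the index.
        self._files_by_peer.pop(peer, None)

    def search(self, term):
        # Return sorted (peer, file) pairs whose file name contains the term.
        term = term.lower()
        return sorted(
            (peer, name)
            for peer, files in self._files_by_peer.items()
            for name in files
            if term in name.lower()
        )


index = PresenceIndex()
index.connect("alice", ["Blue Monday.mp3", "Temptation.mp3"])
index.connect("bob", ["blue train.mp3"])
online_hits = index.search("blue")        # both peers are online
index.disconnect("alice")
hits_after_logout = index.search("blue")  # only bob's file remains
```

The contrast with a crawled index is that there is never a stale entry to clean up: a result exists only because someone is online to serve it.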

To download is to share. This was a design decision that probably owed a lot to the IRC culture that Shawn came out of. While you were using Napster to download, you were also publishing your index and opening your files up for download (at least by default). It was just the way it worked. Use = sharing by design.

Now you could circumvent that, and some did. But the point was you had to hack your way to free-riding; it wasn’t the default. Whatever you believe about Napster and file-sharing, it’s a powerful example of how software design is a driver of culture as well as a product of it.

It was not as simple as “The Music Industry Killed Napster”. I’d love it to be that simple a story. And I’m not to the Metallica portion of the story yet, but it’s already clear that Napster had issues before then. Most notably, Fanning’s uncle negotiated himself a 70% stake in the company early on, and became a huge corporate liability. His influence made investment problematic and governance a massive problem. So there you go — the music industry can screw you, but it takes family to *really* fuck you up.

Maybe more notes as I get further into the book; I find even when reading for fun I do better if I process the experience via blogging.

Why the Comprehensive Attribution Statement Makes Sense

The other day I was pointed via a tweet by David Wiley to the Comprehensive Attribution Statement (CAS) that Lumen uses. The CAS is part of Lumen’s “attribution architecture”, which is to say it provides a standard way to cite content sources. In short, it’s a sort of endnote format for remixed content. Here are the distinctions it makes:

The LLAA builds up a comprehensive attribution statement (CAS) from several smaller attribution primitives (AP). Each individual attribution primitive corresponds to a type of content:

  1. Original content – Material that was either created specifically for the Open Course Framework (OCF) or material created previously that has never been published before (e.g., a faculty member’s lecture notes)
  2. CC licensed content – Materials previously released under a Creative Commons license
  3. Copyrighted video content – Materials from YouTube, Vimeo, and other sources whose Terms of Use allow embedding
  4. Public domain content – Materials no longer covered by copyright
  5. CC licensed content with specific attribution requirements

Attribution primitives should be listed in this order, with all type 1 attributions preceding all type 2 attributions, all type 2 attributions preceding all type 3 attributions, etc. The exception to this rule is attribution of original content created by Lumen Learning, which should always be listed last.

And here’s what a statement like that might look like in practice:

This page is licensed under a Creative Commons Attribution License and contains content from a variety of sources published under a variety of open licenses, including:

  • Original content contributed by Mr. Putey of the Anglo-French Silly Walk Initiative.
  • Content created by Edmund Blackadder for the History of English Dictionaries project, originally published at http://someurl456.org/ under a CC BY license.
  • The video documentary of the Kennedy assassination was created by Dave Lister and published at http://youtube.com/linktovideo. This video is copyrighted and is not licensed under an open license. Embedded as permitted by YouTube’s Terms of Use.
  • Content created by Henry Wensleydale.
  • Original content created by Lumen Learning.
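The ordering rule in the quoted spec is simple enough to sketch as a sort. This is my own illustration in Python, not Lumen's code: the `Attribution` class, the `kind` labels, and `order_attributions` are hypothetical names, and I am assuming the type numbers follow the order in which the quoted list gives the content types.

```python
from dataclasses import dataclass

# Assumed type numbers, taken from the order of the quoted content types.
TYPE_ORDER = {
    "original": 1,
    "cc": 2,
    "copyrighted_video": 3,
    "public_domain": 4,
    "cc_specific": 5,
}


@dataclass
class Attribution:
    kind: str     # one of the TYPE_ORDER keys
    creator: str
    text: str     # the sentence that appears in the statement


def order_attributions(items, house_author="Lumen Learning"):
    """Sort by type number; the house author's original content goes last."""
    def sort_key(item):
        is_house_original = (
            item.kind == "original" and item.creator == house_author
        )
        # False sorts before True, so house originals end up at the bottom.
        return (is_house_original, TYPE_ORDER[item.kind])

    # sorted() is stable, so items of the same type keep their given order.
    return sorted(items, key=sort_key)


items = [
    Attribution("original", "Lumen Learning",
                "Original content created by Lumen Learning."),
    # Treated as public domain here purely for illustration.
    Attribution("public_domain", "Henry Wensleydale",
                "Content created by Henry Wensleydale."),
    Attribution("cc", "Edmund Blackadder",
                "Content created by Edmund Blackadder."),
    Attribution("original", "Mr. Putey",
                "Original content contributed by Mr. Putey."),
]
ordered_creators = [a.creator for a in order_attributions(items)]
```

Fed the entries in any order, this gives back Mr. Putey, Edmund Blackadder, Henry Wensleydale, and then Lumen Learning, which matches the sample statement above.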

Why is this a big deal? Because attribution of remixed content is a bit of a mess. Actually, not a “bit” of a mess — it’s a disaster.

For example, photos are put up, and linked a dozen different ways. Here’s a great example of someone trying to do the right thing in a useless fashion:

[Image: Comprehensive_Attribution_Statement]

This is a good attempt, but there’s no link to the original, no statement about which CC license is used, and no photographer listed. My guess is that this is largely because the decision was made to cram the attribution into the caption. Unfortunately it misses the two major purposes of open content attribution: first, that I should be able to use the attribution to find the original (if possible) and related work by the author/photographer; second, that a person who wants to reuse this image has sufficient information to do their own attribution of it (and to not break any laws). The easiest way to do that is to give up the ad hoc methods of citation and come up with a more mundane but predictable approach.

Text gets even messier to cite. If I rewrite a text (for example, a worksheet on Twitter in the classroom), it’s easy to get bogged down in the detail of how that text was used. I’ve cited such text a dozen ways, trying to explain whether the borrowed text was written in collaboration or just borrowed, whether pieces were “used” or “written for”, whether it was “based on” vs. “contains material from”, what bits were under what licenses, etc. The Lumen approach gets some of those distinctions in while not losing sight of the fact that the main purpose of the statement is to provide a path to the original and a concise description of the conditions of reuse. So for instance, if some of Reclaim Learning’s materials are borrowed from UMW’s Domain of One’s Own project and edited by Jim Groom, Reclaim’s attribution might look like this:

This page is licensed under a Creative Commons Attribution License and contains content from a variety of sources published under a variety of open licenses, including:

What I really like about this is two things. First, having struggled with attribution, these categories make sense — highlighting what is crucial and ignoring what is not. Second, I like that if I take this page and edit it for my own uses, I just add in the reference for the original content and put my attribution on top:

This page is licensed under a Creative Commons Attribution License and contains content from a variety of sources published under a variety of open licenses, including:

This follows the document around attached to the bottom of it, out of the way, but accessible. I’d go a step further on student-facing pages and make it a clickable hidden div to minimize distraction while enforcing structure, but YMMV (example of how that might look here: http://screencast.com/t/cW9xkzwjOq5t). For different sorts of projects, the format could vary (open data projects, for example, have particular requirements). But there would be on the site a statement of what the customized format was and how to interpret it.

As I was saying in a private conversation with Jared Stein, I’ve been waiting for the open metadata revolution for quite a number of years. Header-embedded metadata is neat to think about. But ultimately a simple set of endnote conventions would get us 90% of what we want, today. And if Google Scholar can create citation indexes out of APA and MLA citations, there’s no reason that a format like this couldn’t at least start to form the basis of a reuse tracking system.