Still working with DokuWiki as an educational platform for faculty here at WSU Vancouver. I’ve found a couple of things that are worth mentioning, so I thought I’d jot them down here. This post deals with spam prevention.
The idea that Dokuwiki wikis don’t get spammed as much as MediaWiki installs is true, but trivially so. You’ll get more than enough spam to clog up the series of tubes that is your website. You’re going to have to lock down the installation.
I’ve experimented with a couple of approaches to this. Here are some things you don’t want to do:
- The common “must confirm email” approach is not a long term winner. Plenty of spambots now happily confirm email, get user accounts, and live happily simulated lives on your wiki discussing the latest medical devices and weight-loss drugs available.
- Corralling freshly registered users into a “non-editing” user type is also not a great idea. I registered 8 students in my class during class for a wiki project. They then waited while I fiddled around and bumped up their privileges. It’s hard to imagine that process scaling in an academic setting.
- Similarly, deactivating registration and doing admin panel sign-ups manually is not a pleasant activity either.
- LDAP then? Ugh. An EXCELLENT feature of DokuWiki. But not really a great option in academia for a pilot project. You’d have to coordinate with IT (which will lead to who knows what). Might be something to explore down the road, but not as you’re getting this off the ground.
- Visual post CAPTCHAs? Yes, this is a great way to spark a multi-million dollar ADA/Section 508 lawsuit. Avoid.
So what do you do?
- Set read permissions to “all”. Anyone can read.
- Set edit to whatever your default confirmed registered user is.
- With this configuration, everyone can read, but only registered users can edit.
- Keep the registration link/functionality up.
- Install the Captcha plugin. Under type, choose “question”.
- Make sure that registered users *don’t* have to do the CAPTCHA. In this configuration, since all non-registered can do is read, the only place the CAPTCHA will be is on the registration form.
This option will ask the student a plain text question of your choice when they register. If they get it right, registration proceeds. If not, it bumps them back.
Here’s where a bit of discretion comes into play. You can take one of two approaches:
- Make the question a piece of cultural knowledge that students should know — e.g. the name of the dining commons.
- Make the question “Access Code?” and have them supply an access code furnished by you or the prof.
As I went through “cultural knowledge” access codes, I started to realize how fraught that process was. I can maybe talk more about that later. I also realized what I really wanted was a semi-automated process for WSU staff and faculty not available to outsiders. I decided on the access code with a twist.
Here’s how it works. If you mail email@example.com from a WSU email account, an autoresponder will send you back the code. If you mail it from a non-WSU account, you get nothing. I do this through setting up an autoresponder on that Gmail address with the code in it, but routing everything not from @xxxx.wsu.edu directly to deletion.
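The routing rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not the Gmail filter itself: the domain `campus.example.edu`, the code `example-code`, and the function name `autorespond` are all made up for the example. Keep in mind that a From header can be forged, so this is a speed bump rather than real authentication.

```python
from email.utils import parseaddr
from typing import Optional

# Hypothetical values for illustration only; substitute your own.
ALLOWED_DOMAIN = "campus.example.edu"
ACCESS_CODE = "example-code"


def autorespond(from_header: str) -> Optional[str]:
    """Return the reply body for an allowed sender, or None to drop the mail."""
    # parseaddr splits "Pat <pat@host>" into a display name and an address.
    _name, address = parseaddr(from_header)
    local, at_sign, domain = address.rpartition("@")
    if not at_sign or not local:
        return None  # no usable address, so treat it like any outsider
    domain = domain.lower()
    # Accept the domain itself or any subdomain of it. Matching on the
    # leading dot keeps "notcampus.example.edu" from slipping through.
    if domain == ALLOWED_DOMAIN or domain.endswith("." + ALLOWED_DOMAIN):
        return f"Access code: {ACCESS_CODE}"
    return None  # everything else goes straight to deletion


print(autorespond("Pat <pat@campus.example.edu>"))  # Access code: example-code
print(autorespond("bot@spam.example.com"))          # None
```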
So there you go, that’s my setup. Maybe in a few days I’ll talk about my depressing struggle with various markdown plugins. Or requests… I’ll take requests too.
Yesterday-ish, from Justin Reich:
I was also somewhat surprised to learn that in many systems, it is actually quite difficult to get a raw dump of all of the data from a student or class. Many systems don’t have an easy “export to .csv file” option that would let teachers or administrators play around on their own. That’s a terrible omission that most systems could fix quickly.
A couple years ago, working on an LMS evaluation, I kept getting asked what reporting features each potential platform had. Can this platform generate type-of-report-X? About 8 years ago, working on an ePortfolio evaluation, the same question came up: where are the reports? Does this have report Y?
I’d always point out that we didn’t want reports, we wanted data exports and data APIs that allowed us to generate our own reports, reports that we could change as we developed new questions and theories, or launched new initiatives in need of tracking. The data solutions we’re likely to see have real impact (with no offense to Reich’s Law of Doing Stuff) are likely to come from grassroots tinkering. Data that is exportable in common formats can be processed with common tools, and solutions built in those common tools can be broadly shared. CSV-based reports developed and adopted by Framingham State can be adopted by Keene State or WSU overnight. A solution one of your physics faculty develops can be quickly applied across all entry level courses.
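To make that concrete, here is a minimal sketch of the kind of homegrown report a plain CSV export makes possible, using nothing but Python’s standard library. The column names (`student_id`, `course`, `quiz_score`) and the sample rows are invented for the example; a real export will have its own layout.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for a real export; the columns are assumptions.
SAMPLE_EXPORT = """student_id,course,quiz_score
s1,PHYS101,80
s2,PHYS101,90
s3,ECON101,70
"""


def mean_score_by_course(csv_file):
    """Average the quiz_score column for each value of the course column."""
    totals = defaultdict(lambda: [0.0, 0])  # course -> [sum of scores, row count]
    for row in csv.DictReader(csv_file):
        entry = totals[row["course"]]
        entry[0] += float(row["quiz_score"])
        entry[1] += 1
    return {course: total / count for course, (total, count) in totals.items()}


report = mean_score_by_course(io.StringIO(SAMPLE_EXPORT))
print(report)  # {'PHYS101': 85.0, 'ECON101': 70.0}
```

Pass it an open file instead of the `StringIO` sample and the same function works on a real export. When the question changes, you change a few lines rather than waiting on a vendor to build a new report.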
What you want is not “reports” but sensible, easy, and relatively unfettered access to data. And if you don’t have someone on your campus that can make sense of such data, then you need to either hire that person, or give up on the idea that a canned set of reports is going to help you. When fields are mature, canned and polished reigns. But when they are nascent (as is the field of analytics) hackability is a necessity.
Via Clay Fenlason: “Feeling like the time spent to understand WTF @holden is talking about would be well spent, but who has that kind of time?”
Fair enough. I blog mostly for myself, to try and push on my own ideas in front of a relatively small group of people I know who push back. And part of that process is a bit manic and expansive. At some point that’s followed by a more contractive process that tries to organize and summarize. Maybe it’s time to get to that phase.
So I’ll do that soon. What I’ll say in the meantime is that all of this stuff — hybrid apps, storage-neutral apps, federated wikis, etc — is interesting to me because of my obsession with hacking and reuse. Why is reuse so darn hard? Why don’t we reuse more things? What systems would support a higher degree of reuse and sharing, of hacking and recombination? What are the cultural barriers?
There are implications to this stuff far bigger than that, but reuse (and hacking, which is a type of reuse) has been a core obsession of mine for a decade now, so that ends up being the lens.
You go to an event and there’s 50 people taking pictures of it individually on their cell phones, none of whom will share those photos with one another, yet all of whom would benefit from sharing the load of picture taking. There are psychological and social reasons why that’s the case, but there’s also technological reasons for that. Likewise there are brilliant economics teachers who have built exercises and case studies that would set your class on *fire* if you used them — but you’ll never see them.
I’ve been over the several hundred reasons why reuse doesn’t happen, over a period of ten years. It’s not just about the technology, absolutely. But occasionally I see places where reuse explodes, and the technology turns out to be a pretty big piece of that. My wife is a K-3 art teacher. And Pinterest just exploded reuse in that community. Sharing went from minimal to amazing in the space of 12 months. And suddenly she was putting together a much better art curriculum than she could have ever dreamed of in half the time, in ways that had a huge impact on her students.
So — reuse, sharing, networked learning, hacking. I’m interested in the two sides of this: first, we must teach students how to work this way. We have to. And two, we have to get our colleagues to work this way.
What does that have to do with the shift to hybrid apps? With moving from a world of reference to a world of copies and forks? With storage-neutral designs? With the pull request culture of GitHub vs. the single copy culture of OER? With the move back to file-based publication systems? I’m still trying to work that out. But I think the answer is “a lot”, and a post is coming soon.
File-based sharing based around pushing copies of good stuff to others. That’s what the federated wiki is about.
For that reason I find newer efforts like this that push files around instead of references to be fascinating. This out today from Dropbox, a new product called Carousel:
Photos of events such as graduations and weddings, Houston points out, are spread over the devices and hard drives of multiple guests. It creates pervasive photo anxiety: People are no longer sure they own the best images of the most important moments in their lives. The app, which becomes available this week for iPhones and Android phones—with a version coming soon for desktops—taps into photos stored on Dropbox and allows users to cycle through them quickly and send images to friends and family, so they can add them to their collections well.
Think about how this changes notions of sharing, and you’ll see it as part of a move towards file-based copy systems, and the pull request approach of a GitHub.
Also, read that paragraph again, and tell me if that doesn’t look similar to the educational materials situation we face everyday.
OK, now imagine your wiki exists in a Dropbox account, and you do the same thing: you flip through all your articles and forward the ones that you think are useful to your various federations. Those get dropped into other people’s own Dropbox wikis, and the virtuous cycle continues.
It’s a different way of thinking about things. It’s file based, and it sees copies of things as a feature, not a bug. The storage for your project is not separate from the sharing features of your project. We let the copies happen and we sort out the mess afterwards.
My argument is not that Dropbox rules, but that this is part of a larger trend that rethinks how sharing and forking works on the new web. It’s also potentially a powerful rethinking of how OER could propagate through a system.
ONE IMPORTANT NOTE: I’m just toying with this idea, not asserting it at this point. But part of me is very interested in what happens when we view the rise of the app as not a betrayal of the original vision of the web, but as a potential return to it. I don’t see many people pushing that idea, so it seems worth pushing. That’s how I roll. ;)
Apropos of both an earlier post of mine and Jim’s Internet Course. This is a screenshot of the first web browser (red annotations added by me):
The first web browser was a storage-neutral editing app. If you pointed it at files you had permission to edit, you could edit them. If you pointed it at files you had permission to read, you could read them. But the server in these days was a Big Dumb Object which passed your files to a client-side application without any role in interpreting them.
I never used the Berners-Lee browser, but even in the mid-90s when I was hacking my first sites together Netscape had a rudimentary editor (I was using something called HoTMeTaL at the time, but still):
This is still the case with many HTML files a browser handles, but what’s notable here is that in those days a browser sort of worked much like what a storage neutral app would today. When I talk about having the editing functions of a markdown-wiki client-side in an app, we’re essentially returning to this model.
And think about that for a minute. Imagine what that wiki would be like — you tool around your wiki in your browser editing these Markdown files directly. When someone hits your site in their browser, it lets them know that they should install the Markdown extension, or download the Markdown app to view these things. Grabbing a file is just grabbing a file.
So what happened to this original vision? So many things, and I only saw my little corner of the world, so I’m biased.
- Publishers: The first issue hit when the publishers moved in. They wanted sites to look like magazines. This accelerated a browser extension war and pushed website design to people slicing up sites in Adobe and Macromedia tools.
- Databases + Template-based Design: As layouts got more complex, you wanted to be able to swap out designs and have the content just drop in; so we started putting pages in database tables that required server interpretation (this is how WordPress, Drupal, or almost any CMS works, for example).
- Browser incompatibility, platform differences: People didn’t update browsers for years, which meant we had to serve version and platform specific HTML to browsers. This pushed us further into storing page contents in databases.
- E-commerce. You were going to have a database anyway to take orders, so why not generate pages?
- Viruses and Spyware. Early on, you used to download a number of viewer extensions. But lack of a real store to vet these items led to lots of super nasty browser helper objects and extensions, and the fact that you used your browser for e-commerce as well as looking at Pixies fan sites made hijacking your browser a profitable business.
In addition, there was this whole vision of the web as middleware that would pave the way to a thin-client future free from platform incompatibilities. Companies like Sun were particularly hot to trot on this, since it would make the PC/Mac market less of an issue to them. Scott McNealy of Sun started talking about “Sun Rays” and saying McLuhanesque things like “The Network is the Computer“.
In the corporate environment, thin clients are wired to company servers.
In your home, McNealy envisions Sun Rays replacing PCs.
“There’s no more client software required in the world,” he said. “There’s no need for [Microsoft] Windows.”
Sun Rays fizzled, but the general dynamic accelerated. And part of me wonders if it accelerated for the same reasons that Sun embraced it. In a thin client world, the people who own the servers make the rules. That’s good, for the people who own the servers.
This is really just a stream of consciousness post, but really consider that for a moment. In the first version of the web you downloaded a standard message format with your email client, and web pages were pages that could live anywhere (storage-neutral) and be interpreted by a multitude of software (app-neutral). In version two, your mail becomes Gmail, and your pages get locked into whatever code is pulling them from your 10 table database. And yes, your blogging engine becomes WordPress.
OF COURSE there were other reasons, good reasons, why this happened. But it’s amazing to me how much of the software I use on a daily basis (email, wikis, blogs, twitter) would lose almost nothing if it went storage neutral — besides lock-in. And such formats might actually be *more* hackable, not less.
It’s also interesting to see how much other elements of the ecosystem have solved the problems that led us to abandon the initial vision. Apps auto-update now. The HTML spec has stabilized somewhat, and browsers are more capable. The presence of stores for extensions gets rid of the “should I install random extension from unknown site” problem — people install and uninstall apps constantly. Server power is now such that most database-like features can be accomplished in a file-based system — Dokuwiki is file based, but can generate RSS when needed and respond to API calls. And, interestingly, we are finally returning to a design minimalism that reduces the need for pixel-based tweaking.
In any case, this post is a bit of a thought experiment, and I retain the right to walk away from anything I say in it. But what if we imagined the rise of apps as a POTENTIAL RETURN to the roots of the web, a slightly thicker, more directly purposed client that did interpretation on the client-side of the equation? Whether that interpretation is data API calls or loading text files?
I know that’s not where we are being driven, but it seems to me it’s a place that we could go. And it’s a narrative that is more invigorating to me than the “Loss of Eden” narrative that I often hear about such things. Just a thought.
It’s a classic separation of concerns (SoC) solution:
The unhosted web apps we use can be independent of our personal server. They can come from any trusted source, and can be running in our browser without the need to choose a specific application at the time of choosing and installing the personal server software.
When you dig into it, you start to see how radical an idea storage-neutrality is. Our assumption that because we need 24/7 access to our data via servers we also need to run server code is so deeply ingrained in the public consciousness that when you challenge it people don’t tend to comprehend what you’re challenging. But it’s this idea — that because our data is on Server X our code must be as well — that is at the heart of the corporate control of what Jon Udell calls our “hosted lifebits“. And if you want the sorts of freedoms people care about, that’s the piece you have to attack.
This is not a “compromise solution”. It’s a much more radical rethinking of what needs to happen. The future is server-backed/client-based apps, one way or another. That can serve to increase our freedom or to lessen it, depending on how we approach the next several years. I don’t really know what the correct answers are, but it seems to me this is the right fight.
Jim’s got a great summary of the larger idea behind UMW Domains (written by Ryan Brazell) up on his site. The core idea — personal cyberinfrastructure — is one I buy into, but at the same time the current mechanisms for it (cPanel, personal servers, and the like) seem clunky and not poised for greater adoption (although I watch the Thali project with interest).
Rather, the route to personal cyberinfrastructure is likely to run through storage-neutral apps. Briefly, the way most apps work now is that there is a program on your tablet/desktop/phone that is owned by Company A, and then there is often a certain amount of web storage used by that app, also owned by Company A. There’s a certain amount of web-based processing, also done on servers owned by Company A. This is somewhat different than the PC model, where Adobe sold you software but you owned the disk that held all your image creations, Microsoft sold you MS Word but your computer ran it, etc.
The cPanel-as-infrastructure response to that is to move to an all-web-app where you own the server. Some of the apps have mobile extensions to them, but by and large you avoid the lock-in of both modern web apps (Google Docs, Dropbox, Tumblr) and modern apps by going to open, HTML-based web apps.
This works, but it seems to me an intermediate step. You get the freedoms you want, but the freedoms you care about are actually a pain in the ass to exercise. Klint Finley, in a post on what a new open software movement might look like, nicely summarizes the freedoms people actually want from most applications (as opposed to content):
- Freedom to run software that I’ve paid for on any device I want without hardware dongles or persistent online verification schemes.
- Freedom from the prying eyes of government and corporations.
- Freedom to move my data from one application to another.
- Freedom to move an application from one hosting provider to another.
- Freedom from contracts that lock me in to expensive monthly or annual plans.
- Freedom from terms and conditions that offer a binary “my way or the highway” decision.
You get all those freedoms from the web-app personal cyber infrastructure, but you get them because you do all the work yourself. Additionally, your average user does not care about some of the hard-won freedoms baked into things like WordPress — the ability to hack the code (we care about that very much, but the average person does not). They really just want to use it without being locked forever into a provider to keep their legacy content up.
What I think people want (and what they are not provided) is a means to buy software where others do all this work for you, but you hold on to these freedoms. And assuming we live in a market that tries to match people with products they want (big assumption) the way that will come about is storage neutral net-enabled apps. I’ll own virtual server space and cycles somewhere (Amazon, Google, Microsoft, Squarespace, wherever). I’ll buy apps. But instead of installing software and data on the app-provider’s server, they’ll install to my stack on the web. And because they’ll encrypt that data, the company that runs my server won’t be able to see it either. My subscription to Adobe or Word will operate much like older subscriptions. Subscription will get me updates, but at any given point I stop paying Adobe I can still run my web app on my server in the state it was in when I stopped paying them.
Why is this more possible than the open web app model? None of the major providers have much incentive to go this route. Subscriptions are a lucrative business with undreamed of lock-in potential. I would say there are two reasons. First, companies with a virtual server platform (Microsoft, Google, Amazon) have some incentive to promote this model. Even Apple has a chance here to pair its app store with virtual server space. Second, and more importantly, such a scheme would be a huge boon to small developers and hackers. Knowing that they don’t have to scale up server architecture to sell server-powered apps frees them to focus on the software instead of scalability, the way that API-rich operating systems allowed previous generations of developers to focus on their own core product. And as this broadens out to where everyone’s phone has a slice of supercomputer attached to it, some really neat things become possible: truly federated wikis where pages are spread across multiple personal sites, music software that can write down effect-laden tracks in near real-time using rented processor time, music library apps written in 200 lines of code. That’s the larger win, and that’s where we want to be heading, the place where practical user freedoms and developer capabilities meet.
From the Chronicle, a surprisingly good article on Big Data:
This month Mr. Lazer published a new Science article that seemed to dump a bucket of cold water on such data-mining excitement. The paper dissected the failures of Google Flu Trends, a flu-monitoring system that became a Big Data poster child. The technology, which mines people’s flu-related search queries to detect outbreaks, had been “persistently overestimating” flu prevalence, Mr. Lazer and three colleagues wrote. Its creators suffered from “Big Data hubris.” An onslaught of headlines and tweets followed. The reaction, from some, boiled down to this: Aha! Big Data has been overhyped. It’s bunk.
Not so, says Mr. Lazer, who remains “hugely” bullish on Big Data. “I would be quite distressed if this resulted in less resources being invested in Big Data,” he says in an interview. Mr. Lazer calls the episode “a good moment for Big Data, because it reflects the fact that there’s some degree of maturing. Saying ‘Big Data’ isn’t enough. You gotta be about doing Big Data right.”
I don’t know if I have to sketch out the parallels in education, but just in case: we have two really unhelpful parties in learning analytics. We have the “it’s all bunk” crowd, and we have the evangelists. And I don’t know which is worse.
Here’s the thing — saying “Big Data is bunk” is pretty close in ridiculousness to saying “Oceanography is bunk”. Seventy percent of the planet is ocean. Likewise, the “data exhaust” we emit on a daily basis is growing exponentially. There is no future where the study of this data is not going to play a large role in the research we do and the solutions we create. None. Nada.
How we do it is the issue. And the “science” in “data science” is supposed to bring an element of rigor to that.
But for various reasons, the Big Data world is surprisingly unscientific, surprisingly data illiterate, surprisingly uncritical of its own products. I hear supposed data scientists quoting the long debunked claim that Obama won the 2012 election through use of Big Data (unclear and unlikely). They latch on to the same story about Target predicting pregnancy, which remains, years later, an anecdote that has never seen external scrutiny. They cite Netflix, even though Netflix has walked back from a purer data approach, handtooling micro-genres to make results more meaningful.
It gets worse. As the Chronicle article points out, Google statisticians published the original Nature article on using Google searches to predict flu outbreaks in 2009. Google Flu Trends, the result of that research, is used by public health professionals as one measure of likely flu incidence. That persistent over-estimation that was just discovered? Flu Trends was overestimating physician visits by about 100%. That’s bad. But here’s the kicker: It’s been doing that for three years.
And this, unfortunately, is par for the course. A recent article by an Open University analytics expert cites the Purdue Course Signals experiment, apparently unaware that a substantial portion of those findings came under substantial critique last year, which raised questions that have still not been answered to this day. Meanwhile the examples used by MOOC executives are either trivial or so naively interpreted that one has to assume that they are deliberately deceiving the public (the alternative, that professors at Stanford do not understand basic issues in research methods, is just too frightening to contemplate). Yet, if they are called on these errors, it is generally by the “Big Data is bunk” crowd.
There are people — people I know personally — who are in the happy middle here, believing not that we should support analytics or Big Data but that we should support better practice, period. But there’s too few of them.
So here’s my proposal. We’ve all used these anecdotes — Google Flu Trends, Course Signals, Netflix recommendations, Obama’s election — to make a point in a presentation.
Early in this field, that was probably OK. We needed stories, and there wasn’t a whole lot of rigorous work to pull from. But it’s not OK anymore. I’m declaring a moratorium on poorly sourced anecdotes. If you are truly a data person, research the examples you plan to cite. See what has happened in the four or five years since you first saw them. Don’t cite stuff that is questionable or seemingly permanently anecdotal. And if you hear people cite this stuff in a keynote, call them on it. Be the skunk at the party. Because it’s intelligent skunks, not cheerleaders, that this field needs right now.
I radically simplified the approach to wiki article reuse. I think for the better. I’d like you to tell me what you think:
Keep in mind this is only the start. The idea would be to build communities around the reuse. So, for example, when your page gets rewiki’d a central system logs that, and feeds back to your page a little snippet of text that says something like “26 clones, 3 forks” and like Tumblr lists all the different sites that have reused it. You could also create a central hub where the most re-used content of the week floats to the top. Etc., etc.
If you have a Dokuwiki instance and some hacker blood, you can try it out yourself. Instructions here:
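For what it’s worth, the bookkeeping side of that central log is tiny. Here’s a rough Python sketch, assuming each logged event is a hypothetical `(kind, source_page, reusing_site)` tuple. None of these names come from the actual plugin code, and the page IDs and sites are invented.

```python
from collections import Counter

# Hypothetical log entries: (kind of reuse, page that was reused, site reusing it).
EVENTS = [
    ("clone", "physics:waves", "a.example"),
    ("clone", "physics:waves", "b.example"),
    ("fork", "physics:waves", "c.example"),
    ("clone", "econ:supply", "a.example"),
]


def _plural(count, word):
    """Format a count with a naively pluralized noun."""
    return f"{count} {word}" + ("" if count == 1 else "s")


def reuse_summary(events, page):
    """Build the little snippet that gets fed back to the original page."""
    counts = Counter(kind for kind, source, _site in events if source == page)
    return f"{_plural(counts['clone'], 'clone')}, {_plural(counts['fork'], 'fork')}"


def reusing_sites(events, page):
    """List every site that has cloned or forked the page, Tumblr-style."""
    return sorted({site for _kind, source, site in events if source == page})


print(reuse_summary(EVENTS, "physics:waves"))  # 2 clones, 1 fork
print(reusing_sites(EVENTS, "physics:waves"))  # ['a.example', 'b.example', 'c.example']
```

The “most re-used content of the week” hub would be the same tally, grouped by page and filtered by date.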
My coding on this stuff is very hacky. I’d love to do this cleanly through XML-RPC in such a way that the only thing you would need is the bookmarklet (and a standard dokuwiki install). If you’re the genius who can make that happen quickly and cleanly, come share the glory!
Thanks to all the people I’ve talked this last iteration through with, but probably especially Devlin Daley who helped me stumble toward what I wanted to do during an hour long videochat. I came out of it with a clearer sense of what the core product was.
Ages ago in MOOCtime there was this media think-nugget going around about the glories of Big Data in MOOCs. It reached its apex in the modestly titled BBC piece “We Can Build the Perfect Teacher“:
One day, Sebastian Thrun ran a simple and surprising experiment on a class of students that changed his ideas about how they were learning.
The students were doing an online course provided by Udacity, an educational organisation that Thrun co-founded in 2011. Thrun and his colleagues split the online students into two groups. One group saw the lesson’s presentation slides in colour, and another got the same material in black and white. Thrun and Udacity then monitored their performance. The outcome? “Test results were much better for the black-and-white version,” Thrun told Technology Review. “That surprised me.”
Why was a black-and-white lesson better than colour? It’s not clear. But what matters is that the data was unequivocal – and crucially it challenged conventional assumptions about teaching, providing the possibility that lessons can be tweaked and improved for students.
The data was unequivocal. But was the truth it found durable? I’ve argued before that the Big Data truth of A/B testing is different from the truth of theoretically grounded models. And one of the differences is durability. We saw this with the A/B testing during the Obama campaign, when they thought they had found the Holy Grail of campaign email marketing:
It quickly became clear that a casual tone was usually most effective. “The subject lines that worked best were things you might see in your in-box from other people,” Fallsgraff says. “ ‘Hey’ was probably the best one we had over the duration.” Another blockbuster in June simply read, “I will be outspent.” According to testing data shared with Bloomberg Businessweek, that outperformed 17 other variants and raised more than $2.6 million.
The “magic formula”, right? Well, no:
But these triumphs were fleeting. There was no such thing as the perfect e-mail; every breakthrough had a shelf life. “Eventually the novelty wore off, and we had to go back and retest,” says Showalter.
And today there is news that the “Upworthy effect” — that A/B tested impulse to click on those “This man was assaulted for his beliefs. You won’t believe what he did next.” sort of headlines — is fading:
[Mordecai] lets everyone in on his newest data discovery, which is that descriptive headlines—ones that tell you exactly what the content is—are starting to win out over Upworthy’s signature “curiosity gap” headlines, which tease you by withholding details. (“She Has a Horrifying Story to Tell. Except It Isn’t Actually True. Except It Actually Is True.”) How then, someone asks, have they been getting away with teasing headlines for so long? “Because people weren’t used to it,” says Mordecai.
Now, Upworthy is an amazing organization, and I’m pretty sure they’ll stay ahead of the curve. But they are ahead of the curve precisely because they understand something that many Big Data in Education folks don’t — the truths of A/B are not the truths of theory. Thrun either believed or pretended to believe he had discovered something eternal about black and white slides and cognition. Which is ridiculous. Because the likelihood is he discovered something about how students largely fed color slides reacted to a slideset strangely reduced to black and white.
Had he scaled that truth up and delivered all slides in black and white he would have found that suddenly color slides were more effective.
There’s nothing wrong with this. Chasing the opportunities of the moment with materials keyed to the specific set of students in front of you is worthwhile. In fact, it’s more than worthwhile; it’s much of what teaching is *about*. Big Data can help us do that better. But it can only do that if we realize the difference between discovering a process that gets at eternal truths vs. discovering a process that gets at the truth of the moment.