Synchronous Online is About To Get Much, Much Better

Synchronous Online Sucks

Synchronous online — the twisted mess of chat-rooms, video-conferencing, and screen-sharing we use for real-time online education — has sucked for a while now. To be sure, we have products: it’s rare for a university to not have *some* web video-conferencing solution in place. And from what I understand, these products, at the enterprise level, are not cheap. There is commitment to supporting *some* level of functionality here.

But almost all of these products are designed for “pass-the-mic” lecture style classrooms, or worse, are sales presentation software that has been “adapted” for the classroom. And hence, all the techniques to which professors normally have access in face-to-face discussion (see, for example, the excellent work of Stephen Brookfield on structured peer discussion) are not really possible with modern video conferencing software.

But it doesn’t have to be this way. I’ve talked before about ideas like Ed Roulette, which attempts to structure online video conferencing using a peer instruction paradigm. That’s the tip of the iceberg. In a conversation with Devlin Daley at Open Ed, we ran through the variety of structured discussion styles used in the modern classroom, from the “speed dating” peer-review exercises I have my students do to get reader feedback to the Circular Response-style discussions that remain a favorite way to stop students from talking past one another.

All of these techniques can revolutionize your classroom. None of them are available to you online.

…But This Will Change This Coming Year

Why don’t we have technical options that borrow from the structure of the above models? Options that not only provide an opportunity to talk to one another, but also help structure the discussion? Part of it is the lack of interest in the synchronous online environment as a market. As Amy Collier noted to me in a recent conversation, up until recently online has been seen as a low-quality solution for people who do not have the time flexibility for scheduled class meetings, and as such the synchronous element has lagged behind the asynchronous piece. Who chooses an online class based on the synchronous experience, right?

But perhaps an even greater barrier has been the cost of serving real-time video. It’s huge. It requires a large, stable infrastructure and a sizable investment to make it work. Got an idea you’d like to try with online video? Maybe you want to integrate deliberative polling into an issues discussion? Maybe you want a product that sets up project groups and allows you, as the teacher, to drop into them, annotate them, or put them on a timer. Great! All you need to get started is a server farm and a T3 line. Go to it!

Both these issues have changed in the past couple of years. On the culture side, we are increasingly seeing online as an option for students who are on-campus, and can commit to scheduled online meetings.

On the technology side, peer-to-peer web video is about to change the world. Following up a tip from both Devlin and Tim Owens, I recently looked into the new WebRTC technology/spec that is built into newer browsers. This technology allows programs to negotiate a peer-to-peer video connection without having to serve up the actual video. You write the software, and the individual participant computers (and browsers) handle the streaming. A great example of this is a service that lets you set up a custom video chatroom in one step. Try it out — I’ve found the quality to be amazing, superior to what I get with Skype or Google Hangouts.


With the introduction of peer-to-peer video capabilities in newer browsers, many of the previous bandwidth barriers to innovation are disappearing — the “software-plus-infrastructure” problem of video-based synchronous online is quickly becoming a software-only problem. The market for synchronous online experiences is likely to grow. It’s hard to see this as anything but a perfect storm for rethinking the gun-metal-gray boredom of those Adobe Connect sessions we keep pushing on students. Hopefully we can replace it with something that allows us to mimic some of the fluid structures of the face-to-face class; at the very least we can get past the outdated assumptions still present in most of this software.

(Educational) Research Isn’t Broken But the Culture Is

Great post today out on Simply Statistics. In it, the author critiques the claim that most research is false, finding that claims of a reproducibility crisis are probably overstated at this point, but concluding that the following steps are still necessary:

We need more statistical literacy
We need more computational literacy
We need to require code be published
We need mechanisms of peer review that deal with code
We need a culture that doesn’t use reproducibility as a weapon
We need increased transparency in review and evaluation of papers

I’d agree with this. I find the points about culture particularly important. One of the sad things about the Course Signals situation was that the reaction showed how incapable the current system is of having a debate about *numbers*, with everyone immediately retreating to narrative (or retreating to silence).

If you’re not willing to show your work, you’re not in research. But if your analysis of statistical or computational error is “Ha, ha, you’re wrong!” you’re not in research either. Both tendencies are toxic, and both issues play off one another in a culture that increasingly awards people no benefits for openness but exposes them to a lot of professional risk. The smart move for any researcher today — in climate science, education, or anything of social import — is to make it as difficult as possible to dig into the work, process, and numbers behind their results. (Don’t believe me? Just ask Michael Mann.)

If we believe educational research does matter (and it does), we need to lose the combative attitude about results that don’t support our views, and we need to open up results that do to criticism. That requires a culture that takes joy in geeking out about the numbers before jumping into tried-and-true narratives; we seem to be getting further from that every day.

Use of MOOC Community Features in Blended Scenarios, Dan Ariely Edition

As readers of this blog know, Amy Collier and I have been making a year-long argument that MOOC community features, as currently designed, are often perceived by blended students as low-to-no-value substitutes for local interaction. That made this snippet of MOOC-runner Dan Ariely talking about his own class’s use of the MOOC rather interesting:

Dan Ariely:  …Let me tell you one more thing. I spend a lot of time creating these classes online, and then I try to use two of them in my regular face-to-face class. And I did what is called “reverse classroom,” so I asked the students to watch the video at home and then come to class to discuss it. It was incredibly successful. I think the students enjoyed the videos and then they enjoyed the discussion. The other thing I tried to do, again with my face-to-face class, was to get them to watch the videos and to have all the discussion about it online. And that one was not as useful.

So the students in my regular class basically had three versions: they had me in person; they had watched the video and come in to have just the discussion; and they watched the video and had the discussion online. And they basically rated them in that order. They said it was the most useful to have me in class. Not too far from that is to watch the video of the material and then have the discussion in class. Much less appealing was to have the video and then have an online discussion on that.

That’s part of the story — that you are losing something as you become more detached from the students and have less face-to-face time, but you have to figure a version of how this could work out because of the cost of the full-time colleges and universities.

Now caveats, caveats: Dan is an accomplished teacher in a face-to-face setting, and probably a much less accomplished online moderator. So maybe part of what we are seeing here is just that Dan’s strength in explanation can be captured online, but his strength in facilitation can’t.

Here’s the thing though — this is true of most face-to-face teachers. This is what they’ve become really good at, year after year, and it’s the value they bring to the classroom. So any system of teaching that takes that core talent and throws it out the window isn’t likely to have a great success rate, no?

I think you can also argue that we privilege face-to-face communication more, assuming that it’s not inconvenient, and I think you can argue that online facilitation has to become a more common skill over time. But I’ve made enough points for today, so I’m done.

Can People Designing Multiple Choice Tests for MOOCs Please Study Designing Multiple Choice Tests?

David Kernohan has a post up titled How I got a “first” on a FutureLearn MOOC with one weird old trick… over at FOTA, and it does just what it says on the tin. In the post, David details how he was able to get an 87.4% on a FutureLearn test for a course module without studying any materials. The “old trick” referenced is not actually one old trick, but the bag of tricks that most people come to learn taking multiple choice tests. You know, for example, that when “all of the above” appears, there’s a high chance that’s the correct option. You know that sentences that are not parallel in structure are usually wrong options. You know the longest, most qualified answer is usually right.

I’ve talked about this before, regarding a Coursera course my boss took, writing a scathing treatment of it in Sometimes Failure is Just Failure. In that case the malformed questions did not make it easy for a person with no knowledge to ace the test; rather, they made it difficult for someone with good knowledge to pass it.

The reason these sorts of errors are so mind-blowingly frustrating is:

  1. You see it everywhere, in all these products.
  2. It’s a really serious matter.
  3. It could be solved with a four-page checklist.

It’s really hard to write excellent multiple choice questions, questions that go beyond simple recall and comprehension and test higher order, conceptual understanding. On the other hand, many decades of research have made writing good multiple choice questions relatively easy. You don’t even have to read the research — you just need to follow the rules distilled from the research.

I don’t know if the problem in this case was a FutureLearn designer or a subject matter expert they worked with. Perhaps one of the problems is that it’s unclear in the arrangement who exactly is responsible for test validity. But somewhere in the quality assurance chain there needs to be someone who can read the checklist and spend the amount of time necessary to reformat these questions. We can debate all day whether multiple choice is the best way to assess students, but making sure the multiple choice questions we do use are well designed is a no-brainer. Do like I do, and don’t trust your instincts. Print out the sheet and keep it by you while you write your questions. Check each question. Then get someone else to double check it for you.
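In fact, a chunk of that checklist is mechanical enough to automate. Here’s a rough sketch of what a first-pass lint for a question bank might look like. The specific heuristics and thresholds are my own invention, standing in for a much longer list of item-writing rules:

```python
def lint_question(stem, options, answer_index):
    """Return warnings for a few common item-writing flaws.

    The heuristics below are illustrative only; a real checklist
    covers many more rules.
    """
    warnings = []

    # "All of the above" is a classic test-wise giveaway.
    if any("all of the above" in opt.lower() for opt in options):
        warnings.append("avoid 'all of the above'")

    # Test-wise students pick the longest, most qualified option.
    by_length = sorted(range(len(options)), key=lambda i: len(options[i]))
    longest = by_length[-1]
    runner_up_len = len(options[by_length[-2]])
    if longest == answer_index and len(options[longest]) > 1.5 * runner_up_len:
        warnings.append("correct option is much longer than the distractors")

    # Negatively worded stems cause misreads unless the negation stands out.
    if " not " in stem.lower() and "NOT" not in stem:
        warnings.append("emphasize the negation in the stem (e.g. capitalize NOT)")

    return warnings
```

Rules like these would catch several of the giveaways David exploited, and a human reviewer with the full checklist would catch the rest.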

When you’re rolling this out to 100 students, this sort of care is important. When you’re rolling it out to tens of thousands, this sort of care is essential.

Why the Why Matters

A quick follow-up to yesterday’s post on the supposed “death of theory” and its relation to MOOC research — the story thus far is that a number of people sincerely think the “why” doesn’t matter if our sample is big enough and the variables tracked are numerous enough. Here’s a typical account of the position, featuring Thrun:

One day, Sebastian Thrun ran a simple and surprising experiment on a class of students that changed his ideas about how they were learning.

The students were doing an online course provided by Udacity, an educational organisation that Thrun co-founded in 2011. Thrun and his colleagues split the online students into two groups. One group saw the lesson’s presentation slides in colour, and another got the same material in black and white. Thrun and Udacity then monitored their performance. The outcome? “Test results were much better for the black-and-white version,” Thrun told Technology Review. “That surprised me.”

Why was a black-and-white lesson better than colour? It’s not clear. But what matters is that the data was unequivocal – and crucially it challenged conventional assumptions about teaching, providing the possibility that lessons can be tweaked and improved for students.

Note that last bit — “What matters is that the data was unequivocal”. This is how the End of Theory position appears in print. We don’t know *why* the students did better, but they did better, and the data was so “big” that that’s all that matters.

But the why does matter. Because without the why you can’t generalize from one situation to the next, and you keep repeating the same mistakes. In this case, we could hypothesize three alternate explanations of the phenomenon:

  1. The lack of colors leads to a lack of distraction. Students watching the colored slides were processing the colors as meaningful (when they weren’t) and this was subtly hindering their comprehension and recall.
  2. Students had a harder time reading the black and white slides. There’s recent research that indicates slight disfluencies in presentation can be desirable (jokingly dubbed “The Comic Sans Effect”), and these disfluencies aid in recall.
  3. The cause was not that the slides were black and white, but that the black and whiteness was novel. We know from previous psychological research that the mind attends to novelty; the greater attentiveness led to greater retention.

So which one is it? Udacity doesn’t care. But they should. Because if it’s the second one, then writing slides in black and white is not exactly what you should be focused on. And if it’s the third — the novelty effect — then the impact of this is going to be very limited. This isn’t even getting into variable context, pedagogical aims, or path-dependence.
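The third hypothesis is the easiest to see in simulation. Here’s a toy model (all numbers invented) in which black-and-white slides carry a novelty boost that later wears off: an early A/B test finds a “clear” advantage that a retest would not.

```python
import random

random.seed(0)

def mean_score(novelty_boost, n=2000):
    # Baseline comprehension ~70 points, plus noise, plus any boost.
    return sum(70 + random.gauss(0, 10) + novelty_boost for _ in range(n)) / n

# Early in the course: black-and-white slides are still novel
# (the 3-point boost is invented for illustration).
bw_early, color_early = mean_score(3.0), mean_score(0.0)

# Later in the course: the novelty has worn off entirely.
bw_late, color_late = mean_score(0.0), mean_score(0.0)

early_gap = bw_early - color_early  # looks like a solid effect
late_gap = bw_late - color_late     # the "effect" has evaporated

print(f"early gap: {early_gap:.2f}, late gap: {late_gap:.2f}")
```

If the mechanism is novelty, the first A/B test is telling you something true about week one and nothing at all about week ten.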

You see this problem in Big Data all the time. The Obama campaign was really a Small Data operation, but they did extensive A/B testing. And what they found one night was the Holy Grail of campaign email marketing:

It quickly became clear that a casual tone was usually most effective. “The subject lines that worked best were things you might see in your in-box from other people,” Fallsgraff says. “ ‘Hey’ was probably the best one we had over the duration.” Another blockbuster in June simply read, “I will be outspent.” According to testing data shared with Bloomberg Businessweek, that outperformed 17 other variants and raised more than $2.6 million.

The “magic formula”, right? Well, no:

But these triumphs were fleeting. There was no such thing as the perfect e-mail; every breakthrough had a shelf life. “Eventually the novelty wore off, and we had to go back and retest,” says Showalter.

Is this what is happening with Udacity’s black and white slides? Are the eternal truths they are unearthing merely statistically significant fleeting effects?

Udacity doesn’t care. But they should.

Stocks, Flows, and the 80% Non-traditional Figure

At the MOOC Research Initiative conference Jeff Selingo gave what I thought was a capable presentation of the current landscape of higher education. People might quibble with a point or two, but overall it was a relatively balanced, hysteria-free overview of a market which is not necessarily “broken”, but is poised to undergo some relatively dramatic changes in the coming decade or two.

One stat from that talk bothered me though, and seeing it pop up again on Twitter today pushed me to try to explain the problem:


You hear variations of this stat all the time, and there’s no doubt in my mind we are undergoing a transformation of our student body. But here’s the important piece of information I never hear attached to such stats — is it a stock or a flow?

You see, there are a couple of ways to measure a phenomenon like this; each has its strengths and weaknesses, and sometimes they give very different views of the data. The first way — measure the stock — is the instinctive approach of most people. Is thyroid cancer a growing problem in America? Control for population, compare the number of people with thyroid cancer with the number 50 years ago, and there you go — sorted.

But is the number of people living with thyroid cancer what you care about? Or is the number of people contracting thyroid cancer what you care about?

Those things sound like they’d give very similar views, but they don’t. Thyroid cancer used to be a uniformly deadly disease. People who contracted it would usually die, and die relatively soon after a diagnosis, so at any given time there was only a small population of people who had thyroid cancer. With the introduction of radioactive iodine as a treatment, the prognosis for most people with thyroid cancer today is quite good. Since cancer is considered a chronic disease (a person who contracts thyroid cancer will always be considered to have it, even in remission) the number of people who now have thyroid cancer has gone through the roof, but as a result of our success, not failure.

That’s why epidemiology has two different measures — one is prevalence, our measure of the “stock”. Prevalence asks how many people at a given time have a condition. But they also use an “inflow” measure: incidence. Incidence measures how many people per year move into a condition. Which measure we use will depend on what we aim to do with the information, but it is almost always appropriate when given one number to ask for the other as well. If the numbers tell two very different stories it’s time to sit down and parse out why.
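In steady state the two measures are linked by a simple identity (prevalence is roughly incidence times the average time spent in the condition), which is exactly how a treatment success can send the stock soaring while the flow stays flat. A toy calculation, with invented numbers:

```python
# All numbers invented for illustration, per 100,000 population.
incidence = 10  # new thyroid cancer cases per 100k per year (the flow)

# Steady state: prevalence ≈ incidence × average years lived
# with the diagnosis.

# Before effective treatment: patients survived ~2 years on average.
prevalence_before = incidence * 2   # 20 per 100k living with it

# After radioactive iodine: survivors carry the diagnosis for decades.
prevalence_after = incidence * 30   # 300 per 100k living with it

# Same inflow, fifteen-fold rise in the stock: a result of success.
```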

So what about this “80% of students are non-traditional” stat? Is that a stock or a flow?

My guess is it’s a stock, more akin to measures of prevalence than incidence (Apologies to Jeff if I am not right here). And if it is, that’s an interesting statistic, but I’m not sure it’s the right one to use. Here’s a simplified model to show why. Imagine a world that graduates two people from high school every two years. One always goes to college full-time and gets out in just under four years (just under because they graduate in May, not September). The other student goes part time, and finishes in eight years. Here’s a visual representation of how that plays out over time:


The blue arrows represent the tenure of our full-time students in college, and the red arrows the time spent by our part-timers.

Now here’s a question. Go to year eight, draw a vertical line down the chart, and run down through the students that line bisects. Count up the students that are either starting college or currently enrolled. You’ll notice that there are two four-year students currently enrolled and four eight-year students. Now tally up your percentages. In this case we reach the conclusion that 66% of students are non-traditional eight-year students, compared to the measly 33% of the population who are in four-year programs. The “new normal,” right?

But measures of flow tell a different story. In each graduating class, 50% of students go traditional, and 50% non-traditional. And asked whether flow or stock best represents what is a normal student experience here, I’d have to say it’s the measure of flow (50%). The stock measure is just too prone to outflow effects.
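The same identity (stock equals inflow rate times time enrolled) makes the toy model’s arithmetic explicit:

```python
# One student of each type starts every two years.
inflow_per_year = 0.5  # students per year, for each type

years_enrolled_fulltime = 4
years_enrolled_parttime = 8

# Steady-state stock = inflow rate x time spent enrolled.
stock_fulltime = inflow_per_year * years_enrolled_fulltime  # 2 enrolled
stock_parttime = inflow_per_year * years_enrolled_parttime  # 4 enrolled

stock_share = stock_parttime / (stock_fulltime + stock_parttime)
flow_share = 0.5  # half of each entering class is non-traditional

print(f"stock says {stock_share:.0%} non-traditional, flow says {flow_share:.0%}")
# stock says 67% non-traditional, flow says 50%
```

The part-timers dominate the headcount purely because they stay enrolled twice as long, not because more of them enter.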

Of course, these are made up numbers. It could go the other way as well — if part-timers don’t persist at the rate of full-timers, the effect would run the other way, significantly minimizing the number of students who choose to attend part-time. Or perhaps the two trends balance one another out. The smaller point is that this discussion needs better numbers if we are to get an accurate sense of what is going on. The larger point is that when presented with stats on a stock, you should always seek out a corresponding flow, and vice versa. If they give you deficit, ask for debt. If they give you number of people imprisoned, ask for a count of yearly incarcerations. And so on. The interesting story is usually where different measures seem to disagree, and you can’t get that from a single view.

Short Notes on the Absence of Theory

Martin Weller, Stephen Downes, and Matt Crosslin have been kicking around the “post-theory” critique of MRI ’13 that came up in a discussion Jim Groom and I had Thursday night in the middle of a bar in the middle of a hotel in the middle of an ice storm.

I thought I might just add a bit of context and my two cents.

First, the conversation came up because Jim was quite nicely (and genuinely) asking an edX data analyst what Big Data was. The answer that analyst gave was that Big Data was data that was big. That’s actually technically correct — the original term was meant to refer to data that was big enough in terabytes/petabytes that it could not be processed through traditional means. If your data was big enough that you were using Hadoop, it was Big Data.

Because I’m generally a person that can’t keep my mouth shut, I interjected that while that was true from a technical standpoint, it didn’t really get at the cultural significance of the Big Data movement, which was captured in Chris Anderson’s “End of Theory” article back in 2008. Here’s a sample:

Google’s founding philosophy is that we don’t know why this page is better than that one: If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required. That’s why Google can translate languages without actually “knowing” them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.

Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

While my analogy-prone brain sees parallels here to Searle’s Chinese Room problem, it’s probably more correct to see this as behaviorism writ large: where Skinner wanted us to see the mind as a black box determined by inputs and outputs, Big Data asks us to see entire classes of people as sets of statistical probabilities, and the process of research becomes the iterative manipulation of inputs to achieve desired outputs. And the same issues emerge: Chomsky’s “destruction” of behaviorism in his 1959 takedown of B. F. Skinner’s Verbal Behavior is generally overstated, but certain passages in that work seem a relevant critique of the “end of theory”; for instance, where Chomsky criticizes Skinner’s notion of reference: “The assertion (115) that so far as the speaker is concerned, the relation of reference is ‘simply the probability that the speaker will emit a response of a given form in the presence of a stimulus having specified properties’ is surely incorrect if we take the words presence, stimulus, and probability in their literal sense.”

Of course, in the past 50 years we’ve seen this Chomsky-Skinner drama played out anew in linguistics. While Chomsky’s transformational grammar underpinned efforts at computer translation for many years, Google’s translation approach, which sees language as nothing more than a set of probabilities (words are “known” to be the same in two different languages if they have the same probability of occurring in a context), is quickly outstripping the traditional methods. In fact, for a certain class of tasks it becomes increasingly obvious that correlation *is* enough. Google’s translation engine has little to no theory of language, yet adequately serves for a person who needs a quick translation of a web page. And that somewhat atheoretical nature of the engine is in fact its strength — Google’s approach needs only a robust set of web pages from any language to generate the correlations needed to start translating.
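If you want to see how far correlation alone can get you, here’s a toy version of the statistical approach, using an invented four-sentence “parallel corpus”: word translations are guessed purely from co-occurrence counts across aligned sentence pairs, with no grammar anywhere in the program.

```python
from collections import Counter

# Invented aligned sentence pairs (English, toy "French").
corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats",   "le chat mange"),
    ("the dog eats",   "le chien mange"),
]

cooc = Counter()       # (source word, target word) co-occurrence counts
tgt_count = Counter()  # target word frequencies
for src, tgt in corpus:
    for t in tgt.split():
        tgt_count[t] += 1
    for s in src.split():
        for t in tgt.split():
            cooc[(s, t)] += 1

def translate(word):
    """Guess a translation from co-occurrence alone: no grammar, no
    dictionary, just 'which target word tracks this source word?'"""
    scores = {t: cooc[(word, t)] / tgt_count[t] for t in tgt_count}
    return max(scores, key=scores.get)

print(translate("cat"), translate("sleeps"))  # chat dort
```

With a big enough corpus this sort of counting does remarkably well, which is exactly the radical-pragmatist point: for this class of task, no theory of language is required.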

So this debate is not really new, and there’s certainly a place for this sort of radical pragmatism. Chomsky’s focus on a system of mental rules that form a universal grammar may have enlarged human knowledge, but it’s turning out to be a really inefficient way to train computers to understand language. Gains in understanding underlying models are not always the shortest route to efficacy.

But such approaches come with a downside as well. Morozov deals with this extensively in his book To Save Everything, Click Here, and in his WSJ review of the book Big Data. Big Data is very useful in situations where you don’t care what the cause is (Amazon cares not a whit *why* people who buy German chocolate also buy cake pans, as long as they get to the checkout buying both). But where you do care about cause, things are a bit different:

Take obesity. It’s one thing for policy makers to attack the problem knowing that people who walk tend to be more fit. It’s quite another to investigate why so few people walk. A policy maker satisfied with correlations might tackle obesity by giving everyone a pedometer or a smartphone with an app to help them track physical activity—never mind that there is nowhere to walk, except for the mall and the highway. A policy maker concerned with causality might invest in pavements and public spaces that would make walking possible. Substituting the “why” with the “what” doesn’t just give us the same solutions faster—often it gives us different, potentially inferior solutions.

A hardline proponent of a Big Data approach might object to Morozov that you just need more nuanced and informed correlations. But assuming you had no theory about ultimate causes, how would you even conceive of the possibility? (This is similar to what Michael Feldstein was getting at in his piece about the inadequacy of Big Data for education). A person who does not have a model of what is happening is unlikely to know where to look for inconsistencies. And Big Data is, by definition, big. Theory is your roadmap.

This is why at the workshop on analytics at the conference, I insisted on the “grokability” of analytics-produced guidance to the people who would use it to help students. In a way it comes down to the empowerment of the practitioner (and of the student). If I’m told I have a 50% chance of dropping out based on my “rt-score” of 2145.7, that’s one thing. But the interpretation of what to *do* about that number should depend heavily on what the inputs into it were. Was it prior GPA that pumped that score so high, or socioeconomic status? And the reason those variables are treated differently is that we have models and theories about socioeconomic status and GPA that help us understand their significance as predictors.
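Here’s a sketch of what “grokable” could mean in practice. The variables, weights, and score are all invented; the point is that a transparent per-feature breakdown, unlike a bare “rt-score,” tells the practitioner which input is driving the number:

```python
# Invented model: a transparent linear risk score. Real early-warning
# systems are more complex; the point is the per-feature breakdown.
weights = {"prior_gpa": -0.8, "ses_index": -0.4, "missed_logins": 0.3}

def risk_report(student):
    """Return the overall score plus each feature's contribution."""
    contributions = {f: weights[f] * student[f] for f in weights}
    return sum(contributions.values()), contributions

score, parts = risk_report(
    {"prior_gpa": 2.1, "ses_index": 0.5, "missed_logins": 6})

# An advisor sees not just the score but what pumped it up, ranked
# by the size of each contribution.
for feature, value in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature}: {value:+.2f}")
```

An advisor can act very differently on “missed logins are driving this” than on “socioeconomic status is driving this,” and it is exactly our theories about those variables that tell her how.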

Ultimately, like so many in the field, I’m actually very excited about the promise of data (though I would argue that it is actually “small data” — data that can live in a single spreadsheet — that paired with local use has the greatest potential). Still, if we are to enter this world we have to understand the trade-offs we engage in. Most of the theory-bound could certainly use a better understanding of how powerful a tool statistics can be in overcoming our own theoretical predispositions. It’s useful to understand that theory is not the only tool in the toolbox. But it’s equally true that the new breed of data scientist needs to be far more acquainted with the theories and assumptions that animate the sets of data in front of them. At the very least, they have to understand what theory is good for, why it matters, and why it is not always sufficient to tweak inputs and outputs.