From Precinct to Voter

A summary of some reading from an old Wikity page. One way of thinking about current political trends is to see them as continuations of trends, dating back to the 1960s, brought about by other channels and uses of data. In this telling, data and direct access to voters first erode the power of the precinct in favor of national analysis and contact campaigns, available only through national party infrastructure. The commodification and availability of these channels online now start to erode the power of the parties as well.

Original article follows

——————-

Early political processes focused on precincts and wards as the unit of allocating campaign effort. You would get out the vote in the areas where you had broad support, and sometimes, less honorably, suppress the vote in those places where you didn’t.

As canvassing was supplemented with phone calls, direct mail, and other contact methods, the unit of analysis increasingly became the individual voter. Campaigns were less concerned with getting out the vote of Ward 8 and more concerned with flushing out demographics such as “college-educated under-25s.”

As early as the 1960s, Democrats began systematically assessing which precincts should be allocated campaign resources, using statistics aggregated over fairly wide geographic areas. By the 1990s, the precinct was being supplanted by the individual voter as the unit of analysis, just as wall maps and clipboards were giving way to web applications and Palm Pilots. (Source)

As campaigns became more focused on these units of analysis, many traditional GOTV efforts were deprioritized. And even where traditional methods were used, they were used under the guidance of the new approach: a ward captain in 1960 would get out the vote for his or her ward; by the 2000s, that captain would be armed with data on which specific doors to knock on, based on likely percentages of support and the number of previous touches tracked.

Computational Propaganda and Totalitarianism (A Thread)

It Can Take As Little As Thirty Seconds, Seriously

I talk about 90-second fact-checks and I think people think I’m a bit unhinged sometimes. What can students possibly do in that short amount of time that would be meaningful?

A lot, actually.

For example, this press release on some recent research was shared with me today:

[Screenshot: the press release as it appeared on EurekAlert]

Now I want to re-share this with people, but I’d like to be a good net citizen as well. Good net citizens:

  • Source-check what they share
  • Share from the best source possible
  • Provide source/claim context to people they share with when necessary

To do that in this case, we need to get to the source of the press release, on a site controlled directly by the American Psychological Association, and share that version. We also need to check that the American Psychological Association is the credible organization we think it is. How long will this take?

Literally thirty seconds, if you know how to do it:

  • Select the headline and search on it.
  • The first result up is from apa.org; that looks promising.
  • Go there and look to make sure it’s the same release.
  • Search Wikipedia for the site address and find the article on the APA.
  • Check to make sure the APA is a real organization.
  • Check to make sure the APA web address matches.

And you’re done. That may sound like a lot of steps, but each one is simple, fast, and fluid. Here are those steps executed in real time (video intentionally silent). I really encourage you to watch the video to see how ridiculously easy this is for someone with some training.
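
For the programmatically inclined: the Wikipedia step can even be scripted. Here is a toy sketch in Python, using Wikipedia’s public search API and the requests library (the function name and output handling are mine, and a human still has to read the article to confirm anything):

```python
import requests

def wikipedia_lookup(domain: str) -> list[str]:
    """Return titles of the top Wikipedia articles matching a site's domain."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "list": "search",
                "srsearch": domain, "format": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return [hit["title"] for hit in resp.json()["query"]["search"][:3]]

# The top hits for "apa.org" should include the American Psychological
# Association article -- a human still reads it to confirm the organization
# is what we think it is and that the web address matches.
print(wikipedia_lookup("apa.org"))
```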

There’s really no excuse not to do this for things you share. It not only allows you to share from a more authoritative source, which is good for society and for the economics of publishing, but it also allows you to provide your readers with helpful context. Compare this:

[Screenshot: the release shared from the EurekAlert (AAAS) aggregator, without context]

To this:

[Screenshot: the same release shared from apa.org, with a context blurb]

I used to focus on having students write longer research pieces on issues, and we still do that in various classes we work with. But just this behavior alone improves the world:

  • Check what you share
  • Share from the better source
  • Provide a context blurb to share your own source verification with others

You don’t need to write an essay. And most any student (or teacher!) can learn the techniques. Think of it as information hygiene, the metaphorical handwashing you engage in to prevent the spread of misinformation.

Learn the skills and make the world a better place. There may be good excuses for not doing this, but time is not one of them.

(Oh, and here’s that APA press release — it’s really interesting!)

Instead of letting people vote on news, Facebook should adopt Google’s rater system

A message I sent to a newsgroup on Facebook’s recent proposal to have users rate sites as a solution to its news trust problem. To my surprise, I find myself suggesting they should follow Google’s model, which, while often faulty, is infinitely better than what they are proposing.

==============

(Regarding the announcement), I think there’s a better, time-tested way of doing this that doesn’t deal with individual ratings but benefits from expert analysis and insight. Use a modified version of the Google system.

Most people misunderstand what the Google system looks like (misreporting on it is rife), but the way it works is this: Google produces guidance docs for paid search raters, who use them to rate search results (not individual sites). These documents are public, and people can argue about whether Google’s take on what constitutes authoritative sources is right — because they are public.

The raters rate search quality against the documents, and engineers code to get the score up, but the two pieces are separate.

It’s not a perfect system, but it provides so many things this Facebook proposal doesn’t even touch:

  • It defines a common set of standards for what a “good” result looks like, without going into specific sources.
  • It provides a degree of public transparency that Facebook doesn’t even come close to.
  • It provides incentives for publishers to act in ethical ways; e.g. high-quality medical or financial advice (“your money or your life” categories) must be sourced to professionals.
  • It separates the target from the method of assessing whether the target is being hit.

I’m not saying it doesn’t have problems — it does. It has taken Google some time to understand the implications of some of their decisions, and I’ve been critical of them in the past. But I am able to be critical partly because we can reference a common understanding of what Google is trying to accomplish and see where it falls short, or see how guidance in the rater docs may be having unintended consequences.

In such a system, Facebook would hire raters who would rate feed quality — not individual sites — on a variety of criteria which experts have decided are characteristic of quality news feeds (and which readers by and large agree with). That would probably mean ascertaining whether sites included in the feed have the following desirable attributes:

  • Separation of opinion, analysis, and news content, with opinion in particular clearly marked
  • Sponsored content clearly marked, and comprising a small portion of the overall site
  • Syndicated content clearly identified
  • Satire pieces marked unmistakably as satire
  • News stories clear about the process and methods by which they verified news in an article (e.g. “Kawczynski declined to be interviewed Sunday, but in posts on his website and on Gab…”)
  • A retraction policy, and an email address to send noted errors to
  • Headlines that match in tone and meaning the content of the attached article
  • Descriptive blurbs in Facebook that accurately describe the content of the article
  • Pictures which are either related to the event or marked as stock or file footage with descriptive and accurate captions
  • Links where appropriate to supporting coverage from other news outlets
  • A clear and accurate about page which defines who runs the paper
  • A lack of plagiarism — e.g. does the content pass the “Google paste test” for material not marked as syndicated.

Raters would rate the quality of the sources showing up in their feed, and Facebook engineers would work on improving feed quality by getting the ratings up. No one gets banned or demoted by name. Or promoted by name.
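
To make that separation concrete, here is a minimal sketch (my illustration, not Facebook’s or Google’s actual code; all names and numbers are hypothetical) of how a published rubric, individual rater judgments, and the aggregate score engineers optimize stay distinct:

```python
from statistics import mean

# The published guidance, distilled to checkable criteria (a few items
# from the list above). Anyone can read and argue about this document.
RUBRIC = [
    "opinion clearly marked",
    "sponsored content clearly marked",
    "headline matches article tone and meaning",
    "clear and accurate about page",
]

def rate_source(judgments: dict[str, bool]) -> float:
    """One rater's score for one source: the fraction of rubric criteria met."""
    return mean(1.0 if judgments[c] else 0.0 for c in RUBRIC)

def feed_quality(sampled_feed: list[dict[str, bool]]) -> float:
    """The aggregate score engineers optimize. They see a number,
    never a list of named sites to promote or demote."""
    return mean(rate_source(j) for j in sampled_feed)

sampled_feed = [
    {c: True for c in RUBRIC},                           # a strong source
    {c: c != "opinion clearly marked" for c in RUBRIC},  # a weaker one
]
print(f"feed quality: {feed_quality(sampled_feed):.2f}")  # -> 0.88
```

Everything contestable lives in the rubric; everything optimizable lives in the score.
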
The place of experts and the public would be to clarify what they trust in news. In fact, the Trust Project has already done much of the work that would go into feed quality rating docs. I summarize their work and my simplification of it here:

Again, the rater guidance documents get published. We continue to argue over whether the guidance is correct and whether the implementation is meeting the guidance or being gamed. We still raise holy hell about misfires and get them to rethink guidance and code.

The approach Facebook is currently proposing, on the other hand, is essentially nihilistic, and like many nihilistic things it may have current utility (and may even work temporarily), but it provides a lousy foundation for dealing with problems to come.

Mike.

P.S. By and large I think you will find both that the public would rather trust expert opinion on what constitutes quality than trust their neighbor, and that the public more or less agrees — both right and left — with the practices in the bulleted list above.

Arsonist Birds Activity Sequence

I have a new and complete three-part activity sequence up on the Four Moves blog. It asks students to delve into whether a story about birds intentionally setting fires is an accurate summary of the research it cites. It goes through three steps:

  • Evaluating the reporting source.
  • Evaluating the research source.
  • Checking for distortion of the source material.

I won’t go into the steps because I don’t want to foul up the Google Search results for this activity. But I encourage you to start at the beginning and go through the three activities. (Please note that it is meant to be used in a classroom with a facilitator.)

One of the things I’ve built out a bit more in this set of activities is the “skills up front, social impact on the back” approach. Each activity runs the students through very specific skills, but then asks the students to reflect a bit more deeply on the information environment as a whole. Here are some discussion questions we ask students after they check whether the National Post is a “real” newspaper:

  • Neither the National Post nor the reporter has any core expertise in ethno-biology. So why do we trust the National Post more than a random statement from a random person?
  • Why do you think that newspapers have such a good reputation for truthfulness and care compared to the average online site? What sort of economic incentives has a newspaper historically had to get things right that a clickbait online site might not have had?
  • How do we balance our need for traditional authoritative sources with our desire to include diverse voices and expertise? How do we make sure we are not excluding valuable online-only sources? What are the dangers of a newspaper-only diet of news?

And here is a question we ask after we have the students read the arsonist birds article — which is really about science having ignored indigenous and professional expertise:

  • One question the journal article raises is the way that professional and indigenous expertise is not always valued by science. How can we, as people seeking the best information, value academic research while respecting non-academic expertise when appropriate? What’s a good example of when professional or indigenous expertise on an issue might be preferable to academic expertise?

This stuff takes forever to put together, unfortunately, because one thing we’re trying to do is be very careful about tone, and make sure we get students to think about the incentives around information production without allowing them the easy shortcut of cynicism. We also are quite aware that the biggest worry we face is not that students will consume disinformation, but that they may consume no civically-oriented news at all. So in other sections we use the follow-up to make the case for considered and intentional news consumption (and again, news consumption that is less focused on political hobbyism).

In any case, I think it’s a solid sequence, and I hope you’ll try going through it. It uses a password for the “solutions” as a way to rebuff search engines and slow students down. The password is “searchb4share”. Try it out!

People Are Not Talking About Machine Learning Clickbait and Misinformation Nearly Enough

The way that machine learning works is basically this: you feed in some examples, let’s say of what tables look like, and then the code generates some things it thinks are tables. You click yes on the things that look like tables, and the code reinforces the processes that made those and makes some more attempts. You rate again, and with each rating the elements of the process that produce table-like things are strengthened and the ones that produce non-table-like things are weakened.

It doesn’t have to be making things — it can be recognition as well. In fact, as long as you have some human feedback in the mix, you can train a machine learning process to recognize and rate tables that another machine learning process makes, in something called a generative adversarial network.
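
For the curious, here is a minimal sketch of that adversarial loop, with a toy one-dimensional distribution standing in for “real tables” (PyTorch; the architecture and numbers are illustrative, not from any production system):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
real_dist = torch.distributions.Normal(4.0, 1.25)  # stand-in for "real tables"

# Generator maps noise to samples; discriminator scores how "real" they look.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Train the rater (discriminator): real samples -> 1, generated -> 0.
    real = real_dist.sample((64, 1))
    fake = G(torch.randn(64, 8)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Train the maker (generator): try to get its samples rated "real".
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

# After training, generated samples should cluster near the real mean (~4.0).
print(G(torch.randn(5, 8)).detach().squeeze())
```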

People often use machine learning and AI interchangeably (and sometimes I do too). In reality machine learning is one approach to AI, and it works very well for some things and not so well for others. So far, for example, it’s been a bit of a bust in education. It’s had some good results in terms of self-driving cars. It hasn’t done great in medicine.

It will get better in these areas, but there’s a bit of a gating factor here — the feedback loops in these areas are both delayed and complex. In medicine we’re interested in survival rates that span from months to decades — not exactly a fast-paced loop — and the information currently out there for machines to learn from is messy and inconclusive. In learning, the ability to produce custom content is likely to have some effect, but bigger issues such as motivation, deep understanding, and long-term learning gains are not as simple as recognizing tables. In cars, machine learning has turned out to be more useful, but even there, while you can use it to recognize stop signs, it’s a bit harder to learn the rarer and more complex instances of “you-go-no-you-go” yielding protocols.

You know what machine learning is really good at learning, though? Like, scary, Skynet-level good?

What you click on.

Think about our tables example, but replace tables with headlines. Imagine feeding a machine learning algorithm the 1,000 most-shared headlines and stories, and then having the ML generate 10,000 headlines over the next hour, publishing them through 1,000 bots. The ones that are successful get shared, and those parts of the ML net are boosted (produce more like this!). The ones that don’t get shared tell the ML to produce less along those lines.

That’s hour one of our disinfo Skynet. If the bots have any sizable audience, you’re running maybe 20,000 tests per piece of content — showing it to 20,000 people and seeing how they react. Hour two repeats that with better content. By the next morning you’ve run millions of tests on your various pieces of content, all slowly improving the virality of the material.
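
Here is a toy sketch of that test-and-boost loop, written as a simple epsilon-greedy bandit (the audience model and numbers are invented for illustration; a real operation would also be regenerating variants continuously):

```python
import random

# Stand-ins for ML-generated headline variants; in the scenario above,
# fresh variants would be regenerated each hour from whatever is winning.
headlines = [f"headline variant {i}" for i in range(10)]
shows = {h: 0 for h in headlines}
shares = {h: 0 for h in headlines}

def share_rate(h: str) -> float:
    return shares[h] / max(shows[h], 1)

def pick_headline(eps: float = 0.1) -> str:
    """Epsilon-greedy: mostly exploit the best observed share rate,
    occasionally explore a random variant."""
    if random.random() < eps or sum(shows.values()) == 0:
        return random.choice(headlines)
    return max(headlines, key=share_rate)

def reader_shares(h: str) -> bool:
    """Invented audience model: each variant has a hidden appeal."""
    appeal = 0.02 + 0.03 * headlines.index(h) / len(headlines)
    return random.random() < appeal

for _ in range(20_000):  # the "20,000 tests per piece of content" above
    h = pick_headline()
    shows[h] += 1
    if reader_shares(h):
        shares[h] += 1

best = max(headlines, key=share_rate)
print(best, round(share_rate(best), 4))
```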

At that scale you can start checking valence, targeting, impact. It’s easy enough for a network analysis to show whether certain material is starting fights, for example, and stuff that starts fights can be rated up. You can find what shares well and produces cynicism in rural counties, if you want. Facebook’s staff will even help you with some of that.

In short, the social media audience becomes one big training pool for your clickbait or disinfo machine. And since there is enough information from the human training to model what humans click on, that process can be amplified via generative adversarial networks, just like with our tables.

It doesn’t stop there. The actual articles can be written by ML, with their opening grafs adjusted for maximum impact. Videos can be automatically generated off of popular articles and flood YouTube.

Even the bots can become less distinguishable. An article in the New York Times today details the work being done on ML face generation, which can produce believable fake faces. Right now the process is slow, partly because it relies solely on GANs and partly because it’s processor-intensive. But imagine generating 1,000 fake faces for your bot avatars, tracking which ones get the most shares, then regenerating a thousand more based on that and updating. Or, even easier, autogenerating and re-generating user bios.

You don’t even need to hand-grow the faces, as in the NYT article. You could generate 1,000 morphs, or combinations of existing faces.

Just as with the last wave of disinformation, the first adopters of this stuff will be the clickbait farms, finding new and more effective ways to get us to sites selling dietary supplements or to watch weird autogenerated YouTube videos. There will be a flood of low-information ML-based content. But from there it will be weaponized, and used to suppress speech and manipulate public opinion.

These different elements of ML-based gaming of the system have different ETAs, and I’m not saying all of this is imminent. Some of it is quite far off. But I am saying it is unavoidable. You have machine learning — which loves short and simple feedback loops — and you have social media, which has a business model and interface built around those loops. The two fit together like a lock and a key. And once they come together, the combination is likely to have a profoundly detrimental effect on online culture, making our current mess seem quite primitive by comparison.

Some 2018 Predictions

I wrote a prediction a couple weeks ago for Nieman Lab, and it was general and media-literacy focused. But here are some more mundane, somewhat U.S.-centric predictions:

  • Social media overrun by AI. AI’s main influence in the coming year will be not in driving cars but in driving state-sponsored social bots and corporate astroturfing. The ability of AI to manufacture real-looking discourse and news will render “representativeness” heuristics for spotting misinformation obsolete.
  • Revenge porn in politics. Revenge porn — using both real and fake photos — has been used in multiple international elections. We saw a hint of what can happen here with the Joe Barton incident, and the media failed that test. But prepare for much more strategic use, along the lines of what we saw recently in Rwanda. Prepare for at least one U.S. 2018 race to have a revenge porn “scandal,” maybe more. Also, look to AI — with its ability to modify video and images — to play a possible role. (Also, can we come up with a different name for this trend? Revenge porn is a specific form of malinformation/misinformation, and “porn” is the wrong word: titillation is merely the mechanism, while the intent is personal destruction and the silencing of opponents.)
  • Hacked voter registration rolls. Some state voter registration systems will be hacked in an effort to create chaos and long lines at the polls, depressing the vote and calling the results of elections into question. Social media bots and armies will play both sides of this issue — with the mismatches portrayed as voter fraud on one side and suppression on the other.
  • Rise of the civil servant email dump. The press were willingly complicit in the Wikileaks email dump, covering and sifting through stolen personal, non-state emails to amplify a Russian disinfo campaign. Until we come up with a “right to privacy” for state officials and workers this will be a profitable vein. Imagine dumping the personal emails of a state-level elections official where they talk mean about one of the candidates, or share an anti-Trump meme with their spouse. Or an anti-Democratic meme. Apply this to a judge faced with adjudicating a case where the government is a plaintiff, or anyone who threatens your agenda.
  • Big Oil trolling. A major petrochemical company will be revealed to be using troll armies to create confusion around climate change, perhaps with some state help.
  • Creation of a pro-government social media army focused domestically. My most out-there prediction. President Trump will announce the creation of a “Fake News Commission” to investigate both journalists and social media. One finding of the commission will be that the U.S. needs to emulate other countries and create an army of social media users to seek out anti-government “fake news” and “correct” it. (There is some precedent for the U.S. doing this in other countries, but we have never done it to our own population at any serious scale.)
  • Weaponization of #MeToo. The #MeToo movement is an important and long overdue movement. But because of its disparate impact on the left it will be weaponized, and used to try to fracture the left. This will be one of the big uses of bots and trolling in the election, second maybe to voter fraud and voter suppression flame-fanning. Look for the rise of the #MeToo concern troll, the supposed “lifelong Democrat” who was “ready to vote for X” but these rumors “have disgusted” them. They’ll have names ending with eight random digits. Expect AI and microtargeting to play a role here, making it difficult for candidates to respond, especially to rumors they don’t know exist.

I don’t know what to do about this last problem. It’s knotty. I’m not the person to write about how we prepare as a society for this, but I wish someone would.

Larger Trends

Looking at these predictions, which I typed just now, stream-of-consciousness style, I’m struck by the two social vulnerabilities they exploit.

Industrialization of conversation. We have not come to terms with how the digitalization of conversation allows for its industrialization, and how its industrialization allows for manipulation that is more massive and immediate than anything we’ve previously seen in the conversational space. We need to develop tools and norms to protect conversation from industrialization. And we desperately need to stop conceptualizing the discourse space on the web as a bunch of individual actors expressing an emergent will.

The erosion of the right to privacy. The modern expectation of privacy, if I recall correctly, was a result of both urbanization and literate culture. In a small tribe you don’t have much privacy, but on the other hand everyone knows you and has the context to evaluate new information about you. In a rural setting, most of what you do is relatively undiscoverable by others, unless someone involved talks. But in towns and cities people’s actions are both discoverable and shorn of context, and written communication is similarly stripped of its setting and intended interlocutors, so new norms and laws were needed.

(Economics played a part here of course as well — consider the way that works of Rembrandt and others portray homes of the rising middle class as quiet and meticulously organized private spaces apart from the bustle of the street.)

Internet giants such as Google and Facebook have labelled privacy as a historical anomaly. And it’s true that the modern conception of privacy seems to emerge with the development of the modern middle class. But there are some things to note here.

The first is that privacy is necessitated by a move to literate culture. Verbal communication is by its very nature private, available only to those at a specific time and place. Written communication makes possible the broad dissemination of messages intended for different audiences and contexts, and so a notion of informational privacy has to develop. The mailman doesn’t get to read my mail, and has never had the right to, from the invention of mailmen. This is because the notion of mail gives rise to new notions of mail privacy. Mail doesn’t make privacy obsolete — the norms and the tech co-develop.

Looked at this way, the move from literate to digital culture should not reduce the amount of privacy available to people, but increase the realms where the concept is applied. Our literate and productively bureaucratic culture could not have developed without the expansion of privacy norms around written communication. My dad worked for many years at Digital Equipment Corporation, an IBM-like mid-20th-century creation that functioned on memos and notes and written analysis. Had it been legal and acceptable for any person to go out and sell internal memos to other companies, or to publish employee assessments in the local paper, very little work could have been done on paper, and organizations like IBM and Digital would simply have been impossible to run. They would have collapsed. The invention of the written memo occasions new norms about the privacy of the written memo.

The move to digital communication should, likewise, prompt new and more restrictive norms around privacy. Otherwise our digitally enabled culture will collapse, completely unworkable. But what’s different this time is that the business model of our modern system involves the mailman reading our mail, so powerful interests have spent a lot of time arguing that rather than prompting new notions of privacy, technology undoes privacy. This is unbelievably stupid and technocentric. The invention of sexting, for example, doesn’t “undo” privacy — it argues for an expansion of the concept. The use of email doesn’t mean that everyone’s annual reviews will now be a matter of public record, or that we all now have a right to read the personal work squabbles of others — it means we need to develop new norms and laws and security expectations around email.

I’m sure that the powers that be in Silicon Valley believe in “the end of privacy”, just like they believe in technocratic meritocracy. The most attractive thing for any programmer to believe is that new technologies will render the messiness of social relations obsolete. But this idea, that privacy is antiquated, will lead to institutional and organizational collapse on a massive scale, which is why a transparency organization like Wikileaks is the favorite tool of dictators.

Additionally, unless privacy concerns are addressed, we will end up reversing the advances of literate culture, which allowed broad participation in discourse and decision-making. Keep in mind that while people become increasingly wary of speaking frankly in email, text, and chat rooms because of the lack of technical security and ephemerality, people with face-to-face access to power will still be able to speak freely. It’s easy to mock bureaucratic culture, with its emails and memos and endless reply-to-alls. But when the only way to influence the direction of the company reverts to being seated in a chair across from the CEO, we will miss it.

That’s a long point one on privacy. But let me add point two. The invention of an intensely internal and personal private life is one of the great gifts of modernity to humanity. I love Shakespeare, but read one of Shakespeare’s soliloquies next to Wordsworth’s “Tintern Abbey.” Reflecting on a place he has not seen for a long time, Wordsworth writes:

 These beauteous forms,
Through a long absence, have not been to me
As is a landscape to a blind man’s eye:
But oft, in lonely rooms, and ‘mid the din
Of towns and cities, I have owed to them,
In hours of weariness, sensations sweet,
Felt in the blood, and felt along the heart;
And passing even into my purer mind
With tranquil restoration:—feelings too
Of unremembered pleasure: such, perhaps,
As have no slight or trivial influence
On that best portion of a good man’s life,
His little, nameless, unremembered, acts
Of kindness and of love.

People focus on Wordsworth’s treatment of nature, which is remarkable in itself. But the most striking thing to me about the poem has always been how recognizable its psychology is compared to Shakespeare’s. And a piece of that is the way the narrator’s personal and private life provides sustenance even as the “din of towns and cities” drains him. The way in which even his mental experience of those “lonely rooms” is intensely and unapologetically personal. The obsession with the psychological reality of mental imagery. You see the same development in the novels of the Brontës and Jane Austen, and in portraiture. The woman in Matisse’s Woman Reading, for example, sits with her back to us, her room small and unkempt, but she is transported into a different world by the book she reads.

I’ve already spent too many words on this post, so I’ll pursue this another time. It’s difficult to explain. But the notion of privacy is more than just social and organizational lubricant — over the past 500 years or so we’ve built it deep into the notion of what it means to be human, and removing it dehumanizes us.

Warmly,

Mike