Encourage Students to Give Feedback on Google Answers

Early on in the Digital Polarization Project we had students do large info-environmentalism projects, and we still do that sometimes. But I’ve become convinced that the more sustainable change is in simpler actions — stuff you can do in a few hours or even a few seconds.

Here’s an example from today — I was talking with my daughter about her summer job, and the question of what minimum wage was in Washington State came up. Google told us it was $11/hour:

[Image: Google answer box for the minimum wage query]

And it was — in 2017. But it’s 2018, and so scanning a bit further we found the Washington government site, where it is listed as $11.50 as of this past January:

[Image: Washington State minimum wage history]

That could have been the end of it, but there is a feedback button below the Google result.

[Image: the feedback link below the Google result]

Now I do want to be clear — the chances of Google looking at your particular feedback and fixing this particular answer are quite low. Google tends to avoid adjusting individual answers. But if it identifies a systemic problem — for example, broadly out-of-date minimum wage data across many states — that may result in some attention to the way they derive these answers. So go ahead and do it; it just takes a second:

[Image: the Google feedback form]

Having students give feedback also reminds them that the answers they see at the top of the Google result page are, at best, guesses.

If you’re looking for a quick classroom activity, have students look up the minimum wage in various states and see if Google gets it right, and give feedback where it doesn’t.

Establishing the Significant History of a Newspaper on Wikipedia

Ultimately one of the prime goals of the Local Historical Newspapers on Wikipedia project (#LHNOW) is to make sure that significant local publications have an infocard, and thereby are more likely to generate a Google panel in the search results.

But that’s not the first, or hardest step.

The first, and more difficult, step is to establish the significant history of the given newspaper so that the article meets notability requirements and will not be deleted. Once that is accomplished it is relatively simple to go back and add infocards.

So how do we do that? And what does it look like? It varies from paper to paper — but here are some resources you can use and examples you can mimic.

Start With the Library of Congress Record

Here’s the main thing to understand about Wikipedia – a primary source cannot establish its own notability. So that long history in the paper’s about section? You can cite that, but only after you’ve established much of the history and significance through other sources.

The Library of Congress has a project about historical newspapers called Chronicling America and one of the nice results of that is that they have bibliographic records on many papers. This is a good starting place because it not only provides you a nice authoritative Library of Congress cite for your newspaper page, but it also alerts you to different names the paper published under, who the owners were, and whether it was preceded by a related publication.

It’s worth taking note of all these names and people, as they are going to be search terms for you. You can also start to build the chronology of the paper. If the paper is the result of a merger, you may want to cover the history of the previous papers it grew from in the article.

You can do a fancy LOC search using “site:” syntax or use their own internal search (which I found a bit lacking). But for most papers, this sort of thing gets you where you want to go.

[Screenshot: a Library of Congress “site:” search]
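For example, a search along these lines (the paper name here is just an illustration) will usually surface the Chronicling America record:

```
site:loc.gov "sand mountain reporter"
```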

When you get to the LOC page, note the first date (or year) of publication, the frequency, the publisher, and any preceding titles as you’ll work this all into your article.

Because different papers sometimes have similar names, you’ll also want to check the town of publication and the publication years. Occasionally LOC will have multiple records for the same paper name in the same town, and you have to find the right one.

Search Google Books

A useful way to establish notability is to search Google Books. For instance, through Google Books I learned the Wellesley Townsman published one of Sylvia Plath’s early poems, as well as an obituary that blamed her death on viral pneumonia. That’s interesting, and also adds to notability. I also learned that the Griffin Daily News played a significant role in stoking racial resentment — locally and nationally — in the 1890s.

What you’re looking for in these accounts is not a book sourcing a fact to these papers, but the papers either playing a role in events or being covered due to their importance. So if a book just cites the paper as a reference — well, that’s not really notable. But if it talks about the paper directly — maybe about the sale of it, or how it was the only paper to support a certain candidate for Governor, or when it went to a daily publication schedule — that’s something to throw in the article. You might also check whether notable people worked for the paper at some point; if so, go to their articles and link them to the paper.

Google Books also does auto-citing pretty well — throw the link into the cite box and it builds the citation for you. Don’t pull the URL from the location bar, however; pull it from the link up top after hitting “clear search” — this should provide a link directly to the cited page.

[Screenshot: Google Books result for the Sylvia Plath article]

Some older Wikipedians get a bit grumpy about autocites — they don’t look as nice, and when multiple cites are used they don’t compress into nice “Ibid’s” etc. I’m sympathetic, but it’s not something you should worry much about. Using autocite maximizes your research time and provides direct links to evidence, so on the whole it’s a good thing.

Historical Newspaper Archives Will Save Your Life

The most useful resource for finding out the history of a paper is other contemporary papers. Start by checking if your university’s library has subscriptions to newspaper archive search engines.

Nineteenth Century Newspapers:

[Screenshot: Nineteenth Century Newspapers search results]

ProQuest Newspapers:

[Screenshot: ProQuest Newspapers search results]

And Nexis Uni:

[Screenshot: Nexis Uni search results]

If you don’t have the access you need from your institution or local library you might want to pay for a personal account somewhere. The “Publisher Extra” subscription level of Newspapers.com costs $75 for six months and a NewspaperArchive account is $50 for six months. Both are excellent sources, especially for small local papers.

Even for these accounts, you may not have to pay any money at all — Wikipedia provides a number of free NewspaperArchive accounts to Wikipedians who have a significant edit history and no institutional access, as do some local libraries.

The amount of hidden history you can find in news archives is extensive. Here are my recent Newspapers.com clippings on J. J. Benford, editor and initial publisher of the Albertville Herald in Alabama:

In there we have the entire early biography of this editor. We’ve also got various articles on the merger of the Albertville Herald with the Sand Mountain Reporter.

Clippings are also shareable with the general public, which makes them very useful on Wikipedia.

Here’s a shot of NewspaperArchive with an article on Jesse Culp, the editor of the Sand Mountain Reporter in 1961 when the article was published:

[Screenshot: NewspaperArchive article on Jesse Culp]

The article itself is about Culp speaking to the PTA, but from it we learn that he had been editor of the Sand Mountain Reporter since it spun up in 1955, and that — like J. J. Benford (who ran the other town’s paper) — his background was in agricultural radio reporting. We also get a nice connection (and therefore a link out) to the WAVU Wikipedia article.

And here’s a Nexis Uni page on a purchase of the paper in 1999:

[Screenshot: Nexis Uni record of the 1999 purchase]

Pulling It Together

When you write your article, these bits of research are used for small parentheticals, but they get cited as well. For instance, here is a page for the Sand Mountain Reporter I drafted this morning out of these references:

[Screenshot: the drafted Sand Mountain Reporter article]

The links should go to “clippings”, not pages, per both Wikipedia and the archives. Clippings in these systems are a way to share specific articles publicly, and linking to the clipping — which is not behind a paywall — allows others to check your work and the accuracy of your citation without needing an account.

In this case we weren’t able to find anything worthwhile in Google Books about this paper, but by getting down the history — even of this rather small paper — we’re able to show its long and important history in the community. And we’re able to do this without citing the paper itself, instead relying on the Library of Congress and four other local papers to tell the story here.

It would be nice if we had a Google Books story or two — a Sylvia Plath-style story, or even a mention in a book on local Alabama history. But I think you can make the argument to those who ask that the long and continued coverage of this paper by other papers shows its importance to the region. While the paper may seem a little slight, it is not simply a weekly shopper of pay-to-play features, but a true area newspaper with a significant history.

After you’ve established notability, you can go ahead and write up an infobox on the page (or let someone else do it) giving the most current stats of the paper, and think about what needs to be in those important first couple of sentences. Or fix citations — I notice I didn’t note the page of the paper here, which I should do. But starting with the history and significance will get you off on the right foot.

Announcing the Newspapers On Wikipedia Project (#NOW)

TL;DR: I am announcing a project to get students and faculty to produce 1,000 new Wikipedia articles on significant English-language local newspapers by October 12, 2018. This will represent a substantial increase in Wikipedia coverage of these papers (an increase of 1,000 U.S. papers would be almost a 40% increase in U.S. coverage, for example). Join by doing it and telling me.

Background

I’ve just read a stunningly good paper from Emma Lurie and Eni Mustafaraj. The paper is chock full of all sorts of insights for both the media literacy teacher (which of course I am) and the search UI/UX designer (which I was) and it feels like it was written just for me, to help me get better at what I do.

The core of the paper is this — Lurie and Mustafaraj nudged students with prompts into using lateral reading on sources, and then watched how they performed. In doing so, they were able to identify the ways in which untutored lateral reading succeeds and the ways in which it fails. This close examination yields a variety of insights into what search platforms, media literacy teachers, researchers, and others can do to better support readers in this process, as well as noting some pitfalls of current online literacy advice (including some of mine).

More on the whole paper later; here’s the part that matters right now. One of the things that hinders students is the lack of decent Wikipedia documentation of local news sources. This, in turn, affects the information that comes up when students do lateral reading on a source, particularly in the Google panels, which readers notice but often find missing or less than helpful on smaller sources.

The researchers even quantify the issue: the USNPL lists 7,269 news sources in the U.S. Only 2,702 of those produce “knowledge panels” in Google, with the likely reason for the lack of a panel being the lack of a well-developed Wikipedia page. Even aside from the knowledge panel problem, the lack of decent pages for local news means that students will not always be able to find any objective information, even on a deeper search.

What struck me though was that this is a solvable problem. And it’s one our students can help solve.

Students Can Learn About News While Learning About Wikipedia

Many faculty want to have their students work in Wikipedia to better understand how Wikipedia works, and to provide their students with authentic digital research projects. But finding articles that their students can add and work on is not always easy — notability requirements often lead to the deletion of student-created pages.

But new newspaper and radio articles, provided they are on entities with a significant history, have a bit of an advantage in Wikipedia. Here are the notability guidelines for these sorts of articles:

Notability is presumed for newspapers, magazines and journals that verifiably meet, through reliable sources, one or more of the following criteria:

  • have produced award winning work
  • have served some sort of historic purpose or have a significant history
  • are considered by reliable sources to be authoritative in their subject area
  • are frequently cited by other reliable sources
  • are significant publications in ethnic and other non-trivial niche markets

Publications that primarily carry advertising, and only have trivial content, may have relevant details merged to an article on their publisher (if notable).

These guidelines prevent you from adding your neighborhood shopper to Wikipedia, or advertising your new blog of local news. But that’s not the gap we’re looking to fill. Most papers on the USNPL that do not have Wikipedia pages have existed for decades or centuries; many are logged in the Library of Congress as significant historical publications. If students learn how to demonstrate that significant and extended history of these publications through the use of secondary sources, they should be able to easily meet notability guidelines for the papers we care to log.

In the process students will learn a number of things that will help them better evaluate news. They’ll understand the nature of local reporting, the history of it, and its importance even in an increasingly digital world. They’ll learn the names of various journalism awards and their reputations, helping them better evaluate quality.

And they’ll see some ugliness as well, and understand the ways that local media has been used for ill. In a brief spate of edits over the weekend, the random papers I pulled included one that advocated and joked about lynching at the turn of the century, and one that acted as the unofficial mouthpiece of the early Ku Klux Klan. I also noted a number of newspapers that served persons of color that were not represented in Wikipedia.

And of course, follow any paper’s history into the 1990s and early 2000s and you’ll find the same story again and again: local papers being bought out by often distant corporations with no connections to the community.

Why Historical? Why Newspapers?

To be quite honest, this is strategic on our part. Wikipedian deletionists worry — quite rightly — that small and trivial local publications may use Wikipedia as an advertising space, both to drop their promotional copy into and to juice their Google results. Because papers with no significant history must demonstrate notability in other ways, the battle about notability can become quite contentious.

We’re choosing an easier task to start — documenting existing newspapers with a significant history. We aim with this first project to pick papers 25 years old or older that have been noted repeatedly in other media due to their historical or community significance. We put “Historical” in the title to clearly signal to Wikipedia admins and others our good intentions and our prime argument for notability.

Why October 12?

It’s a safeguard. If we are not at our goal by October 10, I plan to go to the Open Education conference in New York and shame everyone into helping us cross the finish line.

You Join By Doing It and Telling Me

Here’s what you do.

Set Up Your Profile Page

First, if you don’t have a Wikipedia account get one.

Second, make a profile page on Wikipedia for yourself. Write a few paragraphs about yourself. You can be pseudonymous if you want, or let it all out there.

Add some text like this somewhere on the page (edit it to reflect you) and link to my profile page so I know about you:

Newspapers On Wikipedia Project (#NOW)

One of my current interests is improving the coverage of historic local newspapers in Wikipedia. One of the best ways for readers to sort out whether a newspaper is real or fake is to check Wikipedia to see if it has an article (and what that article says). Having newspapers documented is also crucial to Wikipedia internally, since many historical claims are sourced to local papers, and editors require context on the nature of the publication the material appears in.

Yet only 38% of local papers have a Wikipedia page. The problem is particularly bad with weekly papers in small towns, even though many of these papers have publication histories going back to the 1800s. I am participating in a project in this area, initiated by Michaelacaulfield, to improve Wikipedia’s coverage of local news sources, particularly historic newspapers.

Again, link to my userpage as above so I can find people doing this. Eventually we’ll get a WikiProject page set up, but this will work for now. Tweet any edited or created pages using the #NOW hashtag.

Putting this information on your page will help people reviewing your edits and creations to understand what you are trying to achieve.

Gnome Before You Create

Don’t jump right into creating pages. Spend a week or two visiting existing pages on local newspapers, seeing how they are set up, and making minor improvements in language, citation, or formatting.

This is called gnoming, and a history of gnoming demonstrates you are interested not in self-glorification or grinding a specific axe, but making Wikipedia better. WikiGnome actions are listed here, and doing them will make you a better writer and editor of articles. You can find some articles to gnome here.

More importantly, however, a person who has made useful edits and additions to a wide variety of other people’s pages builds social credit, and is less likely to have their pages deleted without a conversation first. You build the credit gnoming, you spend it creating new pages.

After gnoming, you might want to expand further out into making significant additions to pages.

Don’t Create Your New Page Until You Know How to Establish Notability

Once you’ve gnomed for a week or two, you’re ready to create a new page. You can go to the USNPL list or another list of local newspapers and start looking for ones that aren’t covered. But you should be careful before creating them, since it’s important to get new pages right on the first go.

If you’re a seasoned Wikipedian, you know how it works:

  • Find secondary sources (multiple if possible) to demonstrate notability.
  • Use Google Books to find strong supporting links.
  • Reference collection records at the Library of Congress.
  • If you have a Newspapers.com account, search for coverage of your paper by other papers.
  • See if the paper is linked to any otherwise notable figures.
  • Try to get down the complete history of ownership — to the extent that changes in ownership were noted publicly by significant secondary sources, this establishes notability, and it will also seed the article with a broad array of citations from the get-go.

Do all this before you launch, because in my opinion people have it easier if they get it right from the start.

If you’re not a seasoned Wikipedian, don’t fear. I’ll do up a video guide on how to write a notable stub on a local historical newspaper soon.

More soon.

Google’s Big AI Advance Is… Script Theory?

Like many people I watched Google’s demo of their new Android system AI calling up a hair stylist and making an appointment with trepidation — was this ethical, to not disclose that it was an AI?

But now that the smoke has cleared, I’m realizing something a bit more disturbing. After years of Big Data and personal analytics hype, the advance that Google demonstrated is an application of 1970s AI work that requires none of that.

Setting up a haircut appointment is a social script. It has a sequence of things that happen, usually in a predictable order. The discovery of the importance of social scripts in computational understanding of communication was a big part of what Schank and Abelson brought to the field of AI in the 1970s.

Scripts were important both in terms of computers navigating standard social situations, but also in understanding stories about those situations. When I studied linguistics, one of my favorite little facts was that you could often discover socially legible scripts by noticing how stories were elided. For instance, if I say “So I go to a restaurant, and the server gives me the bill…” no one stops me and says “Wait, you got a bill before you ate anything? And who is this server person?” The understanding in storytelling is that I can evoke a script and then start at the part of the story that deviates from the script. That’s how core they are to our thinking and discourse, and Schank and Abelson made the case in the 1970s that mapping out these scripts would be core to computer understanding as well.

While less physical than dining, booking a haircut over the phone is a script too. It follows a particular sequence and has slots where the unique bits go.  In general we find out if I am in need of a particular stylist, and then drill down on a date and time. Importantly, it works because I’ve learned the script and I know the things the hair stylist will ask and I have the answers the stylist requires. I know I need to provide date, time, and stylist, and I might need to supply a rough time of day preference — mornings, afternoons, end of day, before work. On the other hand, I know the stylist is not going to ask me if I’d rather have a chair nearer to the window or the bathroom or what type of music I prefer in the salon.
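As a toy sketch (my own illustration, not anything from Schank and Abelson or from Google), a script can be modeled as an ordered sequence of steps, some of which carry slots for the unique bits:

```python
# A toy "haircut appointment" script: a fixed, shared sequence of steps,
# with slots marking where the caller's unique information goes.
haircut_script = [
    {"step": "greet"},
    {"step": "request_service", "slots": ["service"]},            # e.g. "a trim"
    {"step": "choose_stylist", "slots": ["stylist"]},             # optional
    {"step": "negotiate_time", "slots": ["date", "time_of_day"]},
    {"step": "confirm"},
]

# The sequence itself is shared knowledge; only the slots need filling,
# which is why two strangers can complete the transaction.
slot_names = [name for step in haircut_script for name in step.get("slots", [])]
print(slot_names)  # ['service', 'stylist', 'date', 'time_of_day']
```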

Here’s the thing: The precise nature of social scripts is that they often allow people with no knowledge of one another to negotiate transactions successfully. Preferences figure into that but are usually easily enumerated by each party — because that’s part of the script.

Because of this, I don’t really need personal analytics to discover that I like my cappuccinos extra dry. I have years of experience walking through scripts where I’ve learned to specify that, and the script has a very specific spot where that goes. The script has taught me how to concisely enumerate my preferences in ways useful to baristas.

In fact, analytics in these situations end up being a lesser reflection of the explicit inputs into the script. For example, Google might search my flight booking data and find that I like window seats towards the front, that I prefer Alaska, and that I like layovers with a bit of buffer in them. But the patterns in the flights I end up with aren’t a mysterious secret sauce discovered by analytics; they’re the product of me specifically asking for nine things when I book flights. Nine things I can easily rattle off, because I’ve been doing the “booking a flight” script for years.

So here’s the question about the “haircut” demo: if the nature of the social script is you *don’t* need deep knowledge or background for the script to work, then what’s all the talk about personal data being Google’s prime AI asset about? What’s all the machine-learning hype?

After years of sucking up all our data Google’s big AI advance is… Script Theory. Which requires none of this. Maybe we should be talking about that.

Taking Bearings on The Star

One thing people may not realize is I use the exact same techniques we teach to students in my daily work. The skills we are giving students aren’t some dumbed-down protocol. They are great habits for reporters, researchers, and other professionals as well.

As an example, this article came up in my news alerts this morning.

[Screenshot: the article from The Star (Malaysia)]

I’m interested in fake news in Southeast Asia, so I’m glad to read analysis and opinion from a place like Malaysia, but I want to source-check, even if I think I know this source. So we strip off everything from that URL and add Wikipedia.

[Screenshot: the omnibar search, domain plus “wikipedia”]

This pulls up a relevant Wikipedia page:

[Screenshot: the Wikipedia page for The Star (Malaysia)]

And clicking through we are reminded that The Star is effectively owned by the Malaysian government.

[Screenshot: the ownership section of the Wikipedia article]

And then we’re back to the article after a 30 second detour.

For the record, I still read the column, but I didn’t share it, and if I had shared it I would have noted that it was a legitimate news source to some extent, but possibly compromised by its ownership. Sam Wineburg has talked about this process as taking bearings, and I like that term a lot. Before trudging blindly into an article, pull out the compass and the map and figure out where you landed. It’s so simple to do, there’s really no excuse for not doing it.

(I should note that I’ve elided a number of things I do know about Malaysia and government propaganda there for the sake of clarity in this post — but the truth is if I have any doubt about the source at all I use the process, just the same as a novice. I had a vague memory about this precise ownership issue, but the process is always likely to give me a better result than my unaided memory. And it’s actually less cognitively demanding as well.)

(EDIT: changed “heavily compromised” to “possibly compromised” since the initial wording expressed more certainty than I had wished to portray. Legitimate news organizations with ownership issues are often fine on many issues, whether a particular news item might be influenced is contextual.)

The “Just Add Wikipedia In the Omnibar” Trick

One thing we do in the Digital Polarization Initiative is to hone the actions we encourage students to take down to their most efficient form. Efficient meaning:

  • easy to memorize
  • quick to execute
  • with a high likelihood of providing a direct answer to the question you have

Our student fact-checkers rely heavily on Wikipedia, and usually the best first pass at getting a read on a site is to read the Wikipedia article on it. But what’s the fastest way to get the relevant article?

As an example, consider the organization Nuclear Matters which describes itself this way:

[Screenshot: the Nuclear Matters “About” page]

Nuclear Matters is a national coalition with a diverse roster of allies and members. Our Advocacy Council is made up of leaders from various areas, including labor organizations, environmental supporters, young professionals and women in the nuclear industry, venture capitalists, innovators in advanced nuclear technology and former policymakers and regulators.

This site is not quite claiming to be grassroots, but we notice the one word not here is “industry-funded”. And we’re curious — you have some varied members, but where does the money come from?

As mentioned, the best first stop on this is Wikipedia. I used to show students how to do the site search for Wikipedia using the “site:wikipedia.org” syntax — but I found even faculty I taught this to were forgetting the syntax — or searching for “wikipedia.com” which gives weird search results.

So I now just do this omnibar hack: type the domain into the omnibar followed by “wikipedia”, and let the browser match against Wikipedia pages.

It works for a couple of reasons I can discuss at a later time — but it’s a useful enough habit that I wanted to share it in a post.
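The trick amounts to a tiny transformation: strip the URL down to its domain and append “wikipedia”. Here is a rough sketch in Python (the function name is my own, not part of any tool):

```python
from urllib.parse import urlparse

def omnibar_query(url):
    """Strip a URL down to its domain and append 'wikipedia',
    mirroring what you'd type into the browser's omnibar."""
    domain = urlparse(url).netloc
    if domain.startswith("www."):  # drop "www." so the query matches the site name
        domain = domain[4:]
    return f"{domain} wikipedia"

print(omnibar_query("https://www.thestar.com.my/opinion/columnists/some-article"))
# → thestar.com.my wikipedia
```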


BTW — In case people coming here don’t know, I currently run a national, cross-institutional project that aims to radically rethink how we teach college students online information literacy, where we teach them tricks and techniques like this. Ask me about it — my DMs are open. Or read the textbook: Web Literacy for Student Fact-Checkers and apply it to your own class — it’s free!

OLC Innovate Privacy Concerns

Today, OLC Innovate leadership requested feedback from attendees on the issues of data collection and privacy raised by (among other things) the attendee tracking badges and session check-in procedure. I replied in email but am republishing it here, lightly edited:

I’m really glad to see you considering privacy issues, and mostly wanted to just thank you for that. I think OLC could lead the way here.

I felt the badges and the system of checking people into rooms was invasive and took away from the otherwise friendly feel of the conference. I don’t know if I want vendors, or OLC, or my boss knowing which events I attended and which I didn’t – and I certainly don’t want that data on a bunch of USB-based unsecured devices. What we have learned from the past decade is that you can’t really know how data will be misused in the future, and consent on data isn’t really meaningful because when data gets combined with other data it becomes toxic in ways even engineers can’t predict.

It seems to me that you have a few small pressing questions that you could answer without the tech. What sessions do people attend? Are there subgroups of attendees (managers, faculty, librarians) which seem to have less desirable session options?

Even if you still want to use the tech, if you scoped out the specific questions you wanted to answer you could do much better. You could not only capture that info in a less potentially toxic way, but you’d be more likely to use it in useful and directed ways. As just one example, if you replaced unique ids on the badges with a few basic subtypes – one code for managers, one for faculty, etc. – you would not be collecting personally identifiable information about people, but you would meet your goals. If you centralized the collection of information by job type you could also provide that information to speakers at the beginning of their session in ways that would be far more useful and safe than any undirected analytics analysis.

In short, do what we tell faculty to do in assessment of instruction:

  • Think about a *small* set of questions you want to answer
  • Collect only the information you need to answer those questions
  • Answer those questions by creating aggregate findings
  • Delete the raw data as soon as retention policy allows
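As a rough sketch of what aggregate-only collection could look like (the role codes and session names here are invented for illustration):

```python
from collections import Counter

# Hypothetical badge scans: (session, role_code) pairs with no unique IDs --
# the role code is one of the "few basic subtypes" suggested above.
scans = [
    ("Keynote", "faculty"), ("Keynote", "manager"), ("Keynote", "faculty"),
    ("Analytics Ethics", "librarian"), ("Analytics Ethics", "faculty"),
]

# Aggregate immediately; the raw scans can then be discarded, and nothing
# personally identifiable was ever collected in the first place.
attendance = Counter(scans)
print(attendance[("Keynote", "faculty")])  # → 2
```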

You think you want to answer a lot of questions you don’t yet know by rooting around in the data. Most of what we know about analysis tells us you’re far better off deciding what questions are important to you before you collect the data. I would go so far as to share with attendees the five, and no more than five, questions that you are looking to answer each year with the data you collect, and explain all data collection in relation to those questions. After you answer a question a couple of years in a row, swap in a new one.

(I’ll add that for all these issues there needs to be a meaningful opt-in/out. I would suggest that the de-individualized code be a removable part of the badge).