Wikipedia: re-writing history
By Andreas Kolbe
Originally published at http://wikipediocracy.com/2014/10/12/wikipedia-re-writing-history/
For more than six years, Wikipedia named an innocent man, Joe Streater, as a key culprit in the
1978–79 Boston College basketball point shaving scandal. Thanks to the detective work of Ben
Koo at sports blog Awful Announcing, the world now knows (again!) that Joe Streater had no
involvement in the affair. He couldn’t have, because he didn’t even play for the team in the 1978–79
season.
Entering the Wikipedia wormhole
In his article, Guilt by Wikipedia: How Joe Streater Became Falsely Attached To The Boston College
Point Shaving Scandal, Ben Koo describes how he fell “down this wormhole” that ended at an
anonymous Wikipedia edit made over six years ago.
It began like this: Koo had reviewed a 30 for 30 documentary on the Boston College point shaving
scandal for Awful Announcing. In this review, he remarked on the curious fact that one of the four
players eventually tied to the scandal wasn’t mentioned in the film at all.
This prompted a puzzled email inquiry from a former Boston College player who’d been involved in
the affair: Which player did Koo mean? Koo replied that he had found it curious that Joe Streater
hadn’t been mentioned in the documentary, given that all the articles he had read as part of his
background research had named Streater as one of the sportsmen involved. The reply he got from
the former Boston College player astonished him:
“Joe Streater wasn’t even on the team that infamous year as he had left school the year before.”
At first, Koo was incredulous. How could this be? Streater was mentioned in Wikipedia and so many
other articles on the web. But the player’s personal testimony could not be discounted: he’d been
there. So Koo decided to investigate. He checked the Boston College Men’s Basketball Guide. Sure
enough, Streater was only listed as a player in the 1977–78 season. The 1981 Sports
Illustrated article that first broke the story did not mention Streater. Contemporaneous news
clippings confirmed: Streater took part in only 11 games in the 1977–78 season, and after that never
played for the team again. And finally it dawned on Koo: the reason Streater was mentioned in
Wikipedia and in every other article he had read was – because it was in Wikipedia.
Koo tried to locate Streater; his searches were unsuccessful. But he established that Streater’s name
had been inserted into the Wikipedia article on the scandal in August 2008, by an anonymous user
using a mail.goodwillmass.org IP address. Koo satisfied himself that none of the books and press
articles published on the incident before August 2008 ever mentioned Streater’s name. Yet since
then, Streater had become widely associated with the scandal through newspaper and TV reports as
well as countless blogs and fan sites. Even an Associated Press article (carried by Yahoo!, for
example) still names Streater as one of the culprits to this day, as do many other publications
listed in Koo’s article.
Citogenesis
The migration of spurious Wikipedia facts into other sources has grown so common that the process
has been immortalized in a famous xkcd cartoon, which coined the word “citogenesis” to describe it.
People may
think it’s a joke. It isn’t.
A recent blog post on Wikipediocracy covered a multitude of documented cases, from a Wikipedia
article on a wholly fictitious war that won a Wikipedia quality award (and retained it for five years)
to the invention of a new name for the coati: the “Brazilian aardvark”, memorably debunked in The
New Yorker.
A week after our blog post appeared, E. J. Dickson at The Daily Dot reported on the Amelia Bedelia
hoax – a piece of wholly spurious information she herself had added to Wikipedia five years ago, as a
stoned sophomore, only to find it quoted on Twitter this summer by Jay Caspian Kang. Kang,
ironically, is the science and technology editor at The New Yorker, demonstrating that not even the
journalists in charge of the publication with the best reputation for fact checking in the world are
immune to the Wikipedia bug.
Wikipedia insiders have long been aware of the “circular referencing” problem. The site has for years
now had a dedicated policy section for such cases, WP:CIRCULAR. But when a “fact” has the stamp
of approval of an authority like the Associated Press, who would doubt its veracity?
So fix it!
Wikipedians are usually quite sanguine about any errors found in Wikipedia, on the grounds that
once an error is identified, it can be corrected instantly, converting dismay into satisfaction.
Wikipedia has been improved!
It’s an article of faith with Wikipedians that Wikipedia is “always improving”. But Koo noted that the
article on the Boston College point shaving scandal had actually deteriorated in some ways over the
years:
[…] one thing that sticks out is the original Wikipedia page back in 2007 listed several great sources
including Porter’s book, court documents, and coverage from the Globe […]. Since 2008, the
Wikipedia article has somewhat regressed in terms of sourcing and has this message at the very top
now:
“This article does not cite any references or sources. Please help improve this article by adding
citations to reliable sources. Unsourced material may be challenged and removed. (September
2008)”
Koo went on to make the salient observation that –
Streater’s name stayed attached to the scandal for over six years and would have likely persisted if
not for the documentary’s airing and perhaps some of my outreach on the matter.
The standard Wikipedian response greeting the critic who points out an inaccuracy on Wikipedia is,
“So fix it!” But the fact that an error can be easily fixed should not excuse a reference site from
hosting it in the first place, for more than half a decade. After all, it is just as easy for an anonymous
user to insert an error as it is to fix one!
This particular instance of libel took over six years to discover (in what was surely no coincidence, it
was corrected by an anonymous IP editor the day before Koo’s article appeared). That it has been
remedied now is little consolation to all the readers who read and believed it during the past six
years, and to all the journalists who propagated it, compromising their reporting. And like other
spurious facts spawned on Wikipedia, it is bound to live on across the internet for years to come.
Who tracks changes to Wikipedia?
Wikipedia’s volunteer contributors keep track of articles they have an interest in by means of a
“watchlist” that alerts them to any recent changes to these entries. It’s a little-known fact that
among the English Wikipedia’s 4.7 million articles, there are hundreds of thousands that no editor
has on their watchlist. These articles, which are sitting ducks for subtle vandalism as well as the
insertion of well-meaning, but erroneous content, are in a special category,
Special:UnwatchedPages.
This category is inaccessible to ordinary users, and mostly inaccessible even to the site’s
administrators, because it is truncated (possibly for performance reasons). Administrators can only
see the first 3,000 entries in this alphanumerical list. The 3,000th entry is an article beginning with
the characters “2000”.
In other words, the 3,000 entries that are visible to administrators don’t even reach the letter A. In
fact, they don’t even reach the number 3, making the list pretty well useless for quality control
purposes.
Based on extrapolation from other large article categories, the total number of articles nobody is
watching is probably well over half a million. This August, an administrator suggested,
[...] the number is doubtlessly somewhere between 100,000 & one million.
These are articles that are on nobody’s watchlist. At all. But this is not the end of it, because in
practice, users tend to have so many articles on their watchlists that they ignore most of their
notifications. Other users may not log in for weeks; even the watchlists of retired users who have
not checked them for months or years still exist, waiting for their owners to return. Bearing this in
mind, the number of effectively unwatched articles is likely to be far greater still.
This is partly a consequence of the fact that while the number of Wikipedia articles continues to
grow, the number of active Wikipedia contributors continues to drop. In May 2007, when the point
shaving article was created, the English-language Wikipedia contained 1.7 million articles and had
4,736 “very active” contributors (defined as contributors making more than 100 article edits a
month). By August 2014, the number of articles had risen to 4.7 million, while the number of “very
active” contributors had dropped to 3,130. While there were 2.8 very active editors per 1,000
articles in 2007, there are less than 0.7 now – the ratio has dropped to less than a quarter of what it
was.
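The arithmetic behind that comparison can be checked with a short sketch. The figures are the ones quoted above; the function and variable names are purely illustrative, and the article counts are the rounded values given in the text, so the results are approximate.

```python
def very_active_per_thousand(editors: int, articles: int) -> float:
    """Return the number of "very active" editors per 1,000 articles."""
    return editors / articles * 1000


# Figures as quoted in the text (article counts are rounded).
ratio_2007 = very_active_per_thousand(4_736, 1_700_000)  # May 2007
ratio_2014 = very_active_per_thousand(3_130, 4_700_000)  # August 2014

# How the 2014 ratio compares with the 2007 ratio.
relative = ratio_2014 / ratio_2007

print(f"2007: {ratio_2007:.2f} very active editors per 1,000 articles")
print(f"2014: {ratio_2014:.2f} very active editors per 1,000 articles")
print(f"2014 ratio as a share of 2007: {relative:.1%}")
```

With these inputs the 2014 figure comes out at a little under a quarter of the 2007 figure, which matches the claim in the text.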
There is one other safety net against subtle vandalism: Wikipedia’s recent changes display, showing
all edits made to Wikipedia as they occur. But recent changes patrollers often review hundreds of
edits an hour. They do not have the time to check sources, and generally catch only edits that are
very obviously problematic, even to someone who has no familiarity with an article’s subject
matter. Many edits are never looked at, simply because they’re coming in so thick and fast.
The German-language Wikipedia and a few others have a system, known as “Pending Changes” or
“Flagged Revisions”, whereby any edit from an IP address has to be checked by a more experienced
contributor before it is accepted and displayed to the public. This system might well have stopped
the unsourced addition of Streater’s name, for example. But the English Wikipedia chose not to
implement this system, fearing that it might create bottlenecks and reduce participation. It traded
reliability for quantity.
As accurate as Britannica?
Even in the face of mounting evidence to the contrary, tech writers are still fond of citing a 2005
“study” by Nature that found “Wikipedia (almost) as accurate as Britannica”. No one seems to
remember these days that the Nature piece was not a rigorous peer-reviewed study, but a
journalistic piece that only looked at a small sample of articles on science topics – including some
fairly obscure ones like the “kinetic isotope effect” or “Meliaceae” that might just elude the grasp of
the average vandal. And in some cases, Nature compared excerpts of Britannica articles to their
Wikipedia counterparts, and then counted “omissions” in Britannica as errors – even though the
“missing” facts were contained in article sections Nature had discarded. As Britannica pointed out in
their rebuttal:
One Nature reviewer was sent only the 350-word introduction to Encyclopædia Britannica’s 6,000-word
article on lipids. For Nature to have represented Britannica’s extensive coverage of the subject
with this short squib was absurd, and it invalidated the findings of omissions alleged by the reviewer,
since those matters were covered in sections of the article he or she never saw.
Nature rejected these complaints, many of which hinged on fine points of detail and emphasis.
And Encyclopædia Britannica did acknowledge there had been some errors on its pages. But while
Britannica may be imperfect, it is quite safe to say that it did not and does not contain false
information inserted by anonymous people for fun or for financial gain, that it does not
contain anonymous hatchet jobs written by people’s rivals, and that it is not full of puff-pieces
companies and individuals have written about themselves.
Quality problems? What quality problems!
One response that critics of Wikipedia often face when they point out errors in Wikipedia is this: that
nobody ever claimed that Wikipedia is perfect, and that people are regularly warned that Wikipedia
may contain errors. It has a disclaimer! Everyone knows that it is free, crowdsourced and curated by
volunteers who work on whatever it is they feel like working on.
Spokespersons for the Wikimedia Foundation (WMF), the non-profit organization operating the
Wikipedia website, rarely express dissatisfaction with Wikipedia’s lack of reliability. Its public
messages focus on the feel-good factor of its vision statement: “Imagine a world in which every
single human being can freely share in the sum of all knowledge. That’s our commitment.” Studies
pointing to reliability problems are commonly rubbished among Wikipedians, while those reporting
positively are accepted uncritically and highlighted.
Jimmy Wales, for example, recently tweeted, in response to a Twitter user making a dismissive
statement about Wikipedia’s reliability,
Actually academic studies show we are about as accurate as traditional encyclopedias and improving
all the time.
At the recent Wikimania conference, a slide triumphantly announced a survey finding that British
people now trust Wikipedia more than news organizations (the irony here being that Wikipedia
articles are commonly based on news articles). Wikipedians cheered.
It often feels like this: if you complain about reliability problems, you are told you have no right to
expect perfection, and moreover, you can fix them yourself; while in all other contexts, Wikipedia is
styled as the first real wonder of the digital world and one of the greatest advances in human
history. There is a distinct whiff of self-serving doublethink about this attitude.
Where does the Wikimedia Foundation stand on reliability?
Content reliability has not really been a discernible priority for the Wikimedia Foundation for a long
time. In fact, to this day, the WMF does not even measure content quality – their staff admit freely
that they have no idea how to do it. Instead, the Foundation measures and reports quantitative
metrics such as the number of volunteer editors, articles, article edits and page views. In that sense
it is no different from Facebook, except that it has another metric: the amount of donations flowing
into its coffers.
With content generation and quality control being left squarely in the hands of Wikipedia’s unpaid,
self-selecting and largely anonymous volunteers, the Wikimedia Foundation sees itself merely as a
“technology and grantmaking organization”. Its priorities currently are to expand its software
engineering staff and modernize the user interface of its sites, especially for mobile users, in order to
prevent readers from flocking to rival portals and providers like Wikiwand that offer the same free,
Creative Commons-licensed Wikipedia content in a more visually appealing setting. The Wikimedia
Foundation’s newly-hired Vice-President of Engineering, Damon Sicore, put it like this in his first IRC
Office Hour:
I see us having to scale to a size that enables us to compete with the engineering shops that are
trying to kill us. That means we need to double down on recruiting top talent, and steal the engineers
from the sources they use… because… well… they are REALLY GOOD.
While the Wikimedia Foundation’s fundraising banners say they need funds to keep Wikipedia
“online and ad-free”, I reckon that this is where most of the money raised in the fundraising drives
will be going.
Goodwill hunting
Ben Koo – whose article is a phenomenal read – pointed out that the IP address that added
Streater’s name to the Wikipedia article on the Boston College point shaving scandal had not made
any edit since 2009. That is so. But mail.goodwillmass.org has other IP addresses, too. One of them
is 209.6.3.182. On 3 July 2013, it changed the Massachusetts Lottery win of Whitey Bulger, a Boston
crime figure who once ranked next to Osama bin Laden on the FBI’s most-wanted list, from $14
million to $69 million, contradicting the cited source.
That change didn’t last six years. This time, it was just three weeks.
Image credits: CC BY-SA HonestReporting.com, flickr/opensourceway, Flickr/Stewart