Wikipedia deletions and #linkeddata implications
If you missed the update to my post on Tuesday about the deletion of First Aid Kit's Wikipedia page, I'm pleased to be able to point out that within a couple of hours of my blogging about it, a Wikipedia administrator saw the post and restored the page. I now, of course, feel somewhat honour-bound to contribute and improve the quality of it. My post made an appearance in the Guardian Tech newsbucket, and also on Y Combinator's Hacker News, which sent a deluge of traffic that the site wasn't able to cope with very well - sorry about that.
The opening comment on the Y Combinator thread was well reasoned but critical of my piece:
"This guy is upset that Wikipedia deleted an article about a Swedish indie-rock band that appears to have been covered in depth by one alt-weekly in Vancouver and nobody else --- an article, incidentally, that was posted and deleted 3 times before that alt-weekly one-pager was published.
Wikipedia is not an effort to organize all the world's information. That's Google. Wikipedia is an effort to build the world's best encyclopedia. The difference between an 'encyclopedia' and 'all the world's information' is that the information in an encyclopedia needs to be reliable. To ensure that the information in the encyclopedia has a chance at being reliable, the encyclopedia is constrained to information that can be written about notable topics cover[ed] in reliable secondary sources.
Virtually everybody who writes an article about a non-notable topic ends up objecting, often loudly, when the article is deleted. That's understandable. Wikipedia could do a better job of warning people of the bar their topic needs to clear. But they can't make resurrection of deleted articles trivial to anybody, or they will spend all their time re-litigating deletions.
The likelihood that the particular 'speedy deletion' policy this article complains about will ever be resolved is epsilon. Speedy deletion, particularly of no-name bands, vanity books, websites, and tiny companies is almost the first line of defense against article-creep. Changing the policy would be an existential change to the way WP is managed.
Which doesn't matter, because you can resurrect speedy'd articles already; you just need to take the article to Deletion Review and make a case for it. Maybe WP needs an article on First Aid Kit. I like Fleet Foxes, too! (WP has excellent coverage of Fleet Foxes). But WP is run by human beings donating their time, and people make mistakes, and it is utterly disingenuous to pretend like First Aid Kit is an obvious 'keep'."
I just want to stress again that it wasn't my article, and although I like the band, I'm not some kind of super-fanboy.
What worried me about it - and still does - is that away from the Wikipedia community, we are building a whole linked data ecosystem which relies, in one way or another, on Wikipedia. It is no accident that dbpedia - a project to extract structured reliable data from the wiki - is one of the biggest and most central nodes on the #linkeddata diagram.
Sites are increasingly relying on constructions like a MusicBrainz ID pointing at a Wikipedia entry which in turn has a dbpedia equivalent full of lovely, lovely structured data about the subject. An inclusive Wikipedia that accepts that it has no paper limit and therefore can be utterly comprehensive (if harder to maintain) is central to the premise.
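To make that chain concrete, here is a minimal sketch in Python of how a site might walk from an artist record to a dbpedia resource. It assumes the usual convention that an English Wikipedia article at `/wiki/Title` has a dbpedia resource at `http://dbpedia.org/resource/Title`. The record layout, the function names, the placeholder MusicBrainz ID and the article title are all hypothetical, chosen only for illustration, and nothing here touches the network.

```python
from urllib.parse import urlparse

# Assumed convention: dbpedia resource names mirror English Wikipedia titles.
DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"
WIKI_PATH_PREFIX = "/wiki/"


def wikipedia_to_dbpedia(wikipedia_url):
    """Map an English Wikipedia article URL to the matching dbpedia URI."""
    parsed = urlparse(wikipedia_url)
    if parsed.netloc != "en.wikipedia.org":
        raise ValueError("not an English Wikipedia URL: " + wikipedia_url)
    if not parsed.path.startswith(WIKI_PATH_PREFIX):
        raise ValueError("not an article URL: " + wikipedia_url)
    title = parsed.path[len(WIKI_PATH_PREFIX):]
    if not title:
        raise ValueError("no article title in URL: " + wikipedia_url)
    return DBPEDIA_RESOURCE_PREFIX + title


def dbpedia_uri_for(artist):
    """Return the dbpedia URI for an artist record, or None if the
    record has no Wikipedia link to hang the structured data on."""
    wikipedia_url = artist.get("wikipedia_url")
    if wikipedia_url is None:
        return None
    return wikipedia_to_dbpedia(wikipedia_url)


# Hypothetical record of the kind a music site might hold.
artist = {
    "name": "First Aid Kit",
    # Placeholder value, not the band's real MusicBrainz ID.
    "musicbrainz_id": "00000000-0000-0000-0000-000000000000",
    # Illustrative title; the real article may live under a different name.
    "wikipedia_url": "https://en.wikipedia.org/wiki/First_Aid_Kit_(band)",
}

print(dbpedia_uri_for(artist))
```

The `None` branch is the case this post is worried about: once the Wikipedia article is deleted, the record loses its link and there is nothing left to look up on the dbpedia side.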
The risk is that Wikipedians who believe it should have a much more limited scope gain the upper hand. Inclusion rules for musical artists might begin to involve chart appearances, or sales thresholds, or any number of other limiting factors, which will be fine if you only want to build #linkeddata sites that revolve around The Beatles and Michael Jackson, but means you are going to struggle to use the data to go beyond the mainstream.
I prefer my Wikipedia as a wide-ranging collaborative knowledge repository, not as an arbiter of taste or cultural impact.
They aren't always going to get it right.
As someone pointed out in the comments thread on Y Combinator, at one point the Lady GaGa entry was deleted.