What difference does it make? Adding Wikipedia-style 'diffs' to BBC News
I was giving some thought to the frequent requests that the BBC's news site make its editing more transparent and accountable by retaining each version of an article on the site, a 'diff' if you like, in the style of Wikipedia.
On the face of it, it seems like a simple feature to implement, with some obvious advantages, but I've also been thinking about some of the practical problems it might present. As always seems to be the case with web technology, in solving one user experience problem, you often unintentionally create a couple of others.
Looking at it as a feature request, it seems straightforward enough:
Store a version of every published state of every page of the BBC News site, and link to it from the current version.
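As a rough illustration of how small that request looks on paper, here is a minimal sketch in Python. It is not how the BBC's CMS works; the `Page` and `Version` classes, the `?version=` link format and the example URL are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    number: int              # 1 for the first published state
    published_at: datetime   # when this state went live
    body: str


@dataclass
class Page:
    url: str
    versions: list = field(default_factory=list)

    def publish(self, body, when=None):
        """Record a new published state; an identical republish is ignored."""
        if self.versions and self.versions[-1].body == body:
            return self.versions[-1]
        version = Version(
            number=len(self.versions) + 1,
            published_at=when or datetime.now(timezone.utc),
            body=body,
        )
        self.versions.append(version)
        return version

    def current(self):
        """The latest published state of the page."""
        return self.versions[-1]

    def history_links(self):
        """Links the current page could carry to each earlier version."""
        # Hypothetical URL scheme, invented for this sketch.
        return [f"{self.url}?version={v.number}" for v in self.versions[:-1]]


page = Page("/news/example-story")
page.publish("Minister announces plan.")
page.publish("Minister announces plan. Opposition responds.")
print(page.current().number)   # 2
print(page.history_links())    # ['/news/example-story?version=1']
```

Storing the versions is the easy part. The sketch says nothing about which kinds of change should count as a new version, and that is where the trouble starts.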
However, not all content on the site is created equal, and not all edits fall into the same category.
What about when a story isn't edited, but a new pull-quote or stats box is added to the flow of the page? Would that require a new published 'diff'? There seems to be a good case for saying it should.
What about the minute-by-minute coverage of a day's play at a Test Match or live text for a football game? The content is generated in the same CMS, uses the same templates as a news story, and appears on the same site. So, would you publish and link to the 150+ diffs of that page? I'd argue that isn't particularly helpful to the user, or necessary.
So, what about the stories which ask for users to submit comments and eyewitness accounts, and then progressively add these at the foot of the page? Again, I'd think it was reasonable to publish a 'diff' in this instance, as it highlights the way the reporting and understanding of a story has developed.
But hang on a second - if you are doing that, should you make a 'diff' every time a new comment is published on a 'Have Your Say' or '606' thread? In that case, for those pages, you would end up with potentially thousands of 'diffs'. What use would that actually be?
What about the News blogs? Blogs tend, by their nature, to be thrown together more rapidly than considered reporting, and so are more prone to typos.
To make the system credible, the BBC would need to record every single edit on every single page, with a timestamp, whatever the reason for the edit. On some pages and some types of content, though, this would present a usability issue and serve little purpose. However, if the BBC were to start picking and choosing which stories got the 'diff' treatment, it would leave itself open both to the temptation to use the technology selectively and to accusations of doing so. And who measures the 'diffs' of the 'diffs'?
The system would also pose some problems for search. If, for example, the phrase 'neutron flow polarity' appeared in the first version of a story, and then not in any subsequent version, would you expect that first version of the page to appear in the results of a search for 'neutron flow polarity'?
The answer is, I think, almost certainly yes, you would want that result to come back.
Now, suppose the phrase was rather more common on the site, for example a search for "Christopher Eccleston". His name would appear not just in a lot of stories on the site, but also, potentially, in lots of 'diffs' of stories on the site.
You would need to do some quite smart engineering work to ensure that if a phrase was unique to a diff, then that diff appeared in search results, but that if the phrase appeared in various versions of a story, only the latest version was shown. You'd have to do this to avoid huge amounts of duplication in the results. Not impossible, but not something that most search engine technologies have to worry about.
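The rule itself can be sketched in a few lines of Python, even if doing it at the scale of a real search index is a different matter. This is a toy under stated assumptions: the `search` function, the in-memory `stories` dictionary and the example URLs are all hypothetical, and matching is a plain case-insensitive substring test rather than anything a real search engine would do.

```python
def search(stories, phrase):
    """Return at most one (url, version_number) hit per story.

    `stories` maps a story URL to its version texts, oldest first.
    The newest version containing the phrase wins, so a phrase that
    survives into the current text yields only the current page, while
    a phrase that was edited out still surfaces the last version that
    contained it.
    """
    needle = phrase.lower()
    hits = []
    for url, versions in stories.items():
        # Walk from newest to oldest and stop at the first match.
        for number in range(len(versions), 0, -1):
            if needle in versions[number - 1].lower():
                hits.append((url, number))
                break
    return hits


stories = {
    "/news/story-a": [
        "Engineers reversed the neutron flow polarity.",
        "Engineers fixed the fault.",
    ],
    "/news/story-b": [
        "Christopher Eccleston to star.",
        "Christopher Eccleston to star in new series.",
    ],
}
print(search(stories, "neutron flow polarity"))  # [('/news/story-a', 1)]
print(search(stories, "Christopher Eccleston"))  # [('/news/story-b', 2)]
```

The first search returns the old version, because that is the only place the phrase exists. The second returns only the current page, even though the name appears in both versions.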
That's not to say that I think it is necessarily a bad idea to archive the different versions of stories as they are published. I think there would be some obvious gains for the BBC, and not just in terms of increasing the transparency around what is published.
It would be a fantastic resource for journalism students, for example, after events like the July 2005 bombings in London, to be able to flick back through and study how the reporting of the news developed on the day. At the moment the day has to be pieced back together from the final 'In depth' coverage, Flickr collections put together as it happened, and the odd screenshot occasionally used around the BBC site.
That, though, is an exceptional example. I remember one of the technical architects at the BBC putting together a quick backstage.bbc.co.uk prototype that checked every single edit on the BBC News site, and produced a stream of output that was incredibly dull, as it mostly consisted of fixing typographical errors.
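To give a flavour of what that kind of output looks like, here is a small sketch using Python's standard `difflib` module. It is not the backstage prototype, which I haven't seen the code for; the `describe_edit` function and the two sample texts are invented for the example.

```python
import difflib


def describe_edit(old, new):
    """Return the difference between two versions as unified diff lines."""
    diff = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return list(diff)


# Two made-up versions of a story that differ by a single typo fix.
old = "The minster spoke today.\nMore follows."
new = "The minister spoke today.\nMore follows."
for line in describe_edit(old, new):
    print(line)
```

Lines starting with `-` were removed and lines starting with `+` were added. Multiply a one-letter correction like this by every story published in a day and you can see why the stream was not gripping reading.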
With a 'diff' system in place, you might, as a side effect, also see an improvement in the quality of journalistic output. The rush to press publish may be slightly tempered by the knowledge that the first live version of an article is going to stay on the site, rather than vanish completely in ten minutes' time when the writer has seen it on the live site and spotted a couple of extra typos.
However, for me it essentially boils down to this. The solution is to present lots of versions of a story to show how it was edited over time - but what was the problem you were trying to solve? I think, fundamentally, the issue here is one of trust with the BBC as a news source, not an issue of problems with the content management system it employs.
Put simply, the large number of people who still say in surveys that they trust the BBC's reporting are not clamouring for a system that allows them to view slightly out-of-date versions of the news. They work on the assumption that the version they are reading is the most accurate and up-to-date information that the BBC has.
The desire for such a system comes from people who don't think that is the case, and the question would have to be: if people don't trust the BBC's reporting, would they trust any 'diff' system the BBC put in place?
A classic argument about BBC reporting concerns the way responses from the different political parties in the UK are included in a developing story.
A 'diff' system would probably reveal on a consistent basis that a story breaks on the site about a political initiative from the Government. A second version appears a little later which includes a quote from an Opposition spokesperson. A third version then appears with something from the LibDems and perhaps also something from a lobby group / minor party / celebrity / other interested party. Then a fourth version appears where the story is radically re-written, and the quotes and opinions of the Opposition parties are no longer just tacked on the end, but are woven into the fabric of the story, and there are some pull-quotes from the public gathered via 'Have Your Say'.
The BBC would argue this shows how a story develops as people have time to react to it and give their response.
A counterpoint would be that this was an obvious attempt by the BBC to push the Government line when it first published a story, that it only belatedly tacked on comments from the Opposition to the story, and that in the end it had substantially re-written what it had originally posted on the front page when the news was fresh.
Having the 'diffs' on the site doesn't actually prove that argument either way, and both sides of the argument can use them to reinforce their pre-determined opinion on the matter.
There is a parallel situation with the moderation figures published by BBC News for the 'Have Your Say' section. Prior to this feature being added, there was widespread belief on all sides of the political spectrum that the BBC refused to publish hundreds, thousands or hundreds of thousands of submissions because they did not chime with what the BBC wanted the debate to say.
The BBC's response was to show how many messages are queued for moderation, how many have been published, and how many have been rejected. The net result doesn't seem to have improved perception of the service. If a thread has 95 rejected comments, all sides naturally tend to assume that all 95 were comments supporting their particular viewpoint, which the BBC was deliberately suppressing for propagandist reasons.
It is worth noting that no other interactive news website I've reviewed over the last couple of years publishes anything like this level of information about their moderation process.
By doing so, the BBC doesn't seem to have dispelled any suspicion about the operation of Have Your Say. It doesn't seem to me that adding 'diffs' to published articles will do much to allay people's suspicions about BBC journalism either.
Interestingly enough, News Sniffer is now frequently used to identify so-called 'stealth edits' of BBC content by people attempting to prove that the BBC has a left-wing bias. Yet it was originally set up by someone who wanted the information in order to demonstrate the BBC's kow-towing to corporate and capitalist interests.
I'm surprised that nobody has, to my knowledge, 'mashed-up' News Sniffer with the BBC News site itself. I would have thought that someone with a bit more coding knowledge than me ought to be able to extract the URLs of the diffs from News Sniffer, match them to their parent story on BBC News, and employ a bit of Greasemonkey script magic for Firefox to insert them onto the BBC page.
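The matching step, at least, is simple enough to sketch in Python. Everything here is an assumption: I haven't checked how News Sniffer exposes its diff URLs, so the `records` list of (story URL, diff URL) pairs, the `normalise` and `diffs_by_story` functions and the example addresses are all hypothetical stand-ins.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to host + path so minor variations still match."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


def diffs_by_story(records):
    """Group diff URLs under the story page they belong to.

    `records` is a hypothetical list of (story_url, diff_url) pairs,
    however they were obtained.
    """
    grouped = defaultdict(list)
    for story_url, diff_url in records:
        grouped[normalise(story_url)].append(diff_url)
    return dict(grouped)


# Made-up example data; these are not real addresses.
records = [
    ("http://news.example/story/1/", "http://diffs.example/d/10"),
    ("http://NEWS.example/story/1?ref=rss", "http://diffs.example/d/11"),
    ("http://news.example/story/2", "http://diffs.example/d/12"),
]
index = diffs_by_story(records)
print(index[normalise("http://news.example/story/1")])
# ['http://diffs.example/d/10', 'http://diffs.example/d/11']
```

A user script would then only need to look up the address of the page being viewed in that index and add the matching links to the page.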