What difference does it make? Adding Wikipedia style 'diffs' to BBC News

 by Martin Belam, 3 March 2008

I was giving some thought to the frequent requests that the BBC's news site make its editing more transparent and accountable by retaining each version of an article on the site, a 'diff' if you like, in the style of Wikipedia.

Kraftwerk 2 diff page on Wikipedia

On the face of it, it seems like a simple feature to implement, with some obvious advantages, but I've also been thinking about some of the practical problems it might present. As always seems to be the case with web technology, in solving one user experience problem, you often unintentionally create a couple of others.

Looking at it as a feature request, it seems straightforward enough:

Store a version of every published state of every page of the BBC News site, and link to it from the current version.
BBC News with added diff links
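
As a rough illustration of that feature request, here is a minimal Python sketch of a version store. Everything in it is an assumption made for the example: the `VersionStore` and `Version` names, the in-memory dictionary, and the `?version=N` link scheme are hypothetical and say nothing about how the BBC's CMS actually works.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    number: int
    published_at: datetime
    body: str


@dataclass
class VersionStore:
    # Maps a story URL to its published versions, oldest first (in memory only).
    _versions: dict[str, list[Version]] = field(default_factory=dict)

    def publish(self, url: str, body: str, when: datetime | None = None) -> Version:
        """Record a new published state; return the existing one if the body is unchanged."""
        history = self._versions.setdefault(url, [])
        if history and history[-1].body == body:
            return history[-1]
        version = Version(len(history) + 1, when or datetime.now(timezone.utc), body)
        history.append(version)
        return version

    def history_links(self, url: str) -> list[str]:
        """Hypothetical link scheme: <story url>?version=N, newest first."""
        return [f"{url}?version={v.number}" for v in reversed(self._versions.get(url, []))]
```

Republishing an identical body is ignored here, which is itself a policy decision: it decides that a republish with no textual change is not worth a 'diff'.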

However, not all content on the site is created equal, and not all edits fall into the same category.

What about when a story isn't edited, but a new pull-quote or stats box is added to the flow of the page? Would that require a newly published 'diff'? There seems to be a good case for saying it would.

What about the minute-by-minute coverage of a day's play at a Test Match or live text for a football game? The content is generated in the same CMS, uses the same templates as a news story, and appears on the same site. So, would you publish and link to the 150+ diffs of that page? I'd argue that isn't particularly helpful to the user, or necessary.

Live text commentary with 161 diff links

So, what about the stories which ask users to submit comments and eyewitness accounts, and then progressively add these at the foot of the page? Again, I'd think it was reasonable to publish a 'diff' in this instance, as it highlights the way the reporting and understanding of a story has developed.

But hang on a second - if you are doing that, should you make a diff for every time a new comment is published on a 'Have Your Say' or '606' thread? In which case, for those pages, you would end up with potentially thousands of 'diffs'. What use would that actually be?

What about the News blogs? Blogs tend, by their nature, to be thrown together more rapidly than considered reporting, and so are more prone to typos.

(Well, that's my defence for any typos on currybetdotnet, anyway!)

In order to make the system credible, the BBC would need to record every single edit on every single page, with a timestamp, whatever the reason for the edit. On some pages and some types of content though, this would present a usability issue, and serve little purpose. However, if the BBC were to start picking and choosing which stories got the 'diff' treatment, they would leave themselves open to accusations of using the technology selectively, and to the temptation to do so. And who measures the 'diffs' of the 'diffs'?

The system would also pose some problems for search. If, for example, the phrase 'neutron flow polarity' appeared in the first version of an article, and then not in any subsequent version of the story, would you expect that version of the page to appear in the results of a search for 'neutron flow polarity'?

Jon Pertwee brandishes the sonic screwdriver

The answer is, I think, almost certainly yes, you would want that result to come back.

Now, suppose the phrase was rather more common on the site, for example a search for "Christopher Eccleston". His name would appear not just in a lot of stories on the site, but also, potentially, in lots of 'diffs' of stories on the site.

You would need to do some quite smart engineering work to ensure that if a phrase was unique to a diff, then that diff appeared in search results, but that if the phrase appeared in various versions of a story, only the latest version was shown. You'd have to do this to avoid huge amounts of duplication in the results. Not impossible, but not functionality that most search engine technologies have to worry about.
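
One way to read that rule is "for each story, return only the most recent version that contains the phrase". A minimal Python sketch under that assumption follows; the `collapse_results` name and the plain dictionary standing in for a search index are hypothetical, and a real search engine would need to do this at index or query time rather than by scanning text.

```python
def collapse_results(stories: dict[str, list[str]], phrase: str) -> list[tuple[str, int]]:
    """Return (story_id, version_number) hits, at most one per story.

    `stories` maps a story id to its version texts, oldest first.
    Versions are numbered from 1.
    """
    needle = phrase.casefold()
    hits = []
    for story_id, texts in stories.items():
        # Walk from the newest version backwards and stop at the first match,
        # so a phrase that survives into later versions yields only one result.
        for number in range(len(texts), 0, -1):
            if needle in texts[number - 1].casefold():
                hits.append((story_id, number))
                break
    return hits
```

A phrase that only ever appeared in the first version of a story still comes back, pointing at that version, while a phrase present in every version produces a single hit for the latest one.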

That's not to say that I think it is necessarily a bad idea to archive the different versions of stories as they are published. I think there would be some obvious gains for the BBC, and not just in terms of increasing the transparency around what is published.

It would be a fantastic resource for journalism students, for example, after events like the July 2005 bombings in London, to be able to flick back through and study how the reporting of the news developed on the day. At the moment the day has to be pieced back together from the final 'In depth' coverage, Flickr collections put together as it happened, and the odd screenshot occasionally used around the BBC site.

Early reports of power surges on the London Underground

That, though, is an exceptional example. I remember one of the technical architects at the BBC putting together a quick backstage.bbc.co.uk prototype that checked every single edit on the BBC News site, and produced a stream of output that was incredibly dull, as it mostly consisted of fixing typographical errors.

With a 'diff' system in place, you might, as a side effect, also see an improvement in the quality of journalistic output. The rush to press 'publish' may be slightly tempered by the knowledge that the first live version of an article is going to stay on the site, rather than vanish completely in ten minutes' time when the writer has seen it on the live site and spotted a couple of extra typos.

However, for me it essentially boils down to this. The solution is to present lots of versions of a story to show how it was edited over time - but what was the problem you were trying to solve? I think, fundamentally, the issue here is one of trust with the BBC as a news source, not an issue of problems with the content management system it employs.

Put simply, the large number of people who still say in surveys that they trust the BBC's reporting are not clamouring for a system that allows them to view slightly out-of-date versions of the news. They work on the assumption that the version they are reading is the most accurate and up-to-date information that the BBC has.

The desire for such a system comes from people who don't think that is the case, and the question would have to be: if people don't trust the BBC's reporting, would they trust any 'diff' system the BBC put in place?

A classic argument about BBC reporting concerns the inclusion of responses to a developing story by the differing political parties in the UK.

A 'diff' system would probably reveal on a consistent basis that a story breaks on the site about a political initiative from the Government. A second version appears a little later which includes a quote from an Opposition spokesperson. A third version then appears with something from the LibDems and perhaps also something from a lobby group / minor party / celebrity / other interested party. Then a fourth version appears where the story is radically re-written, and the quotes and opinions of the Opposition parties are no longer just tacked on the end, but are woven into the fabric of the story, and there are some pull-quotes from the public gathered via 'Have Your Say'.

The BBC would argue this shows how a story develops as people have time to react to it and give their response.

A counterpoint would be that this was an obvious attempt by the BBC to push the Government line when it first published a story, that it only belatedly tacked on comments from the Opposition to the story, and that in the end it had substantially re-written what it had originally posted on the front page when the news was fresh.

Having the 'diffs' on the site doesn't actually prove that argument either way, and both sides of the argument can use them to reinforce their pre-determined opinion on the matter.

There is a parallel situation with the moderation figures published by BBC News for the 'Have Your Say' section. Prior to this feature being added, there was widespread belief on all sides of the political spectrum that the BBC refused to publish hundreds, thousands or hundreds of thousands of submissions because they did not chime with what the BBC wanted the debate to say.

The BBC's response was to show how many messages are queued for moderation, how many have been published, and how many have been rejected. The net result doesn't seem to have improved perception of the service. If a thread has 95 rejected comments, all sides naturally tend to assume that all 95 were comments supporting their particular viewpoint, which the BBC was deliberately suppressing for propagandist reasons.

BBC moderation stats

It is worth noting that no other interactive news website I've reviewed over the last couple of years publishes anything like this level of information about their moderation process.

By doing so, the BBC doesn't seem to have dispelled any suspicion about the operation of Have Your Say. It doesn't seem to me that adding 'diffs' to published articles will do much to allay people's suspicions about BBC journalism either.

That is one reason that I think in many ways it is better for services like Revisionista on News Sniffer to audit the BBC's editing from outside of the Corporation's firewall.

Interestingly enough, it is now frequently used to identify so-called 'stealth edits' of BBC content by people attempting to prove that the BBC has a left-wing bias. Yet it was originally set up by someone who wanted the information in order to demonstrate the BBC kow-towing to corporate and capitalist interests.

I'm surprised that nobody has, to my knowledge, 'mashed-up' News Sniffer with the BBC News site itself. I would have thought that someone with a bit more coding knowledge than me ought to be able to extract the URLs of the diffs from News Sniffer, match them to their parent story on BBC News, and employ a bit of Greasemonkey script magic for Firefox to insert them onto the BBC page.

BBC News with added News Sniffer Revisionista


An interesting post, but you mar it with a few straw man arguments.

First up is a mockup of how 150+ links to diffs would look crazy on one page. This is just a UI issue though, and one easily solvable, so is hardly a reason for not keeping a record of the diffs.

Next up is an argument suggesting that the numerous versions would clog up search results. Again, easily solvable by not indexing old versions.

Lastly, there is the suggestion that you'd have to keep diffs for the HYS pages. Those pages are dynamic, and each comment has its own timestamp anyway (and presumably isn't edited once it's up).

The argument as to whether providing a version history would placate those that decry the BBC as biased is a good and interesting one. I agree that it probably wouldn't help much.

However, on the question of whether keeping a public version history would be useful and interesting in and of itself, I think we'd all agree that it would be.

And sure, lots of the changes would be minor ones, but then it wouldn't be that hard to provide a 'minor' checkbox that editors can check when they fix a typo, much in the way of Wikipedia. (This wouldn't be much extra work, and I think they already do something similar anyway).

Sorry Frankie, I thought it was quite obvious that the mock-up of the diffs on the Premier League score was illustrative of how many diffs you might end up with for a single page, rather than a serious proposition for how the finished UI might look!

More seriously, I think you've totally misunderstood my point about search results. If you simply throw all the old diffs out of the index as you suggest, then how is anybody going to actually find the references to things that have been deleted? And if you can't retrieve them via search, then what is the point of having them at all?

That is leaving aside the fact that the BBC can control how the 'diffs' are indexed and returned on bbc.co.uk, but can do nothing about how Google et al might treat the content.

You've got some interesting points, but honestly you've completely misunderstood the reasons for having diffs.

The most practical reason to have diffs is for readers to easily see how a story has changed since the last time they read it; transparency is a side benefit. Rather than re-reading the whole news item, they could simply look at the diffs between the current version and the version they last read. Currently, people never re-read an article because it is not possible to see what has changed since then.
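
To make the "what changed since I last read it" idea concrete, here is a minimal Python sketch using the standard library's `difflib.ndiff`. The `changes_since` name is hypothetical, and it assumes both versions are available as plain text and that a line-by-line comparison is good enough.

```python
import difflib


def changes_since(last_read: str, current: str) -> list[str]:
    """Return only the removed ('- ') and added ('+ ') lines between two versions."""
    diff = difflib.ndiff(last_read.splitlines(), current.splitlines())
    # ndiff also emits unchanged lines ('  ') and hint lines ('? '); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

A news page would more likely want word-level highlighting inside paragraphs, but the principle is the same: show the reader only the lines that differ.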

Second, the mockup is just ridiculous. The page need only show the last 3 major versions (or 5 or 10 or whatever), and put the others in a "more" link.

Rather than selecting which pages get the 'diff' treatment, it is much more useful to distinguish between major and minor changes (as in Wikipedia), and have minor changes hidden by default.
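
A minimal Python sketch of that display rule, combining it with the "last 3 major versions plus a 'more' link" suggestion above. The `Edit` class, its `minor` flag, and the `visible_history` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    version: int
    minor: bool  # hypothetical flag an editor would tick for typo fixes


def visible_history(
    edits: list[Edit], limit: int = 3, show_minor: bool = False
) -> tuple[list[Edit], int]:
    """Return the newest `limit` edits to display, plus the number left behind a 'more' link."""
    newest_first = sorted(edits, key=lambda e: e.version, reverse=True)
    candidates = [e for e in newest_first if show_minor or not e.minor]
    return candidates[:limit], max(len(candidates) - limit, 0)
```

Every edit is still recorded; the flag only changes what is shown by default, so a reader could always ask to see the minor edits too.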

Search is never a problem. If a phrase that existed in an old version is removed in the newest version, then the keyword is not that important for the specific article, and the index need not relate that keyword to the article. If a certain keyword is important for the article, then its removal should be questioned.

"That is leaving aside the fact that the BBC can control how the 'diffs' are indexed and returned on bbc.co.uk, but can do nothing about how Google et al might treat the content."

Ever heard of robots.txt?

All in all, all the points in the article are too crude.
