"Filtering" user-generated content on the BBC News site
Late last week someone left a comment on one of my old pieces about the BBC's moderation policies, asking my opinion on some of the things mentioned in Peter Horrocks' recent speech about user-generated content and citizen journalism. Bryan has left comments on currybetdotnet before, and we've exchanged email. We don't agree on everything by a long chalk, but he's always been very pleasant about it, as I hope I have.
I started writing a response, but once it got more involved I realised it was probably worth putting up as a new post, rather than just tacking it onto the end of an article that is nearly a year old now.
For context, here is Bryan's comment:
You might be interested to know that The Editors blog is still fumbling and stumbling in its apparent attempts to sort out the 'technical' problems that have plagued it pretty much since its inception. I put 'technical' in quotes since I just cannot see how the problems could possibly drag on for months turning into years if they were simply technical in nature and I have long suspected that the BBC was actually trying to filter out 'undesirable' comments and commentators - not because they break any rules, but because they are not PC. Paranoia? Conspiracy theory? Possibly. But have a look at this:
'We need to be able to extract real editorial value from such contributions more easily. We are exploring as many technological solutions as we can for filtering the content, looking for intelligent software that can help journalists find the nuggets and ways in which the audience itself can help us to cope with the volume and sift it'.
It's from Peter Horrock's recent Value of citizen journalism post.
As someone who is clued up about these matters, perhaps you can indicate what this filtering process involves?
Over the weekend following his article, I tried to send Mr. Horrocks a lengthy comment, in two parts. It wouldn't go through, just hung. So I sent it on the Monday and it went through OK but was not published, perhaps proving my point. So I published it on the breakaway blog from Biased BBC
Any thoughts you have on this matter would be much appreciated.
Without doing a point-by-point analysis, I was struck by three things that I thought worth mentioning.
The first is on the issue of the technical problems with the BBC's blogs. Now, I'm a big fan of applying Occam's Razor. It could be that the BBC didn't implement their blog platform with enough forward capacity. Or it could be that the BBC implemented their blog platform with enough capacity, but then made it run badly in order to control the types of comments posted. I know which seems more likely to me.
Robin Hamman wrote a lengthy post about the issues they have had, as Bryan rightly points out, pretty much from the moment the blogs achieved any kind of popularity. It appears that when the BBC heavily customised the version of Movable Type they were using, they did so in a way that has prevented them going on to implement some of the usual defence measures against spam.
I can sympathise with that. Even on a minor personal blog like currybetdotnet, wave after wave of spam aimed at the Movable Type comment system caused my server to max out repeatedly and database to crash, and I've got a fraction of the audience of the BBC's blogs. I've been able to try different solutions and play around with my installation - and have got it much more stable now, at the expense of people having to put in a codeword to submit a comment.
The BBC doesn't tend to play fast'n'loose with live systems like I can on my own. Plus I can pretty much guarantee that at the project prioritisation meeting where #1 on the agenda is getting the iPlayer launched before Xmas in time for the marketing, and #19 on the agenda is having a look at the blog servers, the resources stop well before item #19 gets reached.
Effectively the BBC's blog platform is hostage to some ill-starred technical decisions early on. It is anecdotal evidence I know, but it isn't just the general public trying to post comments who are affected. On one of my articles last year for the BBC Internet Blog, James Cridland and I had to continue a discussion about a post on our own sites, since neither of us could manage to get a comment submitted and published. He's BBC staff and I was trying to reply to an article I had authored!
[UPDATED: James points out in the comments and on this post on his blog that this wasn't the case. I appear to have conflated our discussion about the BBCi Search impartiality 'glitch' with attempting to reply to some comments on the BBC Internet Blog by Andrew Bowden and Lizzie Jackson. My apologies]
The second point I was interested in taking up is Horrocks' use of the word 'filtering'. I feel, reading through his speech again, that two separate issues have become conflated in the minds of some people. He opens his talk discussing the internal debate in the BBC about the audience reaction to Bhutto's death. The BBC considered switching off Have Your Say. Horrocks then goes on to talk about content 'filtering', and I believe that some people have assumed that this 'filtering' is about what appears on the site in public.
I'm not entirely convinced that is what Horrocks was saying. I believe that he is talking about using 'filtering' to improve the value that user-generated content brings to the newsgathering and news production process.
To use an example entirely divorced from politics and religion, so we don't get sidetracked by any unfortunate use of phrase on my part, think about the final day of the football transfer window. The BBC encourages users of 606 to text and message with their sightings of footballers at training grounds, airports, taxi ranks and football clubs, to give the day a sense of fun and urgency. The BBC gets thousands of messages on the topic, mostly consisting of banter and wishful thinking.
But somewhere in there may be the one genuine first sighting of Jermain Defoe with a lawyer who someone knows represents Portsmouth. How does the BBC get that information out of the big bucket of user-generated content coming in, most of which is inconsequential fluff, and into the hands of Radio Five Live's football correspondents as an exclusive?
To go back to the Bhutto example Horrocks cited, I think what he meant was that in amongst the 15,000 comments were statements from potential eye-witnesses to the assassination, and people from Bhutto's life. He wants to find a way, and technology is one avenue worth exploring, to put this content quickly into the hands of the newsgatherers, rather than it just being one of 15,000 messages looked at by poorly paid part-time moderators.
I don't believe that Horrocks was suggesting that stuff should be filtered off the site, more that the process of getting the good stuff into the hands of journalists for follow-up needs to be faster.
What form might such a filtering technology take? Well, there are a couple of reasonably straightforward steps that could be employed. One would be to think of a spam filter in reverse.
Spam filtering works by recognising words and phrases like 'viagra', 'casino' and 'I am the widow of', and trapping those messages. To look for eyewitness accounts, for example, you might filter in reverse for phrases like 'I saw' and 'We heard', or things that indicate proximity like 'We were n metres away', and highlight those.
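To make the 'spam filter in reverse' idea a bit more concrete, here is a minimal sketch in Python. It is purely illustrative and not anything I know the BBC to be using: the pattern list, the function names `eyewitness_score` and `highlight`, and the scoring rule (one point per matching pattern) are all my own assumptions.

```python
import re

# Hypothetical first-person cues; a real list would need tuning by journalists.
EYEWITNESS_PATTERNS = [
    re.compile(r"\bI saw\b", re.IGNORECASE),
    re.compile(r"\bwe heard\b", re.IGNORECASE),
    # Proximity cue such as "We were 50 metres away".
    re.compile(
        r"\b(?:I was|we were)\s+\d+\s*(?:m|metres|meters)\s+away\b",
        re.IGNORECASE,
    ),
]


def eyewitness_score(comment: str) -> int:
    """Count how many of the cue patterns appear in the comment."""
    return sum(1 for pattern in EYEWITNESS_PATTERNS if pattern.search(comment))


def highlight(comments, threshold=1):
    """Return comments scoring at least `threshold`, highest score first."""
    scored = [(eyewitness_score(c), c) for c in comments]
    hits = [(score, c) for score, c in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in hits]


if __name__ == "__main__":
    sample = [
        "This is a disgrace, typical politicians.",
        "I saw the crowd scatter and we heard two shots. We were 50 metres away.",
        "We heard the news on the radio.",
    ]
    for comment in highlight(sample):
        print(comment)
```

Note that the third sample comment gets flagged too, even though hearing something on the radio is hardly an eyewitness account. A filter like this could only ever shorten the pile for a journalist to read, not replace the reading.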
You could also run some IP filtering on your comment submissions. If you were looking for eyewitness accounts of a train-crash in Greece, you could pretty much discount anything where the IP address originated outside of that country. As a result you would very likely cut down the number of comments that journalists had to look through in order to tease out the 'hidden gems' Horrocks believes are in there.
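The IP step could be sketched along these lines, using only Python's standard `ipaddress` module. The country-to-range table is a made-up placeholder built from reserved documentation ranges, and the names `ip_in_country` and `filter_by_country` are hypothetical. A real system would need a proper geolocation database, which I am assuming rather than showing here.

```python
import ipaddress

# Placeholder data only: these are reserved documentation ranges, NOT real
# allocations for these countries. A real lookup would use a GeoIP database.
HYPOTHETICAL_COUNTRY_RANGES = {
    "GR": [ipaddress.ip_network("203.0.113.0/24")],
    "GB": [ipaddress.ip_network("198.51.100.0/24")],
}


def ip_in_country(ip: str, country: str) -> bool:
    """Return True if `ip` falls inside one of the ranges listed for `country`."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed addresses are treated as "not from this country".
        return False
    networks = HYPOTHETICAL_COUNTRY_RANGES.get(country, [])
    return any(address in network for network in networks)


def filter_by_country(submissions, country):
    """Keep the text of (ip, text) submissions whose IP maps to `country`."""
    return [text for ip, text in submissions if ip_in_country(ip, country)]


if __name__ == "__main__":
    submissions = [
        ("203.0.113.7", "I saw the carriages leave the track."),
        ("198.51.100.9", "Thoughts are with everyone affected."),
        ("not-an-ip", "Malformed submission."),
    ]
    print(filter_by_country(submissions, "GR"))
```

IP-based location is only ever approximate, and it would miss a witness posting through a proxy or after travelling home, so this works better as a way of ordering the queue for journalists than as a hard cut-off.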
That isn't to say that the BBC won't do less of 'Have Your Say' in the future. I would have thought that there must be some senior level editorial consideration given to the fact that user-generated content can have a big impact on how a site or organisation is perceived.
Whilst I'm a big champion of news sites utilising user-generated content, and think that some people have done it really well, I happen to agree with Horrocks that reading through a lot of the stuff published on 'Have Your Say' is monotonous and unhelpful to people's understanding of an issue.
At least Speak Your Branes means I don't have to read it much myself anymore to find the amusing stuff. When I do look, I don't find the posts from users to be particularly skewed one way or another. The very first comment on Horrocks' post suggests HYS has been hijacked by the BNP, but I know very well that the general experience of people on Biased BBC is that it is hard to get their comments published, because, they assume, they don't toe the BBC party line. I tend to see what I would expect on an Internet forum, a large mix of very strongly held views shouting each other down as idiots. It isn't quite the viper's nest of 'Comment Is Free', but it is getting there.
I wanted to pick up a third and final point from Horrocks' speech, or at least the reaction to it. I notice some comment has been made about his use of the word 'agenda':
"We cannot just take the views that we receive via e-mails and texts and let them dictate our agenda".
As I've said before, one of the things that I often think lets down those who believe that the BBC demonstrates bias is a lack of precise focus. There are some really interesting threads in Horrocks' speech, but noting his use of the word agenda as evidence of BBC bias is almost beyond trivial. Of course the Today programme and the Ten O'Clock News have an agenda.
If they didn't have an agenda they would be dead air.
It seems, frankly, self-evident that Horrocks is talking about the slate of stories and running orders that go into the BBC's bulletins, rather than an 'uber-reaching ideological agenda'. I suspect you could only believe he meant the latter if you'd never worked in news broadcasting, and already believed it to be true anyway.
The overall impression I got from Horrocks' speech is that there are still areas of the BBC unsure of the value that all this user-generated content adds to the BBC News site. It seems to me quite clear that the BBC couldn't go back to the old days of the 'email-and-publish' model. But it seems equally clear that publishing thousands upon thousands of user comments that hardly anybody reads doesn't make for a particularly compelling user experience.
Striking the balance between giving people full participation, producing a quality readable site, and getting consistent journalistic worth out of user-submitted content is, I think, something that no online news site I've reviewed in the last couple of years has yet got right.