That 2002 BBCi Search impartiality problem in full

 by Martin Belam, 29 November 2007

In my recent post on the BBC's Internet Blog about the development of the BBC's web search engine, I mentioned in passing that in 2002 the BBC was accused of artificially inflating the rankings of BBC content within the results. James Cridland picked up on it, reminding us that searching for 'Classic FM' used to bring back BBC Radio 3 in top spot.

The original draft of my blog post went into a little more detail about this 'technical glitch', citing one of James's former employers, Virgin Radio, as another victim, but for space reasons it ended up on the blogging equivalent of the cutting room floor. I thought I would expand upon the point over here.

It was incredibly embarrassing for the team, and I think damaging to the BBC's web search project as a whole. The launch didn't go very smoothly at all, in fact. About half an hour before the new homepage with the prominent search box was due to go live, the BBC's site search stopped working because a software licence had expired; the TV marketing campaign had to be pulled because of offensive parallels with domestic violence; and then the ranking controversy hit the trade press.

I think it was for that reason that a rather embattled BBC Press Office issued terse "It was a technical glitch" explanations - in fact I'm not sure that the BBC ever publicly elaborated on how it came about. Certainly there was no deliberate policy of matching BBC radio stations to their commercial counterparts and ranking them higher, although of course that was how it appeared outside of Mortimer Street, where we were based. But it did happen because of a 'feature', and not because of a 'bug'.

I used to talk a lot about the taxonomy underpinning the 'best links' system behind BBC Search. It was called Bromsgrove, named after the place where our fictional target user persona lived, and it was populated initially to work over BBC content only. It mapped URLs against a tree of concepts and keywords. If you typed in 'eastenders', the software looked for a node called 'eastenders' and returned the links it found there as the top results.

Bromsgrove allowed the team to set up synonyms for search terms, and to cope with people entering URLs, or queries with poor spelling.

Bromsgrove in action - synonyms for EastEnders
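To give a flavour of the synonym idea, here is a minimal sketch in Python. It is not the Bromsgrove code: the `SYNONYMS` table, the variants in it and the `normalise` function are all hypothetical names made up for illustration. The assumption is simply that every variant of a term points at one canonical node name.

```python
# Hypothetical synonym table (not real Bromsgrove data): each variant,
# misspelling or typed-in address maps to one canonical node name.
SYNONYMS = {
    "east enders": "eastenders",
    "eastendres": "eastenders",
    "eastenders.example": "eastenders",  # placeholder for a typed-in URL
}


def normalise(query: str) -> str:
    """Lower-case the query, collapse whitespace, then map known variants."""
    cleaned = " ".join(query.lower().split())
    # Unknown queries fall through unchanged.
    return SYNONYMS.get(cleaned, cleaned)
```

With a table like this, `normalise("  East  Enders ")` and `normalise("Eastendres")` both come back as `"eastenders"`, so they land on the same node.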

It also had a very smart piece of functionality which was called 'walking up the taxonomy tree'. If it matched a node, but the node had no URLs attached to it, Bromsgrove would 'walk up' to the parent node, and see if there were any URLs attached there.

Diagram of part of the BBC Search taxonomy

This worked brilliantly, and looked very, very clever on the front-end. So, in 2001 on the BBC's site search, if you searched for 'Seth Johnson', Bromsgrove found the 'Seth Johnson' node, saw there was no URL, walked up the tree and found 'Derby County'. The results returned would then have a best link to BBC Sport's coverage of Derby County at #1. It gave the impression to the user that the search engine understood that Seth Johnson was a Derby County player, and it was also probably the best result given the quality of some of the site search results at the time.

Then, when Seth Johnson moved to Leeds United, a quick edit of the taxonomy meant searching for £7m misfit "Seth Johnson" would return 'Leeds United' at #1 instead. That short-cut the lag in indexing, which otherwise meant that pages featuring him as a Derby County player would keep appearing for a few weeks until he established himself at Leeds.
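The 'walking up the tree' behaviour can be sketched in a few lines of Python. This is an illustration under assumptions, not the original implementation: the `Node` class, the `best_links` function and the URL strings are hypothetical placeholders, and the real taxonomy had many more levels.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One concept in a hypothetical taxonomy tree."""
    name: str
    parent: Optional["Node"] = None
    urls: List[str] = field(default_factory=list)


def best_links(node: Optional[Node]) -> List[str]:
    """Return the node's URLs, or those of the nearest ancestor that has any."""
    while node is not None:
        if node.urls:
            return node.urls
        node = node.parent  # nothing attached here, so walk up one level
    return []


# Placeholder URLs, not real BBC addresses.
sport = Node("Sport")
derby = Node("Derby County", parent=sport, urls=["<BBC Sport Derby County page>"])
leeds = Node("Leeds United", parent=sport, urls=["<BBC Sport Leeds United page>"])
seth = Node("Seth Johnson", parent=derby)  # empty node: no URL of its own

print(best_links(seth))  # walks up to Derby County

seth.parent = leeds      # the 'quick edit' after the transfer
print(best_links(seth))  # now walks up to Leeds United
```

The point of the sketch is that the transfer is a one-line change to the tree, with no re-indexing of pages involved.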

This worked really well, and so the same principle was applied to web search best links. Which is where it had the unintended consequences that were such a problem.

If you typed in 'classic fm', and the taxonomy hadn't got the Classic FM URL specified as a match for that phrase, but did have Classic FM as an empty node, Bromsgrove would recognise it was in the 'Classical Music' section of the taxonomy, and so happily recommend the nearest URL it found - BBC Radio 3. Search for 'Virgin' and it wouldn't find the actual URL for Virgin, but would find 'Virgin Radio' as a concept underneath the 'Radio' node, and so would blithely recommend the BBC Radio homepage as a URL related to the query.

Simply put, unless the BBC's web search team matched every brand name 'concept' contained within the taxonomy with a specific recommended URL, there was a risk that Bromsgrove would walk up the tree and pluck a conceptually related BBC URL from the much more comprehensive site-search mappings, and whack that in at #1 instead. What had worked brilliantly as a way of improving site search results was a disaster when expanded beyond the scope of bbc.co.uk.

Needless to say, the feature was swiftly switched off for web search when the scale of the issue became apparent, but not before the damage had been done to the credibility of the service's claim to impartiality.
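For the curious, the failure and the fix can be shown with a small Python sketch. Everything in it is an assumption for illustration: the `TAXONOMY` dictionary, the `walk_up` flag and the placeholder URL are hypothetical, and I am guessing at the tree shape from the Classic FM example above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fragment: node name -> (parent name, attached URLs).
TAXONOMY: Dict[str, Tuple[Optional[str], List[str]]] = {
    "classical music": (None, ["<BBC Radio 3 URL>"]),  # placeholder URL
    "classic fm": ("classical music", []),             # empty node, no URL mapped
}


def best_links(query: str, walk_up: bool) -> List[str]:
    """Look up the query's node; optionally climb to ancestors for a URL."""
    node: Optional[str] = query.lower()
    while node in TAXONOMY:
        parent, urls = TAXONOMY[node]
        if urls:
            return urls
        if not walk_up:
            break  # web search after the fix: stop at the matched node
        node = parent
    return []


print(best_links("Classic FM", walk_up=True))   # the BBC Radio 3 placeholder
print(best_links("Classic FM", walk_up=False))  # no best link at all
```

With walking up enabled, the empty 'classic fm' node inherits the BBC URL from its parent. With it switched off, the query simply gets no best link and falls back to the ordinary results.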

As James points out, the BBC now 'recommends' Classic FM - which presumably nowadays equally infuriates Roger Wright.

BBC Search results for Classic FM

4 Comments

Are you saying that for "Bromsgrove" to work, it would have to have nodes in the tree for every possible search term?

No, not at all Ian, but it was a large taxonomy. I don't remember the exact figure but it was more than ten thousand nodes. You can see in the screenshot above that EastEnders was node 8434. It had parent nodes of Entertainment > Television > Broadcasters > BBC > BBC programmes, so there would have been nodes for Channel 4, ITV etc and the equivalent for radio, which caused this manifestation of the 'feature'. The taxonomy was designed so that if anything became topical, you could quickly add a URL to a relevant node, and get that best link into the search results within minutes, so it was pretty comprehensive. It was put together by a very thorough set of information scientists and librarians.

A bit more on the numbers - I just dug this out of the currybetdotnet archives. By the end of 2004, the BBC Governors were citing the fact that BBC Search 'incorporates 12,000 recommended sites'.

Hey, thanks for that.

In the interests of balance, a search on Virgin Radio for XFM unearths the rather peculiar discovery that XFM is using Virgin Radio's social networking to promote its own station! Cheeky...

Meanwhile, "Virgin Radio recommends" also does appear in certain searches, like frequencies - mostly because the team monitored the search queries. (I did that one.)

I note that the Virgin Radio search has been significantly overhauled recently; with a ton of context-sensitive options. Good thing.
