That 2002 BBCi Search impartiality problem in full
In my recent post on the BBC's Internet Blog about the development of the BBC's web search engine, I mentioned in passing that in 2002 the BBC was accused of artificially inflating the rankings of BBC content within the results. James Cridland picked up on it, reminding us that searching for 'Classic FM' used to bring back BBC Radio 3 in the top spot.
The original draft of my blog post went into a little more detail about this 'technical glitch', citing one of James's former employers, Virgin Radio, as another victim. For space reasons it ended up on the blogging equivalent of the cutting room floor, so I thought I would expand on the point here.
It was incredibly embarrassing for the team, and I think damaging to the BBC's web search project as a whole. In fact, the launch didn't go smoothly at all. About half an hour before the new homepage with the prominent search box was due to go live, the BBC's site search stopped working because a software licence had expired. The TV marketing campaign had to be pulled because of offensive parallels with domestic violence. Then the ranking controversy hit the trade press.
I think it was for that reason that a rather embattled BBC Press Office issued terse "It was a technical glitch" explanations. In fact, I'm not sure the BBC ever publicly elaborated on how it came about. There was certainly no deliberate policy of matching BBC radio stations to their commercial counterparts and ranking them higher. Of course, that was how it appeared outside Mortimer Street, where we were based. But it did happen because of a 'feature', not because of a 'bug'.
I used to talk a lot about the taxonomy underpinning the 'best links' system behind BBC Search. It was called Bromsgrove, after the place where our fictional target user persona lived, and it was initially populated to cover BBC content only. It mapped URLs against a tree of concepts and keywords. If you typed in 'eastenders', the software looked for a node called 'eastenders' and returned the links it found there as the top results.
Bromsgrove allowed the team to set up synonyms for search terms, and to cope with people typing in URLs or badly spelled queries.
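A minimal sketch of the kind of lookup described above may help. A query is normalised and mapped through a synonym table to a node name, and the URLs attached to that node become the best links. Several parts are assumptions for illustration only: the data layout, the names `NODE_URLS`, `SYNONYMS` and `best_links`, and the placeholder URLs. The spelling tolerance uses Python's `difflib.get_close_matches` as a stand-in. I don't know what Bromsgrove actually used.

```python
import difflib

# Hypothetical stand-in for Bromsgrove's node -> URL mappings.
# The URLs are placeholders, not real BBC addresses.
NODE_URLS = {
    "eastenders": ["https://example.org/eastenders"],
    "derby county": ["https://example.org/sport/derby-county"],
}

# Hypothetical synonym table: alternative phrasing -> canonical node name.
SYNONYMS = {
    "east enders": "eastenders",
    "the rams": "derby county",
}


def normalise(query: str) -> str:
    """Lower-case the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def best_links(query: str) -> list[str]:
    """Return the best links for a query, or [] if no node matches."""
    term = normalise(query)
    term = SYNONYMS.get(term, term)
    if term not in NODE_URLS:
        # difflib is a swapped-in spelling tolerance, not Bromsgrove's method.
        close = difflib.get_close_matches(term, list(NODE_URLS), n=1, cutoff=0.8)
        if not close:
            return []
        term = close[0]
    return NODE_URLS[term]


print(best_links("East  Enders"))  # synonym match -> the eastenders node
print(best_links("eastendrs"))     # close misspelling -> the eastenders node
```

Matching against a hand-built list of node names keeps the results predictable. The trade-off is that anything missing from the taxonomy simply returns nothing.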
It also had a very smart piece of functionality called 'walking up the taxonomy tree'. If it matched a node, but the node had no URLs attached to it, Bromsgrove would 'walk up' to the parent node and see if there were any URLs attached there.
This worked brilliantly, and looked very, very clever on the front-end. So in 2001, if you searched for 'Seth Johnson' on the BBC's site search, Bromsgrove found the 'Seth Johnson' node, saw there was no URL, walked up the tree and found 'Derby County'. The results would then have a best link to BBC Sport's coverage of Derby County at #1. It gave users the impression that the search engine understood Seth Johnson was a Derby County player. It was also probably the best result available, given the quality of some of the site search results at the time.
Then, when Seth Johnson moved to Leeds United, a quick edit of the taxonomy meant that searching for £7m misfit "Seth Johnson" would return 'Leeds United' at #1 instead. That short-cut the indexing lag. Without the edit, pages featuring him as a Derby County player would have kept appearing for a few weeks, until he established himself at Leeds.
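Here is a rough sketch of 'walking up the taxonomy tree', under stated assumptions. Each concept is a node with an optional parent and a possibly empty list of URLs. The lookup climbs towards the root until it finds a node that has URLs. The `Node` class, the `best_links` function and the placeholder URLs are all hypothetical, not Bromsgrove's real data model. The last lines mirror the transfer: re-parenting one node changes the answer immediately, with no re-indexing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One concept in a hypothetical taxonomy tree."""
    name: str
    parent: Optional["Node"] = None
    urls: list[str] = field(default_factory=list)


def best_links(index: dict[str, Node], query: str) -> list[str]:
    """Return the matched node's URLs, walking up to ancestors if it has none."""
    node = index.get(query.lower())
    while node is not None:
        if node.urls:
            return node.urls
        node = node.parent  # 'walk up' one level
    return []


# Hypothetical tree; the URLs are placeholders.
football = Node("football")
derby = Node("derby county", parent=football, urls=["https://example.org/sport/derby-county"])
leeds = Node("leeds united", parent=football, urls=["https://example.org/sport/leeds-united"])
seth = Node("seth johnson", parent=derby)  # a player node with no URLs of its own
index = {n.name: n for n in (football, derby, leeds, seth)}

print(best_links(index, "Seth Johnson"))  # the Derby County link, found by walking up

# The transfer: one edit to the taxonomy, no re-indexing needed.
seth.parent = leeds
print(best_links(index, "Seth Johnson"))  # now the Leeds United link
```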
This worked really well, so the same principle was applied to the web search best links. That is where it had the unintended consequences that caused so much trouble.
Suppose you typed in 'classic fm', and the taxonomy had Classic FM as an empty node but no Classic FM URL specified as a match for that phrase. Bromsgrove would recognise that the node sat in the 'Classical Music' section of the taxonomy and happily recommend the nearest URL it found: BBC Radio 3. Search for 'Virgin' and it wouldn't find the actual URL for Virgin. It would find 'Virgin Radio' as a concept underneath the 'Radio' node, though, and so would blithely recommend the BBC Radio homepage as a URL related to the query.
Simply put, the web search team had to match every brand-name 'concept' in the taxonomy with a specific recommended URL. Otherwise there was a risk that Bromsgrove would walk up the tree, pluck a conceptually related BBC URL from the much more comprehensive site-search mappings, and whack that in at #1 instead. What had worked brilliantly as a way of improving site search results was a disaster when expanded beyond the scope of bbc.co.uk.
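To make the failure concrete, here is a minimal sketch, assuming a tiny taxonomy fragment in which only a BBC URL has been attached. The `PARENT` and `URLS` tables, the `walk_up` flag and the placeholder URLs are hypothetical illustrations. The sketch shows the empty 'classic fm' node inheriting the Radio 3 link. It also shows two possible remedies: giving the brand node its own URL, or disabling the walk for web search.

```python
from typing import Optional

# Hypothetical fragment of the taxonomy: child concept -> parent concept.
PARENT = {
    "classic fm": "classical music",
    "classical music": "music",
}

# Only a BBC URL has been attached; it is a placeholder for BBC Radio 3.
URLS = {
    "classical music": ["https://example.org/radio3"],
}


def best_links(query: str, walk_up: bool = True) -> list[str]:
    """Look up a concept, optionally walking up the tree when it has no URLs."""
    node: Optional[str] = query.lower()
    if node not in PARENT and node not in URLS:
        return []  # not a concept in the taxonomy at all
    while node is not None:
        if URLS.get(node):
            return URLS[node]
        if not walk_up:
            return []  # hypothetical switch: no walking for web search
        node = PARENT.get(node)
    return []


print(best_links("Classic FM"))                 # the Radio 3 placeholder: the bug
print(best_links("Classic FM", walk_up=False))  # [] once the walk is disabled

# The other remedy: give the brand node its own explicit URL.
URLS["classic fm"] = ["https://example.org/classicfm"]
print(best_links("Classic FM"))                 # the Classic FM placeholder
```

Mapping every brand node only works if nobody misses one. That is why switching the walk off for web search is the more robust of the two.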
Needless to say, the feature was swiftly switched off for web search when the scale of the issue became apparent, but not before the damage had been done to the credibility of the service's claim to impartiality.
As James points out, the BBC now 'recommends' Classic FM, which presumably infuriates Roger Wright just as much.