A day in the life of BBCi Search - part 6

Written by Martin Belam
Published 27 March, 2003

The value of a Taxonomy

As I mentioned, BBCi Search has a team constantly monitoring the search activity on the site, and attempting to match the searches being made with the best possible content available, both on the BBCi site and on the web as a whole.

This role is crucial for a site the size of bbc.co.uk, whose index holds more than 2,500,000 documents even before the BBC News and BBC Sport content is counted. It is the only way that the language of the users can be mapped to the taxonomical conventions of the organisation.

One example of this is users searching on the BBCi Science site for information on 'planets'. An examination of the search terms used on the site shows that 'planets' is consistently one of the most used search terms. However, the BBCi Science homepage does not feature the word 'planets' at all. The site has plenty of content about the solar system, but it is described as 'solar system', and branded "Space", to tie in with a television programme broadcast some 18 months ago.

The consequence of this is that a search for 'planets' that relies purely on a technological word-matching solution returns as its top results information about "The Blue Planet" television programme - ironically probably the one planet in the solar system the user was least likely to be wanting information about.

In the absence of search technology with a better semantic understanding of the English language, the only way to align the vocabulary of the site with the vocabulary of users is to intervene, by providing 'best bet' results that originate from a taxonomical mapping of the content of the BBC site. It is only a human who can look at that search, within that context, and decide that it equates to an individual piece of web content that the search technology would otherwise fail to return.
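As a rough illustration, a 'best bet' lookup of this kind could be sketched as follows. This is not the actual BBCi Search implementation: the table contents, the example.org URLs and the function names are all hypothetical, chosen only to show an editor-maintained mapping being placed ahead of the word-matching results.

```python
# Hypothetical best-bet table: editors map the users' vocabulary to hand-picked pages.
# The titles and URLs are placeholders, not real BBC addresses.
BEST_BETS = {
    "planets": {
        "title": "Space: The Solar System",
        "url": "https://example.org/science/space/solarsystem/",
    },
}


def normalise(query):
    """Lower-case the query and collapse whitespace so ' Planets ' matches 'planets'."""
    return " ".join(query.lower().split())


def search(query, engine_results):
    """Return the engine's results with an editor-chosen best bet, if any, placed first."""
    best_bet = BEST_BETS.get(normalise(query))
    if best_bet is None:
        return list(engine_results)
    # Drop the engine's own copy of the best-bet URL so it is not shown twice.
    rest = [r for r in engine_results if r["url"] != best_bet["url"]]
    return [best_bet] + rest
```

With a table like this, a query for 'planets' would lead with the solar system page even though the word-matching engine ranks "The Blue Planet" highest.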

Another strong advantage of this system is the ability of the editorial team and taxonomists to assign new synonyms, add best bet URLs, or change descriptions in real time, in response to the actual recorded user behaviour.

A recent example of this was with the loss of the NASA Space Shuttle Columbia. The BBCi Search results pages include a news headline feed whenever the query produces results from the BBC News or BBC Sport sites that have been published within the last three days and cross a specific relevancy threshold. This worked fine if users were searching for "space shuttle" or "Columbia".
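The rule for the headline feed can be sketched as a simple filter. The three-day window comes from the description above; the threshold value, the field names and the function name are assumptions made for the example, since the real scoring scale is not described here.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=3)   # "published within the last three days"
RELEVANCY_THRESHOLD = 0.7     # assumed value; the real threshold is not stated


def headline_feed(results, now):
    """Keep only results that are both recent enough and relevant enough."""
    return [
        r for r in results
        if now - r["published"] <= MAX_AGE and r["score"] >= RELEVANCY_THRESHOLD
    ]
```

A story that is relevant but two weeks old, or fresh but only weakly matched, would not appear in the feed.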

However, we also saw, within three hours of the accident, that there had been a considerable rise in searches for the country "Colombia". Whilst it was conceivable that there was a simultaneous breaking news story in Colombia, it was obvious that these were searches aimed at finding information about the space shuttle from users who were unaware of the correct name.

The result set they received was about the country, and did not produce any headlines about the space shuttle. Through the use of synonyming we were able to provide a result set that contained links to the latest news stories about the shuttle, even when people were unintentionally searching for the country. Again this is something that would be impossible with a reliance on technology alone.
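The synonym step might look something like the sketch below. The synonym table, the substring matching and the function names are hypothetical simplifications; the point is only that an editor can add one entry and the misspelt query immediately starts picking up the shuttle headlines.

```python
# Hypothetical synonym table that editors can update in response to observed searches.
NEWS_SYNONYMS = {
    "colombia": ["columbia"],
}


def expand_query(query):
    """Return the normalised query followed by any editor-assigned synonyms."""
    q = " ".join(query.lower().split())
    return [q] + NEWS_SYNONYMS.get(q, [])


def matching_headlines(query, headlines):
    """Return headlines containing the query or one of its synonyms (naive substring match)."""
    terms = expand_query(query)
    return [h for h in headlines if any(t in h.lower() for t in terms)]
```

Removing the entry again once the story has passed would return searches for "Colombia" to their normal behaviour.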
