A day in the life of BBCi Search - part 6

Written by Martin Belam
Published 27 March, 2003

The value of a Taxonomy

As I mentioned, BBCi Search has a team that constantly monitors search activity on the site and tries to match the searches being made with the best content available, both on the BBCi site and on the web as a whole.

This role is crucial for a site the size of bbc.co.uk, whose index holds more than 2,500,000 documents even before the BBC News and BBC Sport content is counted. It is the only way to map the language of the users onto the taxonomical conventions of the organisation.

One example of this is users searching the BBCi Science site for information on 'planets'. An examination of the search terms used on the site shows that 'planets' is consistently one of the most popular. However, the BBCi Science homepage does not feature the word 'planets' at all. The site has plenty of content about the solar system, but it is described as 'solar system' and branded "Space", to tie in with a television programme broadcast some 18 months ago.

The consequence is that a search for 'planets' that relies purely on technological word matching returns, as its top results, information about "The Blue Planet" television programme: ironically, probably the one planet in the solar system the user was least likely to want information about.

In the absence of search technology with a better semantic understanding of the English language, the only way to align the vocabulary of the site with the vocabulary of users is to intervene, by providing 'best bet' results that originate from a taxonomical mapping of the content of the BBC site. Only a human can look at that search, in that context, and decide that it equates to an individual piece of web content that the search technology would otherwise fail to return.
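
To make the 'planets' problem concrete, here is a minimal Python sketch of the idea, not the BBCi implementation. A naive word-matching search (with a crude plural-stripping step standing in for whatever matching a real engine does) ranks a Blue Planet page first, and a hand-maintained best-bet table puts the solar system content above it. The table name `BEST_BETS`, the URLs and the sample documents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    url: str
    title: str
    text: str


def tokens(text: str) -> list[str]:
    """Lowercase, drop punctuation and strip a plural 's'.

    This crude normalisation stands in for plain word matching; it has
    no idea what the words mean.
    """
    out = []
    for word in text.lower().split():
        word = word.strip(".,:;!?'\"")
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        out.append(word)
    return out


def keyword_search(query: str, docs: list[Doc]) -> list[str]:
    """Rank documents by how many of their words match a query word."""
    wanted = set(tokens(query))
    scored = []
    for doc in docs:
        score = sum(1 for t in tokens(doc.title + " " + doc.text) if t in wanted)
        if score > 0:
            scored.append((score, doc.url))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


# Hypothetical best-bet table maintained by editors: normalised query -> URLs.
BEST_BETS: dict[str, list[str]] = {
    "planets": ["/science/space/solarsystem/"],
}


def search(query: str, docs: list[Doc]) -> list[str]:
    """Put any editorial best bets first, then the word-matching results."""
    key = " ".join(query.lower().split())
    bets = BEST_BETS.get(key, [])
    rest = [url for url in keyword_search(query, docs) if url not in bets]
    return bets + rest


# Made-up sample documents echoing the 'planets' example.
DOCS = [
    Doc("/nature/blueplanet/", "The Blue Planet",
        "A television series about the oceans of our planet"),
    Doc("/science/space/solarsystem/", "Space: the solar system",
        "Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus and Neptune"),
]
```

With this sample data, `keyword_search("planets", DOCS)` returns only the Blue Planet URL, because the solar system page never uses the word, while `search("planets", DOCS)` returns the solar system URL first. Keeping the best bets in a separate table means, in this sketch, that editors can change them without touching the matching logic.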

Another strong advantage of this system is that the editorial team and taxonomists can assign new synonyms, add best bet URLs or change descriptions in real time, in response to actual recorded user behaviour.

A recent example of this came with the loss of NASA's Space Shuttle Columbia. The BBCi Search results pages include a news headline feed whenever the query produces results from the BBC News or BBC Sport sites that were published within the last three days and cross a specific relevancy threshold. This worked fine when users searched for "space shuttle" or "Columbia".
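
The headline rule just described (BBC News or BBC Sport results, published within three days, above a relevancy threshold) can be sketched as a simple filter. This is an illustration under assumptions, not the BBCi code: the `Result` shape, the site labels, the threshold value of 0.5 and the newest-first ordering are all hypothetical, since the actual threshold is not stated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Result:
    title: str
    site: str           # section the page belongs to, e.g. "news", "sport", "science"
    published: datetime
    relevance: float    # engine relevance score; assumed to lie between 0 and 1


NEWS_SITES = {"news", "sport"}
MAX_AGE = timedelta(days=3)
RELEVANCE_THRESHOLD = 0.5  # hypothetical value; the real threshold is not stated


def headline_feed(results: list[Result], now: datetime) -> list[Result]:
    """Keep recent, sufficiently relevant BBC News/Sport results, newest first."""
    feed = [
        r for r in results
        if r.site in NEWS_SITES
        and now - r.published <= MAX_AGE
        and r.relevance > RELEVANCE_THRESHOLD
    ]
    # Newest-first ordering is an assumption made for this sketch.
    feed.sort(key=lambda r: r.published, reverse=True)
    return feed
```

Holding the age limit and threshold in named constants keeps the rule easy to read and tune; note that a query matching only non-news pages, as with the country searches described next, yields an empty feed.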

However, within three hours of the accident we also saw a considerable rise in searches for the country "Colombia". Whilst it was conceivable that there was a simultaneous breaking news story in Colombia, it was clear that these searches were aimed at finding information about the space shuttle, made by users who were unaware of its correct name.
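
Noticing a rise like this relies on watching the logs. As a hedged illustration only, a monitoring tool could flag terms whose volume in a recent window far exceeds their usual level, leaving editors to judge what searchers actually meant. The function `rising_terms`, the baseline figures and the `ratio` and `min_count` defaults are hypothetical.

```python
from collections import Counter


def rising_terms(recent_queries: list[str], baseline: dict[str, float],
                 ratio: float = 5.0, min_count: int = 20) -> list[tuple[str, int, float]]:
    """Flag query terms searched far more often than usual.

    recent_queries: raw queries logged in the latest window (e.g. three hours).
    baseline: typical count per term for a window of the same length.
    Returns (term, recent count, baseline count), busiest first.
    """
    counts = Counter(" ".join(q.lower().split()) for q in recent_queries)
    flagged = []
    for term, count in counts.items():
        if count < min_count:
            continue  # ignore terms too rare to judge
        usual = baseline.get(term, 0.0)
        # Unseen terms get a floor of 1 so the ratio test still applies.
        if count >= ratio * max(usual, 1.0):
            flagged.append((term, count, usual))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged
```

A tool like this can only say that "colombia" is unusually busy; deciding that those searchers wanted the shuttle is still the human judgement the article describes.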

The result set they received was about the country and did not produce any headlines about the space shuttle. Through synonyming we were able to provide a result set that contained links to the latest news stories about the shuttle, even when people were unintentionally searching for the country. Again, this would be impossible with a reliance on technology alone.
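
The synonym fix can be sketched as query expansion driven by an editable table: an editor attaches an alternative query to "colombia", and the next search also matches shuttle headlines. This is a minimal illustration under assumptions; the `SYNONYMS` table, the helper functions and the sample headlines are hypothetical, and the word-overlap headline match is a toy stand-in for the real headline feed.

```python
# Hypothetical editorial synonym table: normalised query -> extra queries.
SYNONYMS: dict[str, list[str]] = {}


def normalise(query: str) -> str:
    """Lowercase and collapse whitespace so table lookups are consistent."""
    return " ".join(query.lower().split())


def add_synonym(term: str, alternative: str) -> None:
    """Let an editor attach an alternative query to a term while the site is live."""
    alternatives = SYNONYMS.setdefault(normalise(term), [])
    alt = normalise(alternative)
    if alt not in alternatives:
        alternatives.append(alt)


def expand(query: str) -> list[str]:
    """Return the user's query followed by any editorial alternatives."""
    key = normalise(query)
    return [key] + SYNONYMS.get(key, [])


def matching_headlines(query: str, headlines: list[str]) -> list[str]:
    """Headlines containing every word of the query or of one of its synonyms."""
    queries = expand(query)
    hits = []
    for headline in headlines:
        words = set(headline.lower().split())
        if any(all(w in words for w in q.split()) for q in queries):
            hits.append(headline)
    return hits
```

For example, with the made-up headlines "Shuttle Columbia lost on re-entry" and "Colombia coffee exports rise", a search for "Colombia" initially matches only the coffee story; after `add_synonym("Colombia", "columbia shuttle")` the same search returns both, without any change to the matching code.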
