A day in the life of BBCi Search - part 5

by Martin Belam, 27 March 2003

Questions & URLs

Two other types of search I examined were those where users had entered either natural language questions or URLs into the search box on the BBCi site. Although both occurred regularly, neither made up a significant proportion of searches: URLs accounted for around 3% of searches, and questions for just over 0.5%.

The questions tended to look like essay titles, or to focus on topics of interest to children. There are some areas of the BBCi site (SOS Teacher & Ask Bruce) where the input of natural language queries is encouraged. Although no specific natural language parsing engine is used, the removal of stop words like 'how' and 'why' allows the websearch technology to return relevant results to these queries.
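As a rough illustration of the stop-word approach described above, here is a minimal Python sketch. The stop-word list and the function name are assumptions made for the example; the actual list used by BBCi Search is not given here.

```python
# Hypothetical stop-word list for illustration only; not the list BBCi Search uses.
STOP_WORDS = {
    "how", "why", "what", "when", "where", "who",
    "do", "does", "is", "are", "the", "a", "an", "of",
}

def strip_stop_words(query):
    """Return the query's words with stop words and edge punctuation removed."""
    kept = []
    for raw in query.lower().split():
        word = raw.strip("?!.,'\"")  # drop punctuation at the edges of each word
        if word and word not in STOP_WORDS:
            kept.append(word)
    return kept

# The remaining words are what a keyword-based engine would then search for.
print(strip_stop_words("Why is the sky blue?"))
```

With a question reduced to its content words in this way, an ordinary keyword search has a reasonable chance of matching relevant pages without any real understanding of the question.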

More importantly, study of the questions entered by users in these areas of the site also informs the content creation process. In this way the activity of the users contributes directly to the improvement of the service, as their requests for information shape the nature of the information subsequently provided.

Searches for URLs tended to be variations on BBC web addresses, or the URLs of high profile websites outside of the BBC. In the case of the latter it seems that people are using the BBCi Web Search offering to navigate to other sites (e.g. Friends Reunited or Hotmail), which are consistently near the top of reports on the URLs that have been entered.

For variations on BBC addresses, the BBCi Search team has again used this feedback on search behaviour to set up synonyms, so that users typing in www.eastenders.co.uk will get to the BBCi EastEnders homepage. This provides a better and more relevant result than a search technology could by itself if it were strictly looking for pages containing the text 'www.eastenders.co.uk'.

An illustration of how URLs, set as synonyms, can bring back the correct result to the user for their search
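The idea of treating a typed address as a synonym can be sketched as a simple lookup that runs before the normal search. This is only an illustration under my own assumptions: the table contents, the normalisation rules and the function names are hypothetical, and the destination is given as a label rather than a real address.

```python
# Hypothetical synonym table: normalised typed address -> editorially chosen destination.
URL_SYNONYMS = {
    "eastenders.co.uk": "BBCi EastEnders homepage",
}

def normalise(query):
    """Lower-case the query and strip the scheme, a leading 'www.' and trailing slashes."""
    q = query.strip().lower()
    for prefix in ("http://", "https://"):
        if q.startswith(prefix):
            q = q[len(prefix):]
    if q.startswith("www."):
        q = q[len("www."):]
    return q.rstrip("/")

def lookup_synonym(query):
    """Return the editorial destination for a typed address, or None to fall back to search."""
    return URL_SYNONYMS.get(normalise(query))

print(lookup_synonym("www.eastenders.co.uk"))
```

Normalising first means one table entry covers the common ways of typing the same address, and a query with no entry simply falls through to the ordinary search results.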

Word Count

I also measured how many words users on the BBCi site employed when making searches.

I found that 36% of searches consisted of just one word and 35% of searches used just two words. This is a vital point. Given the opportunity of searching over the whole of the BBC site, or indeed the whole of the web, the user's understanding or trust of search technology is such that they believe that a limited one or two word search term will achieve their goals. When we consider that the quantity of documents indexed for websearch is counted in the thousands of millions, this is a formidable task.

It is another reinforcement of the need for human intervention in search technology - to maximise the chances of these searches getting the right or relevant results. The BBCi Search editorial team are able to ensure that one word searches for "travel" or "sport" will get a top return of the best UK-based travel website, or the BBC Sport site, rather than a search result return based on the frequency of the appearance of those words within a document, or the number of links pointing at them.

Of the remaining searches, 16% contained 3 words, 7% contained 4 words, 3% contained 5 words, and the remaining 3% consisted of six or more words.
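For anyone wanting to produce a similar breakdown from their own search logs, a minimal Python sketch follows. It assumes the log is available as a plain list of query strings and that words are separated by whitespace; both are my assumptions for the example, not a description of how the figures above were produced.

```python
from collections import Counter

def word_count_distribution(queries):
    """Return {word_count: percentage}, grouping six or more words under the key 6."""
    buckets = Counter()
    for query in queries:
        n = len(query.split())  # whitespace-separated words
        if n == 0:
            continue  # ignore empty searches
        buckets[min(n, 6)] += 1
    total = sum(buckets.values())
    return {k: 100 * buckets[k] / total for k in sorted(buckets)}

# Hypothetical sample queries, for illustration only.
sample = ["travel", "sport", "bbc news", "how do volcanoes erupt"]
print(word_count_distribution(sample))
```

Grouping everything from six words upwards into a single bucket mirrors the way the percentages above are reported.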

Pie-chart showing the different numbers of words used in searches across the BBCi service

In part six of this article I will be looking at how the BBC uses a taxonomy to assist the user in finding what they are looking for.
