A Day In The Life Of BBCi Search
Page 1 of an 8 page article
A Day In The Life Of BBCi Search - Introduction
Since BBCi launched in November 2001, its improved search offering has been collecting data on the way that BBC website users search both the BBC's website and, through its homepage Websearch, the whole wide web.
Given such a mass of data, the easiest way to aggregate and make sense of it has been to measure the search terms that are most popular. Indeed, the BBCi homepage has a panel displaying the three most popular search terms of the moment, and an editorial and taxonomy team at the BBC constantly monitor the searches gaining high volume in order to match the correct content to them.
I therefore wanted to find out what it was that this vast majority of users were actually doing on the service, and had to find a way of analysing their behaviour without relying on our existing model of aggregating popular search terms.
Methodology
One way to go about this was to isolate one individual day and analyse in depth the searches that had been made. The log files collected by the search service contain information not only on the terms used, but also on the time the search took place and the area of the site that the search originated from.
I chose Wednesday December 11th, as it was a weekday during UK school terms, and there were no major breaking news stories or broadcast events to dominate results. Because the school calendar affects traffic to BBCi web services, a school term weekday is the most typical day of the year, and so shows the most typical use of the service. I also know from experience that search behaviour is affected by large breaking news stories, for example the loss of the space shuttle Columbia, or major UK broadcast events, like Test The Nation or the launch of BBC3.
To analyse the search terms I took 10 separate 6 minute samples from the log files, at different times of day, from 1am to 10pm. This was still too much information to classify, so I reduced it to searches that had been made from the BBCi homepage at www.bbc.co.uk, and searches that were made from the 404 error page. These are the most context neutral pages on the site, and restricting the data to them reduced the amount of information I had to deal with to a considerable but manageable 15,000 search terms.
I then took further 1 minute samples across the whole service and classified an additional 3,000 search terms as a control sample, to ensure that searches from the homepage and the 404 error page were representative of the usage of the service as a whole.
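
The sampling and filtering described above can be sketched roughly as follows. This is a Python illustration only, not the scripts actually used. The record layout `(timestamp, origin, term)`, the origin labels `"homepage"` and `"404"`, and the even spacing of the ten windows between 1am and 10pm are all assumptions, since the article gives neither the log format nor the exact sample times.

```python
from datetime import date, datetime, time, timedelta

# Assumed labels for the two context neutral pages; the real logs may
# encode the originating area of the site quite differently.
NEUTRAL_ORIGINS = {"homepage", "404"}


def window_starts(first=time(1, 0), last=time(22, 0), count=10):
    """Return `count` evenly spaced start times from `first` to `last`.

    Even spacing is an assumption made for this sketch.
    """
    anchor = date(2000, 1, 1)  # arbitrary date, used only for arithmetic
    begin = datetime.combine(anchor, first)
    end = datetime.combine(anchor, last)
    step = (end - begin) / (count - 1)
    return [(begin + i * step).time() for i in range(count)]


def in_sample(ts, starts, minutes=6):
    """True if timestamp `ts` falls inside any sampling window."""
    for start in starts:
        begin = datetime.combine(ts.date(), start)
        if begin <= ts < begin + timedelta(minutes=minutes):
            return True
    return False


def select_terms(records, starts, origins=NEUTRAL_ORIGINS, minutes=6):
    """Keep search terms made from the chosen pages during a window.

    `records` is an iterable of (timestamp, origin, term) tuples,
    a hypothetical layout.
    """
    return [
        term
        for ts, origin, term in records
        if origin in origins and in_sample(ts, starts, minutes)
    ]
```

Under the same assumptions, the control sample would correspond to calling `select_terms` with `minutes=1` and an `origins` set covering every area of the site.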
I measured the search activity on the day both quantitatively, using Perl scripts and spreadsheets, and by hand-classifying individual search terms.
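
As a rough idea of what the quantitative side of such a measurement might involve, here is a short Python sketch (the original work used Perl scripts and spreadsheets, which are not reproduced here). The normalisation rule (lowercasing and collapsing whitespace) and the function names are assumptions made for illustration.

```python
from collections import Counter


def normalise(term):
    """Lowercase a term and collapse runs of whitespace (assumed rule)."""
    return " ".join(term.lower().split())


def term_counts(terms):
    """Count how often each normalised term occurs, skipping blanks."""
    return Counter(normalise(t) for t in terms if t.strip())


def top_terms(terms, n=3):
    """The n most frequent terms, as (term, count) pairs."""
    return term_counts(terms).most_common(n)


def share_of_top(terms, n=3):
    """Fraction of all searches accounted for by the n most popular terms."""
    counts = term_counts(terms)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total
```

A figure like `share_of_top` gives a quick sense of how much search activity lies outside the most popular terms, which is the part of the traffic that aggregating popular terms alone does not describe.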
Posted by Martin Belam, January 2003