"Search at The Guardian" - London Enterprise Search meet-up

 by Martin Belam, 19 October 2010

Last night at Kings Place I helped Tyler Tate put on the latest London Enterprise Search meet-up, which we called "Search at The Guardian". We had 7 people speaking, and I was really pleased that we covered a range of search related topics, with talks from people with varying roles at GNM.

The Guardian's Kings Place office at night

Stephen Dunn opened with an overview of how search has changed over the years for us, from the days of a Verity install that nobody really understood, and through a long partnership with Endeca. Stephen explained that when thinking about a rebuild of the site in 2003, the team were impressed with the way that eBay provided 'guided' navigation, and so chose Endeca to provide facet-style search of Guardian content. That in turn influenced the way that we built our in-house R2 CMS, and how it allows us to tag and classify content.

The Guardian's tagging data scheme

We've recently moved to using a new search technology, the open source Solr software. Mat Wall and Graham Tackley explained how it underpins our Open Platform Content API. If you look at the API Explorer, you'll see it resembles an 'Advanced Search' form of yore. The way that partners query our API is essentially by constructing complex search queries that allow them to filter down to the exact content they need.

Querying the Guardian Open Platform API with a Musicbrainz ID

On Monday we announced the latest addition to our API, the ability to query Guardian and Observer content via Linked Data MusicBrainz IDs and via ISBN numbers. Daithí Ó Crualaoich has done much of the work on this, and as well as blogging about it yesterday, he spoke about it last night.

There were also talks from Thibaut Sacreste and Philip Wills on some of the more technical aspects of using Solr. Thibaut was showing how Solr implements queries with a geographical angle, and Philip was looking at some of the maths functions that the software offers. They had slides that featured real code, which was a little bit intimidating for a 'token techie' like me.

I then rounded off things with a rather fluffier presentation on "Why news search is broken...and what you can do about it".

My basic premise is twofold. Firstly, that search engines index documents, but that users on news sites don't care about documents, they mostly care about finding a specific news story which they already have in mind. Nobody searched for 'chile miners trapped underground' until it happened.

Secondly, that as a content organisation, you are never going to out-spend, out-hire or out-think Google on search. Therefore you should focus on doing the things that Google can't do so easily - like display precise metadata and provide decent filtering mechanisms based around what you know about your content structure that Google can't deduce easily from crawling.

Martin Belam presenting in the Scott Room at Kings Place

Martin Belam presenting in the Scott Room at Kings Place. Photograph by Tyler Tate.

In 2008 I gave a longer presentation at EuroIA that also touched on this theme called "Taking the 'Ooh' out of Google: Getting site search right". You can download that as a PDF or read it on the blog.

Diagram: Make Links

The evening was heavily over-subscribed - at one point we had 60 people with places, yet still another 49 on the waiting list. I hope that the people who attended enjoyed it, and got something from the evening. I'd like to thank Matt McAlister and the Open Platform team who put up the money for us to have a bit of food and drink on the night, Tyler Tate and Stefan Olafsson for running the London Enterprise Search meet-up group, and especially my colleagues who took the time out for an evening to come and talk about their work.

1 Comment

Thanks for a great event Martin - lots of great stuff in there - it's always good to get a peek inside an organisation.

Keep up to date on my new blog