The Guardian Open Platform at Endeca's e-Business Forum

by Martin Belam, 7 May 2009

I've been posting this week about my visit to Endeca's office in Richmond for the "Endeca e-Business Forum". I went because Endeca were one of the launch partners of The Guardian's Open Platform API, and they power our internal site search engine. The Head of The Guardian's Developer Network, Matt McAlister, was giving the final presentation of the day - a case study about the Open Platform.

Matt McAlister at Endeca

The Open Platform API

The Open Platform API Explorer uses the Endeca engine to spit a load of XML onto the screen, which is lovely, comforting and beautifully well-formed for the geek-minded, but was a little bit scary for a room full of product managers and sales people.

Guardian Open Platform graphic

For that reason Matt rightly concentrated on showing examples of how it had been done, rather than the underlying code. He also showed one of the best technical architecture diagrams I have ever seen, which is apparently how he presented the concept to our technical team.

Guardian API early architecture diagram

He explained that the technical thinking was that this was very similar to search engine architecture, and so Endeca was used as the back-end of the API. The Guardian also simplified things by partnering with Mashery, meaning that even the final API architecture is still only a handful of diagram boxes.

Mashery

One of the most popular applications has been Guardian Trends, which allows the audience to compare how frequently the site has used particular words. Developer Stephen Elliott has written about it on the Open Platform blog.
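To give a flavour of the idea behind a word-comparison tool like this, here is a minimal Python sketch. It is not how Guardian Trends is built: the function name `term_frequency` and the sample snippets are invented for illustration, and it works on a few in-memory strings rather than calling the Open Platform API.

```python
import re
from collections import Counter


def term_frequency(articles, terms):
    """Count how many articles mention each term at least once (case-insensitive)."""
    wanted = {t.lower() for t in terms}
    # Start every requested term at zero so unused words still show up.
    counts = Counter({t: 0 for t in wanted})
    for text in articles:
        words = set(re.findall(r"[a-z']+", text.lower()))
        for term in wanted & words:
            counts[term] += 1
    return dict(counts)


# Hypothetical article snippets, made up for this example only.
sample_articles = [
    "The recession deepens as banks cut lending",
    "Banks report losses; recession fears grow",
    "Football: late goal settles the derby",
]

print(term_frequency(sample_articles, ["recession", "football", "banks"]))
```

Counting articles rather than raw word occurrences is just one possible choice; a real tool would also need to bucket the counts by date to show a trend.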

Matt McAlister also mentioned that the Cass Sculpture Foundation were adding Guardian articles to their sculptor biographies - here is Anish Kapoor's page.

Anish Kapoor page on the Cass Sculpture website

Also name-checked were Kalv Sandhu's Tweet Reviews application, a geo-mash-up of football reports, a service that matches Guardian game reviews with trailers for the games themselves, and a Twitter search application.

Tweet Reviews screenshot

The Data Store

The geeky bits of the web were probably most excited about getting some raw XML and JSON from The Guardian's database via the API, but some of the most impressive early work has come from the data released via humble spreadsheets as part of the Data Store.

The data made available about MP expense claims led to some excellent mash-ups which revealed visually the story behind the numbers.

It meant you could easily spot MPs whose spending was out of step with their nearby colleagues.
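The mash-ups did this visually, but the same kind of comparison can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names, regions and amounts below are invented, the `out_of_step` function is hypothetical, and "more than twice the regional median" is an arbitrary threshold rather than anything the mash-up authors used.

```python
from collections import defaultdict
from statistics import median


def out_of_step(claims, ratio=2.0):
    """Return names whose claim exceeds `ratio` times the median for their region."""
    by_region = defaultdict(list)
    for _name, region, amount in claims:
        by_region[region].append(amount)
    flagged = []
    for name, region, amount in claims:
        if amount > ratio * median(by_region[region]):
            flagged.append(name)
    return flagged


# Invented figures, as if read from rows of a spreadsheet: (name, region, claim in pounds).
sample_claims = [
    ("MP A", "North", 10000),
    ("MP B", "North", 12000),
    ("MP C", "North", 30000),
    ("MP D", "South", 8000),
    ("MP E", "South", 9000),
]

print(out_of_step(sample_claims))
```

A flagged name is only a prompt to look closer, not a conclusion.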

Tom Brake MP was sufficiently disturbed by the release of the data to get in touch to offer a correction and explanation for the figures, which had failed to separate his postage costs from his stationery bill.

Siôn Simon MP has found himself in a storm of local controversy after the figures revealed that, despite holding only 7 surgeries a year in his constituency, he had racked up more in travel costs than a first class season ticket from Birmingham to London would have cost him.

I think these are impressive not because of the technical complexity in using the data, but because they show how much The Guardian is able to extend the reach and power of our journalism and uncover new stories by releasing data. The data about MPs' expense claims was available from official channels, but in a tricky format to process. Making it available in an easy-to-use way empowers people in a way that good journalism should.

Endeca e-Business forum
