Why are the UK & US so far ahead with linked data and the semantic web compared to Germany?

by Martin Belam, 2 September 2010

I've just come back from a really enjoyable 1st Datajournalism meetup in Berlin, which will no doubt generate a spew of blog posts here and on guardian.co.uk over the next couple of weeks. I was giving a talk entitled 'Datajournalism at guardian.co.uk', which I intend to publish in some format somewhere at some point in the not too distant future - in the meantime there is a list of the things that I referenced in the talk in yesterday's linklog special edition.

The afternoon closed with a panel discussion, featuring myself (representing Guardian News & Media) and Tom Scott, Jem Rayfield and Silver Oliver from the BBC, ex-LA Times journalist Eric Ulken, Gerd Kamp of the Deutsche Presse Agentur Newslab and Ole Wintermann from the Bertelsmann Foundation.

Panel session at the 1st Berlin Datajournalism meetup

From left to right: Ole Wintermann, Jem Rayfield, Silver Oliver, Gerd Kamp, Eric Ulken, Tom Scott, Martin Belam. Photo: Georgi Kobilarov

One of the themes that emerged was that the Berlin audience felt that the US and UK had taken a lead in the linked data and semantic web field. As one audience question put it:

"All the presentations from the British and US speakers seemed to be about we did this thing, and we did this thing. All the presentations from Germany seemed to be about how we struggled to do anything. Why is that?"

I'm not entirely convinced that we came up with a very good answer, but there seemed to be four main strands to it.

The common language

Dull but true - the common language between the UK and the US has made it easier to build things in English. The English language version of Wikipedia is three times as big as the German one, and DBpedia, a crucial hub in the linked data ecosphere, uses English Wikipedia entries as the basis of a common identifier.

The use of the English language also drives the scope of some of the services being built. It makes sense for The Guardian's World Government Data store to include datasets from Australia, Canada, New Zealand and the US alongside the UK because they share a common language, which makes data retrieval, cross-referencing and comparison easy. Whether it would be useful to add in state-published data from a German Landtag or the Austrian government remains to be seen.

'The unofficial API and data arms race'

Having a common language also means sharing a common media space. In the UK we've made some tentative steps to forge a common vision of linked data use by news organisations. However, the fact that businesses like The Guardian, The Telegraph, the LA Times and the New York Times are all feverishly looking across the Atlantic at what the others are doing, in a rivalry no doubt dubbed with some hideous neologism like 'co-op-a-tition', drives innovation and new data-driven journalism services.

The BBC has done it

With BBC Earth and the 2010 World Cup site, the BBC still remains alone as a big media organisation that has used semantic web technologies on the production side, and has then been open in blogging and presenting about what it has done. It provides a demonstration of a potential business case to other media organisations in the UK. The size and funding model of the BBC have allowed it the opportunity to experiment in this sphere and carefully build things 'the right way'.

The history of 'freedom of information' court cases

Eric Ulken made the point that he was loath to say it as an American, but that maybe more Europeans needed to reach for their nearest lawyer. He stressed that the open government data and freedom of information legislation in the US has come about after years of lawsuits trying to force grudging state and federal departments to release information. The plethora of official data being issued in the UK and US in machine-readable and reusable formats is fuelling the development of apps and services.

So what should Germany do next?

Between us on the panel we suggested a few things that might help kick-start the process:

  • Organise hacks and hackers meetups like those taking place in the US and UK
  • Support and publicise existing sites and services like datenjournalist.de and the DPA Newslab
  • Have more 'Web of data' meetups in Berlin and elsewhere
  • Start campaigning for the release of local and national state information

Next...

As I mentioned at the beginning, I'm sure this is just the first of a flurry of blog posts to emerge out of my trip...

4 Comments

Interesting, but I'd note some other distinguishing features that probably influence this. We should not rush to assume structural advantages that other nations can't use.

(a) US has had a big, thriving federal government interest group of in-house KM/semantic civil servants for at least a decade. Lots of alumni to draw on. (b) US copyright law *disables* some federal government sources from claiming exclusive rights to their own work. Made by public dollars, so available to public. (c) UK has a celebrity to work with! Two, counting Cameron as well as Berners-Lee. Marquee advocacy absolutely matters. (d) Both countries arguably have more privatisation: thus industry-based innovations get into government practices faster.

Finally, let me just express polite skepticism that the status of English as the 21st century lingua franca gives the EN-speakers a leg up. Germany, for example, has *plenty* of raw e-gov material, ready for semantic markup ... in German. Sorry, not sure that the size of en.wikipedia.org versus de.wikipedia.org is a gating factor here.

I think it might have a lot to do with the freedom and ability of the people to gather in the US and UK. Unlike those countries, Germany had a wall that really hurt the country in ways that we just can't comprehend. When I lived in Eastern Europe I could never have imagined something like a 2600 meeting taking place there and actually being broadcast on the radio as it is in New York. There also might not have been as many incentives for the government to invest in the net.

Hi Martin

I was supposed to be at the Data Meetup but couldn't make it due to a last minute staffing shortage in the office.

One thing that's struck me - as ex-Guardian Unlimited for seven years, and now in Berlin at dpa - is that for a city full of (unemployed) graphic designers, artists, web programmers, students, journalists etc - Berlin is surprisingly backward in terms of digital presence, from online shopping, to online banking to online newspapers to bloggers, small businesses, etc. I'd say it's five years plus behind the curve of where London is, at least.

Secondly, without labouring the point of 'Prussian Perfectionism' too much, the beta/Web 2.0 approach doesn't seem to have much traction over here. The number of blogging-type events I've been to where people secretly 'unveil' their website, which they've been working on for 9 months/2 years etc, and say 'as soon as the picture archiving is ready and more navigable we'll be ready to launch, maybe next year' etc is staggering. I've been looked at with incomprehension when suggesting a soft launch and inviting comments from users.

Thirdly - unrelated - the point I wanted to put at the Meetup was my perception that a lot of data journalism is programmer/researcher led, rather than editorially/journalistically led. It might be an obvious point - I'm sure it's been answered - but who wants to read 80,000+ pages of Afghan Wikileaks? Rather than have a journalist/team go through it, and turn it into a story? Albeit a story with links/navigation/data mining/research implicit. That's a bit broad brush, and I'm playing devil's advocate to a degree, but you get my point. Some of this data journalism is essentially university research tools.

Matt
