'Linked Data and the future of journalism' - part 2

 by Martin Belam, 15 September 2009

Yesterday I published the first part of a rough transcript of the things that I said at the London Linked Data meet-up. I was part of a panel chaired by Paul Bradshaw. We were discussing the future of data and journalism with John O'Donovan, Dan Brickley and Leigh Dodds. The views expressed are my own, and not the views of Guardian News & Media where I work.

The view from the stage at the London Linked Data meet-up

Martin Belam on 'Linked Data and the future of journalism'

It is clear that the MPs Expenses scandal was a watershed moment for data-driven journalism. Despite the fact that The Telegraph and The Guardian can be bitter rivals in terms of the business and the ideology of the papers, I have to take my hat off to The Telegraph. I can't remember in my lifetime a single paper driving the political agenda of the nation as they did day after day, whilst putting on physical circulation.

Nevertheless, as I've said before, I think The Guardian's instinct to publish documents and data in as full a manner as possible and try to crowd-source stories, compared to The Telegraph's bunker of journalists dripping the revelations out, shows two different cultures of data journalism at work. I think they'll both have their place in the future - being open and linked in, and being shuttered up for commercial advantage.

One thing that becomes clear in a discussion like this is that we quickly start talking about fact-checking and trust and showing sources. One question from the floor suggested that linking to specific scientific studies rather than saying "scientists say" next to a sensational headline is the way forward. We can be very abstract and talk about the philosophy and theory of serious broadsheet journalism, but we have to remember that the paper that sells the most copies in the UK is The Sun, and for the last couple of months the most successful website has been that belonging to the Daily Mail.

Selling news has not traditionally been about dry facts and data. It has been about putting together a package that entertains as well as informs, in a way that attracts an audience of eyeballs for advertisers. It is that bit of the equation, the bundling of this content together, that is being unpicked by the availability of free information and entertainment on the Internet. Linked data doesn't make people consume and understand news. Entertaining stories driven by data might.

News organisations are often portrayed as dullards who missed the information revolution provided by the introduction of the World Wide Web. However, as we debate the relative merits of RDF and Microformats embedded in XHTML, we should also remember that the news industry has developed digital formats of its own. NewsML is 9 years old, and has provided the basis for the business of syndicating and exchanging news stories between companies for some time. The NLA's eClips service shows the industry collaborating to fund digital delivery. I think it clearly demonstrates a willingness and ability to wrap machine-readable information around news content - providing you can demonstrate that this isn't just about valid mark-up, it is also about having a valid business model behind it.

I don't, though, necessarily think we have always been strong as an industry in determining the right technological areas to compete in. James Cridland always argues that the success of radio as a medium is because all of the players in the market 'agree on technology, and then compete on content'. To an extent, newspapers do this in the printed sphere. Newspapers come in a set of standard sizes, which makes the manufacture and procurement of printing presses easier. It standardises the supply of paper, and makes it easier for the newsagent to have point-of-sale material of a uniform size and so forth. It also means that the advertising slots in print are standard - and we see that reflected online where we all carry a standard set of adverts. However, we have all also built or procured entirely different content management systems and search engines, and we do not have interoperable tagging or metadata. Online we seem to compete on both content and publishing technology.

When people ask me why journalists are not very good at linking out to the rest of the web, I think we need to put a bit of context around the question. For a start, as an industry we've been shedding jobs left, right, and centre, which is putting more time pressure on the journalists that remain. Then there is the issue of links not being intrinsic to a print product. The workflow in many news organisations is still about getting the newspaper to the printers on time, or the radio bulletin on air with the right duration. Getting hyperlinks attached to stories is not a vital part of that workflow. The next time you are asking why journalists aren't adding sufficient metadata to their articles, consider whether you always comment your code as well as the next person who has to edit it would like...


Later this week I'll be posting about what I made of the rest of the London Linked Data meet-up.



Journalism was just a painful, shattered dream for me, as the competition is huge and you need to have connections even if you are publishing online.


I think you have to compare the two sources of material when looking at the Guardian and Telegraph's publishing of PDFs. The Guardian had approved and pre-redacted documents (with little interesting information on them; even the duck house was blocked) and the Telegraph was working from utterly unpublishable uncensored receipts. I'm sure you don't imagine that the redacting process would have been a trivial matter while extracting the stories.

You make a fair point, and I certainly think after the release of the official documents The Telegraph provided a great online experience with the MPs Expenses data tool etc. What I was interested in getting across to an audience that on the whole believes that all data should be free, open and semantically tagged, is that there will sometimes be editorial and commercial reasons for keeping data within an organisation to exploit it in a particular way.

For example, our Tax Gap series similarly involved a lot of time spent with a fine-tooth comb looking at some rather dull figures, that were only made available (whilst judges allowed) after the paper had splashed with the story - rather than whilst the investigation was ongoing.

All the newspapers have to change the way they make their products. If you look, newspapers haven't changed in the last 200 years. It is time for something to change.
