'Linked Data and the future of journalism' - part 1
Last week I appeared on a panel chaired by Paul Bradshaw at the London Linked Data meet-up, discussing the future of data and journalism with John O'Donovan, Dan Brickley and Leigh Dodds. I made a very rough audio recording of the session, so I've been able to put together a transcript of what I said in response to a wide-ranging set of questions. I should add that, as ever, these comments express my own views, and are not the views of Guardian News & Media.
Martin Belam on 'Linked Data and the future of journalism'
One of the things that is interesting to me is that as a lot of papers are preparing to put paywalls up and shutter their content behind them, The Guardian has done exactly the opposite by building the Open Platform, and allowing anybody with an API key to re-use our content. You can get the full text of our website archive marked up in XML, JSON and Atom, and there are pre-existing client libraries for Python, Ruby, PHP, Java and Perl amongst others.
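As a rough illustration of what "anybody with an API key" means in practice, here is a minimal Python sketch that builds a search request URL. The endpoint and the parameter names (`q`, `format`, `api-key`) are assumptions based on the public Content API rather than anything said at the event, and `build_search_url` is a hypothetical helper, so check the Open Platform documentation before relying on them. The sketch only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumed endpoint: verify against the current Open Platform documentation.
BASE_URL = "https://content.guardianapis.com/search"


def build_search_url(query, api_key, fmt="json"):
    """Build a search URL for the given query (hypothetical helper)."""
    # Parameter names are assumptions, not taken from the talk.
    params = {"q": query, "format": fmt, "api-key": api_key}
    return BASE_URL + "?" + urlencode(params)


# "YOUR-API-KEY" is a placeholder for a key issued to you.
url = build_search_url("carbon footprint", "YOUR-API-KEY")
print(url)
```

Fetching that URL with any HTTP client should return the matching articles in the requested format, subject to the terms attached to the key.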
There is a real tension there between that and organisations that want to charge money for news. You could conclude that between ourselves and News International, one of us must be wrong. However, for a start, The Guardian is structured differently from other national newspapers. It is owned by a trust whose mission is to protect the paper's journalism, and the business is 'profit-seeking' rather than having to be 'profit-making'.
Matt McAlister gives an example of where the Open Platform fits into this, suggesting that, in the future, two bloggers in Brazil having an argument about the carbon footprint of a company can come to The Guardian's Data Store to get their evidence, republish some articles to support their positions, and all of it will be branded and attributed to The Guardian, and we need never know that it has happened.
That does mean that we are very conscious now of how we structure and mark up our information. For example, if I create a new section on guardian.co.uk called "ZZZ - Martin's test site - don't publish here" and I don't distribute the URL, then it should stay relatively private. Well, until Google picks it up from our sitemaps, I suppose. However, my sandbox would certainly be instantly exposed to anyone searching through the API. As we are building products and services, we often stop and think "OK, how is this content or data structure going to look when it appears in the API?".
News organisations have generally relied on one business model, advertising. I think now we will all have to move to a more hybrid set of business models, and The Guardian sees potential to drive not just reach and influence, but revenue through distributing content in this way, though it would just be one strand of potential revenue. I rather like this analogy from David Cohn: news organisations have just jumped off the Titanic as it sinks. In the water there is a lot of loose timber. Clinging to one piece of timber - or one business model - will not save you. However, lashing together several bits of wood to make a life-raft might.
When we talk about 'Linked Data' and journalism I think there are two elements. Firstly, how do you publish the end product - the news - in machine-readable formats that allow for re-use and semantic understanding of content? But secondly, there is how 'Linked Data' can be utilised as a research tool to help professional and non-professional journalists unearth and tell stories.
I did a presentation at the News Innovation Unconference where I advocated 'journalism centred design'. Not 'journalist centred design' I hasten to add, that would simply be a big photo of the journalist and a byline bigger than the headlines.
I believe we can learn from the industrial methodology of time and motion studies on production lines to develop 'lean manufacturing'. The requirements of a journalist working on the film or music review desk, and those working on the financial or foreign news desk are quite different, yet almost all news organisations will sit them down in front of exactly the same CMS screen. For me, the key to making 'Linked Data' work for journalism is building tools that allow people to exploit this data when they are researching stories. It shouldn't just be about adding some fancy metadata to news websites.
Paul Bradshaw wrote up an excellent overview of the session on the Online Journalism Blog, which includes a video clip of some of it. Tomorrow, on currybetdotnet, I'll be continuing my rough transcript of what I said at the event.