27 May 2011

I recently appeared as part of a panel session at FutureEverything talking about data journalism. I’ve already blogged the four points I was planning to make. Here are my notes from the talks given by those I was sharing the stage with.

Chris Taggart

Chris made the point that he was no longer clear what the difference was between journalism and data: the numbers often are the stories. To illustrate his point he demonstrated the site he has developed, Openly Local, which gathers and re-publishes spending data from local councils in the UK.

On the train to FutureEverything he had taken a quick look at the spending of Manchester’s city council, and, as he put it backstage, the rest of his talk just wrote itself. Firstly he had identified that one supplier was the beneficiary of £15m. “Once upon a time”, he said, “these would be stories, now they are just database queries.”

He had also identified a large amount of money where the supplier details were “Redacted personal data”. There is good reason to have that in the system, as you would not want the state to be revealing personal details of foster parents or vulnerable people receiving care payments. However, Chris had immediately come across payments for £150,000 and £97,000. This seemed a potential misuse of the way the information is supposed to be redacted, and definitely worth investigating. It would be interesting to find out if anyone followed up his findings.
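To give a flavour of what “just database queries” can mean in practice, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not how Openly Local works, and it is not Chris's actual query. The table layout, the column names, the £50,000 threshold and all the rows are my own assumptions, invented for illustration and only loosely echoing the figures mentioned in the talk.

```python
import sqlite3

# Hypothetical sample rows: invented for illustration, not real council data.
payments = [
    ("Supplier A", 9_000_000),
    ("Supplier A", 6_000_000),
    ("Supplier B", 250_000),
    ("Redacted personal data", 150_000),
    ("Redacted personal data", 97_000),
    ("Redacted personal data", 420),
]

# An in-memory database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (supplier TEXT, amount_gbp INTEGER)")
conn.executemany("INSERT INTO payments VALUES (?, ?)", payments)

# Query 1: total spend per supplier, largest first.
totals = conn.execute(
    "SELECT supplier, SUM(amount_gbp) AS total "
    "FROM payments GROUP BY supplier ORDER BY total DESC"
).fetchall()

# Query 2: redacted payments above an arbitrary threshold (an assumption;
# pick whatever level looks implausible for an individual care payment).
THRESHOLD_GBP = 50_000
large_redacted = conn.execute(
    "SELECT amount_gbp FROM payments "
    "WHERE supplier = ? AND amount_gbp > ? ORDER BY amount_gbp DESC",
    ("Redacted personal data", THRESHOLD_GBP),
).fetchall()

print(totals)
print(large_redacted)
```

The queries only surface candidates. Deciding whether a large redacted payment is legitimate still takes a phone call and some reporting.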

In my talk I mentioned Michael Blastland, and his statistical approach to using numbers to tell stories. Chris Taggart echoed this, by stressing that statisticians try to exclude outliers in the data, whereas journalists use outliers to spot stories.
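As a rough sketch of how a journalist might pull the outliers out of a column of payments rather than discard them, the snippet below applies the common 1.5 × interquartile-range rule using Python's standard statistics module. The rule is one convention among several and is my choice, not something Chris described. The amounts are invented for illustration.

```python
import statistics

# Hypothetical payment amounts in pounds: invented for illustration.
amounts = [120, 135, 140, 150, 155, 160, 170, 97_000]

# Quartiles of the data; statistics.quantiles with n=4 returns Q1, median, Q3.
q1, _median, q3 = statistics.quantiles(amounts, n=4)
iqr = q3 - q1

# The conventional 1.5 * IQR fences. A statistician might drop values outside
# them; here we keep exactly those values as leads worth a closer look.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in amounts if a < lower_fence or a > upper_fence]

print(outliers)
```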

David Higgerson

David opened by saying that he sometimes hears that “open data” is a threat to journalism, as it takes away one of the vital fact-seeking components of the job. This argument misses the point, he said: “People have always been able to find out about things”. The role of a news organisation, particularly at local level, has always been to tell people interesting stuff they might not have known about. Making more raw data publicly available wasn’t going to fundamentally change that.

He also identified the risk of a “look at me” culture, saying that whilst putting data on a chart could be fun, you had to make sure that you were still telling a story that was useful to the reader.

David criticised those who say that data journalism is “easy journalism” because “the charts update themselves”. He argued that you shouldn’t underestimate the complexity of processing data before it can be published. “If you thought working with PDFs is hard”, he said, “you should try dealing with a council who issue their data as JPG images”.

Paul Bradshaw

Opening his talk with a quote about how technology has always been seen as a threat to newspapers, Paul explained that people thought the invention of the telegraph would destroy them, as people would now be able to find out “the truth”.

He feels it is easy to get distracted by the fact that there are a myriad of data processing tools out there now that journalists can use, as if they will do the job of story-telling by themselves. The sheer scale of the data available was making computers a vital part of the process. Paul pointed out that the pertinent Watergate documents were about 2 million words, whereas the recent Wikileaks data dumps were more in the region of 200 million words.

Paul also added that it was a misconception to think of “data” as just numbers. In a digital world, he said, text, audio and video are all rendered as zeroes and ones, and so you can use the same interrogation techniques on them.

He ended with a call for openness in publishing that I wholeheartedly agree with. He said that in order to hold power to account in a connected digital world, it was a journalistic responsibility to make data searchable, findable and linkable.

Data lacks context. Journalism, at least good journalism, puts that data into a human context that makes it meaningful. Until computers get a whole lot smarter than they are today, data will not be replacing journalism any time soon.

