"Software developers and data journalists" - Daithí Ó Crualaoich talk at the Guardian

 by Martin Belam, 16 February 2011

We've been having a series of lunchtime talks in the Guardian about digital products and services, one of which was recently given by Daithí Ó Crualaoich, one of our developers. I've worked with Daithí on data-driven projects like the inclusion of MusicBrainz IDs and ISBNs in our Open Platform API. He has also worked on some of the high-profile datajournalism projects that have appeared on guardian.co.uk in the last couple of years.

In his talk he addressed the software development part of datajournalism, and I thought he made some very salient points about the relationship between the two cultures of journalism and programming.

He reminded the audience that software devs are not journalists. They have general purpose skills with software that can be turned to any processing function, like the controls on a washing machine, but they generally, he said, have very limited skills in understanding what makes a story into “a story” in the way that journalists process information. This means that to take part in these kinds of projects, software developers have to adapt their general purpose skills to focus on journalism.

Or as he put it at one point with regard to technology:

“We have lots of hammers, so at least one of them will work on a screw”

He also made the point, as I've seen people like Heather Brooke make, that datajournalism isn't perhaps as special as is often made out. Daithí said:

“Understanding datajournalism is the same thing as understanding journalism. The stress is in the wrong place when we focus on the data. You can't give a machine data and get journalism out the other end”

He added that whilst we have software and computers and “stuff”, unfortunately someone still has to do the boring work of reading lots of documents and understanding them as a human. For example, with the WikiLeaks diplomatic cables, it takes a person reading a selection of them to work out the angles of attack and lines of enquiry to follow.

Daithí used a great metaphor in saying that tracking down stories in data was like finding the needle in the haystack, and that the best way to do that was to burn the haystack down. The reason this works, he explained, is that in the real world the needle has different intrinsic properties to straw. In the digital world, to uncover the stories in data you look for the data with different intrinsic properties: the documents that vary from the norm.
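
To make the "burn the haystack" idea a little more concrete, here is a minimal Python sketch. It is not anything shown in the talk. It assumes that a document's word count is the property of interest and that "different from the norm" means more than two standard deviations from the mean. The names flag_outliers and word_count, the choice of feature and the threshold are all hypothetical choices for illustration.

```python
from statistics import mean, pstdev


def word_count(text: str) -> int:
    """Illustrative feature: the number of whitespace-separated words."""
    return len(text.split())


def flag_outliers(documents, feature=word_count, threshold=2.0):
    """Return the documents whose feature value is far from the norm.

    "Far" is assumed to mean more than `threshold` population standard
    deviations from the mean. Everything else is the straw that gets burned.
    """
    values = [feature(doc) for doc in documents]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every document looks the same on this feature, so nothing stands out.
        return []
    return [
        doc
        for doc, value in zip(documents, values)
        if abs(value - mu) / sigma > threshold
    ]


if __name__ == "__main__":
    cables = ["routine status update"] * 9 + ["unusually long report " * 20]
    for doc in flag_outliers(cables):
        print(doc[:40])
```

In practice the interesting part is choosing the feature, and that is exactly where the journalism comes in: a script like this can only say which documents are unusual, not which of them is a story.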

He also added that if we've learned one thing at the Guardian from the Afghan and Iraq war logs and the diplomatic cable leaks, it is that US soldiers are better at accurately tagging up data than diplomats. He said that the soldiers' application of metadata was generally so good that you suspect they'd make great sub-editors...
