"The web data revolution - a new future for journalism": Datajournalism event at the Guardian
On Friday I managed to sneak into the Scott Room in Kings Place to see a little of an event about journalism and data that was being put on for Internet Week Europe. Dr. Aleks Krotoski was introducing Simon Rogers, David McCandless, Heather Brooke, Simon Jeffery and Richard Pope.
The panel delivered a lot of pithy quotes. Heather Brooke described being an investigative reporter as basically being 'obsessively nosey', David McCandless said that 'the Internet pours into his eyes every day', and Simon Rogers quoted one of his journalism 'heroes', James Cameron, who said in 1969:
"The new world will be a place of answers and no questions, because the only questions left will be answered by computers, because only computers will know what to ask."
McCandless talked about how showing his working allows 'a lot of angry people on the Internet' to keep him honest. As well as the final graphic, he usually publishes the raw data he has worked from and some early drafts, so people can see the process of making the visualisation. It was inevitable, he said, that some of his own bias or politics seeps into his data analysis.
He illustrated the point by showing diagrams that could put the US, China, Myanmar, North Korea, Saudi Arabia or Jordan at or near the top of the list for 'the biggest military power on earth', depending on whether you counted the size of the army or the proportion of the population involved, or raw expenditure or expenditure as a percentage of GDP.
Simon Rogers discussed how digital technology had changed the relationship between journalist and audience. Instead of just "chucking it out there" to a "grateful public", you realise that in most cases there will be experts in the audience who can often do the analysis of a particular data set better than you can - especially Doctor Who fans.
Simon Jeffery made a point about one thing that definitely hadn't changed. Every cell in the spreadsheets you publish has to be held to the same standard of accuracy as any fact you would publish in a story.
Another big theme of the session was 'access' to data. Simon Rogers stated that datajournalism was nothing new, and showed charts that Florence Nightingale had drawn to make a point during the Crimean War. He said the difference now was that everybody had access to data and the tools to manipulate it, and that people had to understand that most datajournalism was about Excel, not about hardcore programming.
Heather Brooke also talked about the Freedom of Information Act lowering the barrier to investigative journalism, and raising the standard of the data being collected. Richard Pope described Scraperwiki as a tool to help people who wouldn't know where to start with coding to unlock data that they can see is trapped on a webpage. He wanted to make the 'black art' of scraping the web more collaborative as well, so that a community could help fix things when they inevitably broke. Notably, the call to action when you see their site listed in Google is 'a website where people can write and repair public web scrapers'.
Aleks asked how different this all made being a journalist these days. The general opinion seemed to be that this wouldn't be seen as a specialism in the future, but just the way journalism was done. Heather Brooke says she teaches her students "computer assisted reporting", but points out that nobody does a course in "telephone assisted reporting": that has just become "reporting".
I've spotted a couple of blog posts that featured some live note-taking of the event. Sarah Booker posted notes on Simon Rogers, Heather Brooke and David McCandless, whilst Nicola Hughes also has notes on Simon Jeffery and Richard Pope. I had to leave before the end, but I believe they finished the session by recording some or all of the Tech Weekly podcast, which I guess you will be able to hear in due course.
I also very much enjoyed the fact that with Simon Jeffery I witnessed a journalist use the phrase 'information architecture' in a sentence that didn't start, "So, what actually is..."