Last week Steve Lohr and Andy Lehren from the New York Times came in to talk about data journalism. Given how amazing that lecture was, I thought you might want more. For more of Andy, you can watch this interview in which he talks about investigative journalism. For more about data journalism, you should check out the Guardian’s data store.
Data journalism is not new, just easier…
Supplementing stories with data is at the core of many investigative journalism efforts. Tables are popular, and they’ve been around for a while. As Andy pointed out, the NYTimes brought down Boss Tweed by publishing a simple table. They acquired and published the ledgers from the comptroller’s office. These documents showed that money slated to go into the construction of a new courthouse was being siphoned off to Tweed’s cronies. The public was, of course, outraged.
The first piece of data journalism by the Guardian came out in 1821. For context, this was 40 years before the Civil War: James Monroe had just been reelected as president, and we were on less than great terms with England after they burned down our Capitol. The Guardian provided a table breaking down schools by how many children attended each school and how much that school cost. While riddled with data collection errors, it was still a better analysis than anyone had done before. It showed that there were far more students getting a free education than previously thought, which also meant that poverty affecting children was far higher than previously thought.
Data journalism is not new. What’s different now is that it’s easier. The barrier to entry has been reduced, both in terms of gathering and analyzing data. Spreadsheets make simple analysis easier, and the internet is brimming with data sources. Simon Rogers describes the change to journalism in the following way:
But now statistics have become democratised, no longer the preserve of the few but of everyone who has a spreadsheet package on their laptop, desktop or even their mobile and tablet. Anyone can take on a fearsome set of data now and wrangle it into shape. Of course, they may not be right, but now you can easily find someone to help you. We are not wandering alone any more.
…but it’s still journalism
In the same article, Rogers also explains that data journalism is still journalism. He says, “Data journalism is not graphics and visualisations. It’s about telling the story in the best way possible.” The key deliverable is the story. In some cases the story will need a bar chart or a map or some sort of detailed infographic. In others, it may require a single number or simply prose.
As data scientists, we tend to fetishize the power of data. It seeps into how we view the world, specifically in how we think people should do their jobs. This world view has led to some backlash from the journalism community. Many data scientists do not understand that the goal of journalism is to write an interesting, compelling, and accurate story. The notion that everything can be counted, that data is power, is in fact false. Jonathan Gray from the Open Knowledge Foundation expands on this point: “The value that data can potentially deliver to society is to be realised by human beings who use data to do useful things.”
Data is a means to an end.
With that in mind, students, I’d like you to scour the web for a good data journalism article. Describe what makes it good from both a data scientist’s point of view and a journalist’s point of view.