So just what is data journalism?


The best thing about doing this blog on data journalism is that every group member has had the same initial reaction: what exactly is data journalism?  This reveals perhaps not how ignorant we all are, but rather the extent to which the term is still misunderstood among regular consumers of journalism.  It may well be a growing movement within the media bubble, but outside of it there is little recognition.

We all regularly consume different forms of journalism: online, on TV, in newspapers, on the radio or on our mobile phones.  We are increasingly exposed to examples of data journalism, and yet we are often unaware of it.

So let’s start with the example I have seen most often: the ‘word cloud’ or ‘tag cloud’ of political speeches.

This tool takes all the words from a speech and sizes each one by how often it is used: the more frequent the word, the bigger it appears in the clump of words.  It takes a text and visualises it to tell a story.  Take a look at how The Guardian made a word cloud comparison of five inauguration speeches by American presidents.

So if we break down the stages of producing a word cloud in the simplest way possible, perhaps we can begin to understand what data journalism is.

First we have to access the raw data, which in this case is the script of the speech.  This is a public document, or an OPEN SOURCE.

Second we use a TOOL (software) that will translate the data.  Wordle is good in this instance.

Third and finally, we can extract stories from the VISUAL display of the data we have produced.  Here, the ‘word cloud’ tells a story about the focus of the speech.  In Barack Obama’s speech, the word ‘nation’ is the most visually prominent, which hints at his approach to the presidency.  The visualisation can then be shared openly so that everyone can access it.
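For the curious, the three stages above can be sketched in a few lines of Python.  This is only a rough illustration of the idea, not how Wordle or The Guardian actually work: the function names, the short list of ignored words, the font-size range and the sample sentence are all invented for this example.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked list of common words to ignore.
# Real word cloud tools use much longer lists.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "our", "we", "is", "that", "for"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Stage 2: turn raw text into a count of how often each word appears."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def font_sizes(counts, min_size=10, max_size=60):
    """Stage 3: scale each word's size so the most frequent word is the biggest."""
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


# Stage 1: the raw data. This sentence is made up, not taken from a real speech.
sample = "Our nation is a nation of hope, and the hope of the nation is its people."

counts = word_frequencies(sample)
sizes = font_sizes(counts)

# Print the words from most to least frequent, with their display sizes.
for word, count in counts.most_common():
    print(f"{word}: used {count} times, drawn at size {sizes[word]:.0f}")
```

In this made-up sentence ‘nation’ appears most often, so it gets the largest size, which is exactly the effect you see in a word cloud.  A real tool would then go on to arrange and draw the words, which is the part left out here.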

Immediately, we can see that data journalism is not necessarily just cold lists of numbers and statistics, as the name might suggest.  It is the creative use of all types of data to tell us something that isn’t easily detectable from the original body of data.

Wikipedia defines data journalism as ‘a journalistic process based on analyzing and filtering large data sets for the purpose of creating a new story.’  It ‘deals with open data that is freely available online and analyzed with open source tools.’

This is a good start, and in the word cloud example we can see the various aspects of this definition in action.

But there are many more methodological and theoretical issues to discuss.  What about data consistency?  What about the filtering process?  What about tool (software) programming?  What about the creativity behind designing new formats of visualisations?  What about the intricacies involved in handling large amounts of raw data?

What does this mean for journalism as a profession?  Do you have to be a full-time data journalist or is it just another weapon in an ever-increasing arsenal of cross-platform journalistic skills?

Apologies for asking more questions than I am answering at this point, but I think it is a valuable way of entering the topic.  Starting from this basic platform, we can build up to a thorough understanding and working knowledge of data journalism.

By Michael Greenfield (@mgreenfield13)
