Building a data set and visualising it – a day in the life of a data journalist

So, after reading and writing about data journalism, it was time for me to immerse myself in it and actually become a data journalist (for a day, at least).

To understand the role properly, I decided to go through the whole process: find a topic; select, filter and build the data; visualise it; and see what stories emerge.

Stage-by-stage, this is how it went and what I learnt from it all…

STAGE 1: Finding the topic and selecting the data

The Royal Television Society (RTS) Television Journalism Awards took place in February, and as the winners were announced, the usual accompanying comments followed, such as ‘this is the third time in a row that Jon Snow has won Presenter of the Year’.  But I realised this was pretty useless information out of context.  I wanted a much more comprehensive, long-term picture of regular winners.  For example, over the lifespan of the awards, has the BBC, ITN or Sky taken the majority of certain awards?  I had found my topic.

Unfortunately there was no accompanying spreadsheet or data set, just a year-by-year list of winners and nominees on the website.  It was not ideal, but this would be my source of data.  The next big hurdle was how to handle the data.


STAGE 2: Selecting, filtering and building the data

This was easily the most time-consuming and logistically challenging stage.  I was faced with 14 sets of winners, from 1996/1997 to 2009/2010, in a wide range of categories, not even collated into one document but spread over 14 different pages.  The main problems with the raw data listed on the RTS website were:

  • the layout was not uniform
  • categories changed over the years
  • details were often unclear
  • the pages were not easy to search
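If you wanted to script part of this clean-up rather than fix every inconsistency by hand, one option is to turn each category label into a "matching key" before comparing it. Below is a minimal Python sketch, assuming the labels have already been copied into plain strings. The function name `category_key` and the `aliases` dictionary (for categories that were renamed over the years) are hypothetical, and any alias entries would have to be filled in by hand from the RTS pages.

```python
import re
from typing import Dict, Optional


def category_key(raw: str, aliases: Optional[Dict[str, str]] = None) -> str:
    """Return a normalised key so that differently typed labels compare equal.

    `aliases` is a hypothetical, hand-maintained mapping from an old
    category key to the key it should be filed under.
    """
    # Strip outer whitespace and treat en dashes, em dashes and hyphens
    # as the same " - " separator.
    text = re.sub(r"\s*[\u2013\u2014-]\s*", " - ", raw.strip())
    # Collapse runs of whitespace and ignore capitalisation.
    key = re.sub(r"\s+", " ", text).casefold()
    # Optionally map a renamed category onto its current key.
    if aliases:
        key = aliases.get(key, key)
    return key
```

For example, `category_key("Current Affairs – Home")` and `category_key("current affairs-Home ")` both give `"current affairs - home"`. Hyphenated words also get split around the dash, which is harmless as long as the same function is applied to every label.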

The technicalities of handling data were beginning to come to light.

I chose a Google Docs spreadsheet as the tool to compile the data with.  It is a flexible format, widely compatible and great for sharing the data once it’s published.

Rather tediously, I copied and pasted selected information from the RTS website into my data set.  I created a new spreadsheet for each category or topic in order to keep the data clean and separate.

Ideally I would have covered every category in detail, but this was an experiment, so I selected a few areas to build the data around.  I focused on 6 categories purely because of time constraints: Young Journalist of the Year; Current Affairs – Home; News Coverage – Home; Presenter of the Year; News Channel of the Year; Television Journalist of the Year.
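Narrowing a larger list of winners down to the six chosen categories is a simple filter. Here is a minimal Python sketch, assuming each award has been stored as a dictionary with a `"category"` field; the field name and the function `keep_selected` are hypothetical, not something the RTS website provides.

```python
# The six categories chosen for this experiment, as named in the text.
SELECTED = {
    "Young Journalist of the Year",
    "Current Affairs – Home",
    "News Coverage – Home",
    "Presenter of the Year",
    "News Channel of the Year",
    "Television Journalist of the Year",
}

# Compare case-insensitively so small capitalisation differences still match.
_SELECTED_KEYS = {name.casefold() for name in SELECTED}


def keep_selected(rows):
    """Return only the rows whose (hypothetical) "category" field is selected."""
    return [
        row for row in rows
        if row["category"].strip().casefold() in _SELECTED_KEYS
    ]
```

A set is used for the lookup so each row is checked in constant time, however many categories end up being selected.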

How to arrange the data in the spreadsheet was also a fairly complicated and very important issue, with two requirements:

  • it has to be ‘visualisation friendly’ i.e. integrate easily into the tool through column headings etc.
  • it has to be easy to understand for other users of the data set once I make it public
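One layout that tends to meet both requirements is a single header row followed by one row per year, with one column per broadcaster holding the number of awards won. The Python sketch below pivots one-row-per-award records into that shape and writes it out as CSV. It assumes each record has hypothetical `"year"` and `"broadcaster"` fields, and it files anyone other than the BBC, ITN or Sky News under "Other", matching the grouping used later in this post. Whether this is exactly what ManyEyes expected is an assumption; it is simply a common layout for charting tools.

```python
import csv
import io
from collections import Counter

BROADCASTERS = ["BBC", "ITN", "Sky News", "Other"]


def group(broadcaster):
    """Map independents and anyone else onto the "Other" column."""
    return broadcaster if broadcaster in ("BBC", "ITN", "Sky News") else "Other"


def wins_table(rows):
    """Pivot one-row-per-award records into one row per year."""
    counts = Counter((row["year"], group(row["broadcaster"])) for row in rows)
    # Years written as "1996/1997" sort correctly as plain strings.
    years = sorted({row["year"] for row in rows})
    table = [["Year"] + BROADCASTERS]
    for year in years:
        table.append([year] + [counts[(year, b)] for b in BROADCASTERS])
    return table


def to_csv(table):
    """Serialise the table as CSV text ready to paste or upload."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()
```

With made-up records (not the real RTS results) for a Channel 4 and a BBC win in 1996/1997 and an ITN win in 1997/1998, `to_csv(wins_table(...))` produces a header of `Year,BBC,ITN,Sky News,Other` followed by one line of counts per year.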

For the data set, please click here.

STAGE 3: Visualise the data

With a small data set now compiled, I was ready to visualise it.  For this I used ManyEyes, a free tool often recommended for beginners by Paul Bradshaw.

This is where I really struggled.  I was learning/guessing as I went along, sometimes a little confused about exactly what was required to make the data work for a visualisation tool.

Here is what my data set looks like uploaded to ManyEyes.  I had to delete it several times because the visualisation wouldn’t work.  A process of trial and error followed, in which I rearranged the Google spreadsheet and uploaded it again and again until it worked.  The format and layout of the data are CRUCIAL, as I quickly found out.
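Some of that trial and error can be caught before uploading by checking the table's shape. The Python sketch below is hypothetical and does not encode ManyEyes' actual rules, which aren't documented here. It checks a few general things that commonly break charting tools: blank or duplicate column headings, rows with the wrong number of cells, and counts that aren't whole numbers. It assumes the first column holds the year and the rest hold counts.

```python
def check_table(table):
    """Return a list of problems found in a header-plus-rows table."""
    if not table:
        return ["table is empty"]
    problems = []
    header = table[0]
    if len(set(header)) != len(header):
        problems.append("duplicate column headings")
    if any(not str(h).strip() for h in header):
        problems.append("blank column heading")
    # Number rows as a spreadsheet would: the header is row 1.
    for number, row in enumerate(table[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {number} has {len(row)} cells, expected {len(header)}")
            continue
        # Every column after the first should be a whole-number count.
        for cell in row[1:]:
            if not str(cell).strip().isdigit():
                problems.append(f"row {number} has a non-numeric count: {cell!r}")
    return problems
```

An empty list means none of these particular problems were found; it does not guarantee the tool will accept the file.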

After all the huffing and puffing, the final visualisation looked like this…

STAGE 4: What stories come out of the data?

Looking at the visualisation, a few stories immediately jump out:

  1. Sky News didn’t win a single award in the 6 categories until 2000/2001.
  2. 2005/2006 and 2009/2010 were the BBC’s worst years, with only one award in each
  3. 2004/2005 was ITN’s biggest awards haul in the 6 categories, coming out on top ahead of the BBC and Sky News
  4. ‘Other’ (i.e. independents and anyone other than the BBC, ITN or Sky News) dominated in 1996/1997
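Stories like these come from reading the counts off the year-by-broadcaster table: the first year a column stops being zero, or the years where a column is at its lowest or highest. A minimal Python sketch, assuming a table with a header row such as `["Year", "BBC", "ITN", "Sky News", "Other"]` and one row of counts per year; the helper names are hypothetical. Ties are returned together, since two years can share the worst result.

```python
def first_win(table, broadcaster):
    """Return the first year in which `broadcaster` won anything, or None."""
    col = table[0].index(broadcaster)
    for row in table[1:]:
        if int(row[col]) > 0:
            return row[0]
    return None


def extreme_years(table, broadcaster):
    """Return (worst_years, best_years) for one broadcaster, keeping ties."""
    col = table[0].index(broadcaster)
    counts = {row[0]: int(row[col]) for row in table[1:]}
    low, high = min(counts.values()), max(counts.values())
    worst = [year for year, n in counts.items() if n == low]
    best = [year for year, n in counts.items() if n == high]
    return worst, best
```

Run against the real spreadsheet, checks like these would give the same answers as eyeballing the chart, but they are quicker to repeat each time the data set is extended.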

The biggest lesson I learned is that handling data is a skill, and one that can reap rewards in the newsroom.  With more work I could build a data set with much more information about the RTS Awards.  Unfortunately my final visualisation was rather limited because I couldn’t get ManyEyes to do exactly what I wanted.  Practice makes perfect!

If you think I could have done things differently or better, please let me know.  Or have I missed any stories in the visualisation?  Leave a comment on this post.  Many thanks.

By Michael Greenfield (@mgreenfield13)

This entry was posted in Data Journalism Experiment.

2 Responses to Building a data set and visualising it – a day in the life of a data journalist

  1. Really good experiment – the RTS pages are a nightmare for any data journalist – very little structure and changing categories. As you say, the main area to keep working on is ManyEyes – I suspect another visualisation choice may be better, or switching your X and Y axes.
    This is what I got with a stack graph – which tells a clearer story (try shift-clicking to see more than one):

  2. James Glynn says:

    This is a nice post, but it is a bit dense.  Maybe you could add some screengrabs to illustrate your points and break up the text.

    I also don’t think your visualisation was the best one.  A 3D graph over time would be better.

    But, overall I learnt a lot and it was very informative.
