So after reading and writing about data journalism, it was time for me to actually immerse myself and become a data journalist (for a day at least).
To understand the role properly, I decided to go through the whole process: finding a topic; selecting, filtering and building the data; visualising it; and seeing what stories emerge.
Stage-by-stage, this is how it went and what I learnt from it all…
STAGE 1: Finding the topic and selecting the data
The Royal Television Society (RTS) Television Journalism Awards took place in February, and as the winners were announced, the usual accompanying comments followed, such as ‘this is the third time in a row that Jon Snow has won Presenter of the Year’. But I realised this was pretty useless information out of context – I wanted a much more comprehensive, long-term picture of regular winners. For example, over the lifespan of the awards, have the BBC, ITN or Sky taken the majority of certain awards? I had found my topic.
Unfortunately there was no accompanying spreadsheet or data set, just a year-by-year list of winners and nominees on the website. It was not ideal, but this would be my source of data. The next big hurdle was how to handle it.
STAGE 2: Selecting, filtering and building the data
This was easily the most time-consuming and logistically challenging element. I was faced with 14 sets of winners, from 1996/1997 to 2009/2010, across a wide range of categories, not collated into one document but spread over 14 different pages. The main problems with the raw data listed on the RTS website were:
- the layout was not uniform
- categories changed over the years
- details were often unclear
- the layout was not easy to search
The technicalities of handling data were beginning to come to light.
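One common way to cope with categories that change names over the years is a lookup table that maps every variant label onto a single canonical name, after tidying stray whitespace and capitalisation. The Python sketch below illustrates the idea; the variant spellings in `CANONICAL` and the helper `normalise_category` are hypothetical examples, not labels taken from the RTS website or the approach used in this post.

```python
# Hypothetical variant labels mapped to one canonical category name.
# These spellings are illustrative assumptions, not copied from the RTS site.
CANONICAL = {
    "young journalist": "Young Journalist of the Year",
    "young journalist of the year": "Young Journalist of the Year",
    "news channel": "News Channel of the Year",
    "news channel of the year": "News Channel of the Year",
}


def normalise_category(raw: str) -> str:
    """Collapse whitespace and case, then map to a canonical name if one is known."""
    tidied = " ".join(raw.split())
    # Unknown labels come back tidied but otherwise unchanged, so they can be checked by hand.
    return CANONICAL.get(tidied.lower(), tidied)


print(normalise_category("  Young   Journalist "))
print(normalise_category("Sports  News"))
```

Returning unknown labels unchanged, rather than guessing, keeps any category the table does not yet cover visible for manual review.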
I chose a Google Docs spreadsheet as the tool to compile the data. It is a flexible, widely compatible format and great for sharing the data once it’s published.
Rather tediously, I copied and pasted selected information from the RTS website into my data set, creating a separate spreadsheet for each category or topic to keep the data clean and separate.
Ideally I would have covered every category in detail, but this was an experiment, so I selected a few areas to build the data around. Purely because of time constraints, I focused on 6 categories: Young Journalist of the Year; Current Affairs – Home; News Coverage – Home; Presenter of the Year; News Channel of the Year; Television Journalist of the Year.
How to arrange the data in the spreadsheet was also a fairly complicated and very important issue, raising two requirements:
- it has to be ‘visualisation friendly’, i.e. integrate easily into the visualisation tool through its column headings etc.
- it has to be easy to understand for other users once I make the data set public
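A layout that tends to satisfy both requirements is a "long" table: one header row with plain column names, then one row per award per year. The Python sketch below writes such a table as CSV text, which spreadsheets and most visualisation tools can import. The column names and every row value are invented placeholders for illustration, not real RTS results or the exact layout used in this post.

```python
import csv
import io

# One row per award per year, with a single header row.
# All values below are invented placeholders, not real RTS results.
FIELDS = ["Year", "Category", "Winner", "Broadcaster"]
rows = [
    {"Year": "1996/1997", "Category": "Presenter of the Year",
     "Winner": "Person X", "Broadcaster": "Broadcaster A"},
    {"Year": "1996/1997", "Category": "News Channel of the Year",
     "Winner": "Channel Y", "Broadcaster": "Broadcaster B"},
]


def to_csv(records, fields):
    """Serialise records to CSV text with a header row, ready to paste or upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv(rows, FIELDS))
```

Keeping one fact per row means new years or categories are just extra rows, and the headers stay self-explanatory for anyone reusing the data set.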
For the data set, please click here.
STAGE 3: Visualise the data
This is where I really struggled. I was learning/guessing as I went along, sometimes a little confused about exactly what was required to make the data work for a visualisation tool.
Here is what my data set uploaded to ManyEyes looks like. I had to delete it several times because the visualisation wouldn’t work. A process of trial and error followed, in which I rearranged the Google spreadsheet and uploaded it again and again until it worked. The format and layout of the data are CRUCIAL, as I quickly found out.
After all the huffing and puffing, the final visualisation looked like this…
STAGE 4: What stories come out of the data?
Looking at the visualisation, a few stories immediately jump out:
- Sky News didn’t win a single award in the 6 categories until 2000/2001.
- 2005/2006 and 2009/2010 were the BBC’s worst years, winning only one award in total
- 2004/2005 was ITN’s biggest awards haul in the 6 categories, coming out on top ahead of the BBC and Sky News
- ‘Other’ (i.e. independents and anyone other than the BBC, ITN or Sky News) dominated in 1996/1997
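Stories like these come from simple tallies: wins per broadcaster per year, the first year a broadcaster appears, and the years with the fewest wins. The Python sketch below shows those counts over a list of (year, broadcaster) pairs. The pairs are invented placeholders, not the real results, and the helper names `first_win` and `wins_by_year` are hypothetical; it also assumes every year is written in the same "YYYY/YYYY" form so that sorting the strings puts them in date order.

```python
from collections import Counter

# (year, broadcaster) pairs, one per award won.
# Placeholder values only, not real RTS data.
wins = [
    ("1996/1997", "Broadcaster A"),
    ("1996/1997", "Other"),
    ("1997/1998", "Broadcaster A"),
    ("1997/1998", "Broadcaster B"),
    ("1998/1999", "Broadcaster B"),
]


def first_win(pairs, broadcaster):
    """Earliest year in which the broadcaster won anything, or None if never."""
    # Assumes "YYYY/YYYY" strings, so lexical order matches chronological order.
    years = sorted(year for year, who in pairs if who == broadcaster)
    return years[0] if years else None


def wins_by_year(pairs, broadcaster, years):
    """Award count for one broadcaster in every listed year, zeros included."""
    counts = Counter(pairs)
    return {year: counts[(year, broadcaster)] for year in years}


all_years = sorted({year for year, _ in wins})
print(first_win(wins, "Broadcaster B"))
print(wins_by_year(wins, "Broadcaster A", all_years))
```

Including zero counts explicitly matters here: a year with no wins is exactly the kind of gap that a "didn’t win a single award until…" story depends on.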
The biggest lesson I learned is… handling data is a skill, and one that can reap rewards in the newsroom. With more work I could build a data set with much more information about the RTS Awards. Unfortunately my final visualisation was rather limited because I couldn’t get ManyEyes to do everything I wanted – practice makes perfect!
If you think I could have done things differently or better, please let me know. Or have I missed any stories in the visualisation? Leave a comment on this post. Many thanks.
By Michael Greenfield (@mgreenfield13)