Nowhere is the opportunity for citizen journalism more apparent than in data. The mainstream media outlets are effectively begging their readers to help them. With hundreds of thousands of documents to get through, more hands mean less work. Open data encourages a collective effort, a sort of shared workload for a common good. As Paul Bradshaw comments, data is ‘a meeting point for journalists, developers and citizens.’
Have a look at this example:
‘We have 458,832 pages of documents. 27,563 of you have reviewed 222,996 of them. Only 235,836 to go…’
This is taken from the specialised ‘Investigate Your MP’s Expenses’ section of The Guardian website, accurate at 21.16 on 01/02/11.
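For a rough sense of scale, here is a quick back-of-the-envelope check of those figures in Python. The numbers are the ones quoted above; the variable names are my own and have nothing to do with how The Guardian's counter actually works.

```python
# Figures quoted on the Guardian page at 21.16 on 01/02/11.
total_pages = 458_832
reviewed_pages = 222_996
reviewers = 27_563

# Pages still waiting for a first look.
remaining_pages = total_pages - reviewed_pages

# Share of the archive reviewed so far, as a fraction of the total.
share_reviewed = reviewed_pages / total_pages

# Average number of pages each contributor has reviewed.
pages_per_reviewer = reviewed_pages / reviewers

print(f"Remaining: {remaining_pages:,}")
print(f"Reviewed so far: {share_reviewed:.1%}")
print(f"Average pages per reviewer: {pages_per_reviewer:.1f}")
```

On these figures the remaining count matches the 235,836 quoted, a little under half of the pages have been reviewed, and the average contributor has looked at roughly eight pages.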
The Guardian has enlisted its readers to help, increasing interaction and reducing the need for extra staff: a win-win situation. Having uploaded all the MPs’ expenses documents, the paper asks readers to work their way through a section and judge each document they read against certain criteria. If the reader thinks they have found a juicy revelation, they simply click the ‘Investigate this!’ button and the professionals will pick it up and make a journalistic judgement. Simple really. Or is it? I gave it a try to find out…
This is the main page. I clicked ‘Start reviewing’ and away I went.
My first document was as above. Here is the decision panel on the right, a little clearer:
To do my job correctly, I had to decide first what the document was, then whether it was interesting. Both were trickier than expected. As you can see, I went with ‘Proof’ and ‘Not interesting.’ The decision was laboured: with so much redacted, I couldn’t be completely sure what the document was. I hit the all-important button:
The text was far too small and pixelated on this one, and most of it redacted. My job was rendered pretty useless by such limiting factors.
Not exactly much information to go off in this one.
I had to Google ‘Viking Direct’ – turns out the company sells stationery. I then figured out that the item listed is ‘self seal windowed envelopes,’ 20 boxes of them. Fair enough for an MP I guess. I was genuinely concerned I might get it wrong; there is a certain journalistic pressure to classify accurately.
I skipped a couple of really boring documents that I didn’t want to waste my time trying to figure out, then…
This was my first expense claim rather than a proof document, prompting slight excitement. The claim is for £805.61, but the problem is deciding what for. Again the redaction of sensitive information makes my job as a junior data journalist pretty difficult.
The whole experiment was testing my patience so I came to a stop.
My conclusion: I made a very, very small dent in the workload and was left feeling frustrated and disappointed. The Guardian cannot rely on the slow, bit-part participation of the reader. The devil is in the detail, and I feel that not every casual contributor will analyse each document closely enough. The poor clarity of the scans is a major problem and, as I have repeated, the level of redaction is too high.
This crowdsourcing experiment is an interesting one, but it has many drawbacks. The ‘what you’ve found so far’ page is revealing – the height of the discoveries is a scribbled note by Gordon Brown for £3,817.38.
Have a go and make your own mind up. I’m not convinced.
Michael Greenfield (@mgreenfield13)