How to scrape data

Scraping is a really useful technique: it lets you grab information from various websites and put it into a form you can easily analyse.

Paul Bradshaw, data journalist, said in an article for Journalism.co.uk:

 “I love scraping: it is both a great time-saver, and a great source of stories no one else has.”

So Data Queen decided to experience for herself how easy it is to scrape. I scraped a few lines from the parliamentary website and found that it is not only easy but also incredibly useful.

It was a joy to use and I would certainly use it again, so I have to share my newfound knowledge with you. Here goes.

I decided to look at MPs' financial interests and the payments made to them. These payments come from third parties for the time an MP spends giving a talk or attending an event where their experience and expertise are needed.

I started with Diane Abbott MP.

First I copied and pasted the URL into OutWit Hub.

I used OutWit Hub because it is very easy to use for scraping data.

Then I looked for the information I wanted: her name and the amount of money she was paid for each event.

I started a new scraper and added a "marker before" and a "marker after". The markers you use depend on the information you want to scrape. Once you have pasted in the link, you can scroll through the page's source code and look for the markers you need.

If you look at the code just before the person's name, you will be able to see the marker you need. In my case the marker before was </div> and the marker after was </h2> (please see the image below).

Image

Once I had entered the markers before and after, I was able to save the scraper and execute it. After that, you can export the data in Excel format.
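OutWit Hub does all of this through its interface, and how it works internally isn't covered here. As a rough illustration of the same "marker before / marker after" idea, here is a minimal Python sketch using plain string search. The page fragment, the helper names `extract_between` and `to_csv`, and the CSV step are all hypothetical: the real parliamentary markup will differ, and CSV is used simply because Excel can open it.

```python
import csv
import io


def extract_between(html: str, before: str, after: str) -> list[str]:
    """Return each stretch of text found between `before` and the next `after`."""
    results = []
    start = 0
    while True:
        i = html.find(before, start)
        if i == -1:
            break  # no more "marker before" occurrences
        i += len(before)
        j = html.find(after, i)
        if j == -1:
            break  # a "marker before" with no matching "marker after"
        results.append(html[i:j].strip())
        start = j + len(after)
    return results


def to_csv(rows: list[list[str]], header: list[str]) -> str:
    """Serialise rows as CSV text, a format Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical page fragment; the real parliamentary markup will differ.
page = (
    '<h2><div class="member"></div>Diane Abbott</h2>\n'
    '<h2><div class="member"></div>Example Member</h2>'
)

names = extract_between(page, "</div>", "</h2>")
print(to_csv([[name] for name in names], ["name"]))
```

As in OutWit Hub, the hard part is picking markers that appear immediately around the data you want and nowhere else; a marker as common as </div> will often need some trial and error.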

Below you can see the events Diane Abbott attended and the payments she received for them. All good. Diane Abbott, we love you.

Image

Reflection

I enjoyed this, although at times it was quite a difficult process. Choosing the correct markers before and after sometimes involved trial and error.

Even so, it provided some useful information and I would highly recommend it.

I would recommend trying this on a few different websites. Don't choose a complicated one like the one I chose, as that makes it harder to find the markers before and after.

Thanks for reading!

What can data journalism do for you?

The Queen of all data pictures… Picture courtesy of flickr Creative Commons

Part of my four-week series on data journalism, passing the baton of knowledge on, is looking at how to get the best tools. I thought it would be useful to look at what data journalism can do for a young journalist, or for anyone in the profession. In hindsight, this would have been a better question to start with, before looking at the types of data you can obtain.

So, better late than never: here goes.

I stumbled across a brilliant website called Multimedia Journalism. It looks at the usefulness of data journalism and at how, as a journalist, you can paint a brilliant picture for your readers when you let the data enhance your piece without dominating or controlling it.

Here are some of the ways data is enhancing journalism, according to Mirko Lorenz, as quoted at a Data-driven Journalism conference.

1. Data is used increasingly to visualise very complex issues. Infographics and videos have been enhancing data journalism and providing very compelling visuals for readers.

2. The publication of the Afghanistan war logs by the New York Times, Der Spiegel and The Guardian has raised awareness that better use of data might lead to very big stories.

3. Similarly, the way The Guardian handled the expenses scandal of British MPs in 2009 has sparked interest in the various elements [that] might be involved in this.

4. Think crowdsourcing. Think opening up large stores of public data and turning it into open data that everyone can share.

5. Think uncovering scandals and being able to prove them with numbers.

6. Think providing people with dependable services, helping them to decide when buying, insuring, participating or making life choices.

7. To do that, journalists will have to learn new tricks. They have to get used to working with tools that will help them make data flow.

The site also discusses some of the stories that have used data journalism and how it enhanced the storytelling. The Guardian has run some compelling stories that use data:

1. Afghanistan War Logs

Special report page, providing information from many angles, based on the leaked documents published by WikiLeaks: http://www.guardian.co.uk/world/the-war-logs

2. Investigate your MP’s expenses

An innovative crowdsourcing application allowing users to check 458,832 documents and flag whether each one should be investigated further: http://mps-expenses.guardian.co.uk/

3. MP Expenses: Who claimed what? The full list

Including an open spreadsheet for every MP: http://www.guardian.co.uk/news/datablog/2010/feb/04/mps-expenses-claims-full-list

Since using data in my journalism, I have found that it is all about the data enhancing the piece, not dominating it. Many print pieces list reams of statistics or numbers but put no substance behind what they mean or how they help the story being told.

I have come to one central conclusion: data is powerful when it is used to enhance the story, not just presented as a list of numbers.

If you want to find out more about how data can help you, why not check out the website for yourself:

http://www.multimedia-journalism.co.uk/node/1272

Thanks for reading!

How to get the best data for journalism: passing the baton of knowledge on

Claire Miller’s post on FOI

Having started data journalism in September 2013, I had never really realised how powerful data could be, or its potential to convey a strong message to an audience.

It got me thinking about all the wonderful ways in which we can obtain data. Over the next four weeks I am going to post a series of links showing young and aspiring data hounds how to sniff out the best bits and get the most "bang for your buck", i.e. powerful front pages.

I thought I would start with a blog post by Claire Miller, of Trinity Mirror Group's regional newspapers, on Freedom of Information (FOI).

Claire explains the FOI requests she currently has on the go, and how to request the best information.

This is a great source of information for all inquisitive, data-driven journalists. It also relates to the first FOI request I made at the start of my data journey. It has since helped me massively with all my new FOI requests, and I hope you find it useful.

http://clairemiller.net/blog/2014/01/with-foi-think-mvp/

Thanks for reading!

Boris vs Bob: infographic on Twitter reactions to the Tube strike

Who won the Tube strike Twitter prize? Picture courtesy of flickr creative commons

Infographic on Twitter reactions to the Tube strike

Great bit of #datajournalism on the Twitter reactions to the Tube strike:

6,477 people tweeting about Boris vs. 4,360 tweeting about Bob
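To put the two counts in proportion, here is a small Python sketch that turns them into shares of the combined total and a ratio. It assumes the two groups don't overlap (anyone tweeting about both would be counted twice), which the infographic may not guarantee; `share_of_total` is a hypothetical helper name.

```python
def share_of_total(a: int, b: int) -> tuple[float, float]:
    """Each count as a percentage of a + b (assumes the two groups don't overlap)."""
    total = a + b
    return 100 * a / total, 100 * b / total


boris, bob = 6477, 4360  # counts quoted in the infographic
boris_pct, bob_pct = share_of_total(boris, bob)
print(f"Boris: {boris_pct:.1f}%  Bob: {bob_pct:.1f}%  ratio: {boris / bob:.2f}")
# prints: Boris: 59.8%  Bob: 40.2%  ratio: 1.49
```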

This piece is presented in a clear, factual way. The numbers and visuals give you a great picture of the strike and help the reader understand some of the issues.

Prevalence of female genital mutilation in Africa and Yemen (women aged 15 – 49)

Heat Map to show prevalence of FGM in Africa

This map shows the prevalence of FGM in Africa and Yemen. The map was very easy to make: once I had the data from the World Health Organisation, I was able to create a map showing the prevalence of FGM.

This is a very useful way to show the prevalence of a condition.

Syria: The most dangerous country in the world to be a journalist (interactive graphics)

A very interesting piece of data journalism by Rachel Banning-Lover on the worst country in which to be a journalist.

This piece of data journalism really highlights the number of journalists dying in Syria compared with other countries. It makes great use of statistics and highlights an important issue: the danger of conflict reporting in Syria at the moment.

Incorrect prescriptions in Wales over the last 5 years

This data is based on a Freedom of Information (FOI) request I made in October 2013. When making an FOI request, one of the key things I learnt was that you need to set out your parameters: clearly state to the FOI administrator what you want, over which period, and the format it should be sent in, e.g. an Excel or Word document.

I requested the data from NHS England, and they immediately came back with questions about the format I wanted.

Lesson 1:

Always give the FOI department all the information it needs.

Lesson 2:

Make sure you follow up on FOI requests. Authorities have a set time frame in which to respond, so note when you sent each request and when you expect a response.

Lesson 3:

Check websites such as WhatDoTheyKnow. You can browse existing requests and make your own FOI requests there, which is brilliant.

Lesson 4:

Do a few of them: you never know what you may uncover.
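For Lesson 2, it helps to work out the expected response date as soon as you send a request. Under the UK Freedom of Information Act 2000, public authorities must normally respond within 20 working days of receiving a request. The Python sketch below counts 20 weekdays forward from a receipt date; it is a simplification that ignores bank holidays and any extensions an authority may be allowed, and `foi_deadline` and the example date are hypothetical.

```python
from datetime import date, timedelta


def foi_deadline(received: date, working_days: int = 20) -> date:
    """Date on which the last of `working_days` weekdays after `received` falls.

    Simplification: weekends are skipped, but bank holidays and any
    extensions an authority may claim are ignored.
    """
    day = received
    counted = 0
    while counted < working_days:
        day += timedelta(days=1)
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            counted += 1
    return day


# Hypothetical example: a request received on Monday 7 October 2013.
print(foi_deadline(date(2013, 10, 7)))  # prints: 2013-11-04
```

Counting starts on the first working day after receipt, so a request that arrives on a Friday gets the following Monday as day one.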

% increase in incorrect prescriptions in Wales over the last 5 years

Incorrect prescriptions in Wales over the five years from 2008 to 2013
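The first caption refers to a percentage increase, which compares the last year's figure with the first: (new - old) / old × 100. The chart's figures aren't reproduced in the text, so the counts in this Python sketch are placeholders, not the Welsh prescription data, and `percent_change` is a hypothetical helper.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percentage change is undefined when the starting value is 0")
    # Multiply before dividing so whole-number examples like this one come out exact.
    return (new - old) * 100 / old


# Placeholder yearly counts, NOT the real FOI figures.
yearly_counts = {2008: 200, 2013: 250}
first_year, last_year = min(yearly_counts), max(yearly_counts)
change = percent_change(yearly_counts[first_year], yearly_counts[last_year])
print(f"{first_year}-{last_year}: {change:+.1f}%")  # prints: 2008-2013: +25.0%
```

A negative result would mean the number of incorrect prescriptions fell over the period.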