Computational & Data Journalism @ Cardiff

Reporting. Building. Designing. Informing


Election data — the UK’s December vote

13th December 2019 by Aidan O'Donnell

Elections are a special meeting of journalism and data. They generate lots of both! So the morning after the long night of vote counting, we got this year's students working on the results for a full day. Four student groups were each given one of the four UK nations. Each group also got a Welsh constituency to analyse; after consulting with our political correspondents on the MA-News programme, we decided the interesting Welsh battles would be in Cardiff North, Ceredigion, Caerphilly and Vale of Glamorgan.

The main difficulty with analysing the results was not having the XML feed from PA that UK news organisations had been relying on (and had been testing for weeks). We didn't have the raw data flowing in as soon as a count was announced. But that's where the BBC came in: they published results for each constituency at a standardised URL, supplying 650 webpages for the UK's 650 constituencies.

This meant it was enough to write a few lines of code to grab each page and pull out the corresponding batch of results. If only all large-scale scraping were this clean and consistent!
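A minimal sketch of that kind of scrape, using only the standard library. The URL pattern, constituency slugs and the class names in the HTML are placeholders for illustration, not the BBC's actual markup; the point is that one small parser plus one URL template covers all 650 pages.

```python
import urllib.request
from html.parser import HTMLParser


class ResultsParser(HTMLParser):
    """Collect (name, party, votes) from spans with known class names.

    The class names here are hypothetical; a real scraper would use
    whatever the results pages actually mark candidates up with.
    """

    FIELDS = {"candidate-name", "candidate-party", "candidate-votes"}

    def __init__(self):
        super().__init__()
        self._field = None   # which field the next text node belongs to
        self._row = {}       # partially built candidate record
        self.rows = []       # completed (name, party, votes) tuples

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in self.FIELDS:
            self._field = cls

    def handle_data(self, data):
        if self._field is None:
            return
        self._row[self._field] = data.strip()
        self._field = None
        if len(self._row) == len(self.FIELDS):
            # Votes arrive as "21,568" style strings; store as int.
            self.rows.append((
                self._row["candidate-name"],
                self._row["candidate-party"],
                int(self._row["candidate-votes"].replace(",", "")),
            ))
            self._row = {}


def parse_results(html):
    """Return a list of (name, party, votes) tuples from one results page."""
    parser = ResultsParser()
    parser.feed(html)
    return parser.rows


def fetch_constituency(slug):
    """Fetch and parse one constituency page (placeholder URL pattern)."""
    url = f"https://example.org/election2019/constituencies/{slug}"
    with urllib.request.urlopen(url) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

Looping `fetch_constituency` over a list of 650 slugs and writing the tuples out with `csv.writer` is all the rest of the job amounts to, which is why a consistent URL scheme makes this kind of scrape so quick.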

We were able to publish CSV files with full results for the four nations by the end of the day. You can now get the data from plenty of sources, of course, but right after the election, with results still being declared throughout Friday, these tables meant we could get started on the analysis straight away.

The people at Flourish provided very helpful templates ahead of the vote. So hex maps, animated bar charts and Sankey diagrams were all ready and waiting for numbers.

Filed Under: Blog Tagged With: hackday, politics, scraping, students, voterpower

Scraping the Assembly

2nd November 2016 by Martin Chorley

Glyn is currently teaching the first-semester module on Data Journalism. As part of this, students need to complete a data investigation project. One of the students is looking at the expenses of Welsh Assembly Members. These are all freely available online, but not in an easily manipulated form. According to the Assembly, they'd be happy to give the data out as a spreadsheet if we submitted an FOI request.

To me, this seems quite stupid. The information is all online and freely accessible, and they've admitted they're willing to give it out to anyone who submits an FOI request. So why not just make the raw data available to download? This does not sound like a helpful open government to me. Anyway, for whatever reason they've chosen not to, and we can't be bothered to wait around for an FOI response. It's much quicker and easier to build a scraper! We'll just use Selenium to drive a web browser, submit a search, page through all the results collecting the details, then dump it all out to CSV. Simple.


I built this as a quick hack this morning. It took about an hour or so, and it shows. The code is not robust in any way, but it works. You can ask it for data from any year (or a number of years) and it’ll happily sit there churning its way through the results and spitting them out as both .csv and .json.
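The shape of that scraper can be sketched as follows. This is not the code from the repo: the site URL, form element IDs and column names are all placeholders, and the Selenium part is kept inside one function so the CSV/JSON dumping can be used (and tested) without a browser installed.

```python
import csv
import json

# Hypothetical column names; the real scraper's fields may differ.
FIELDS = ["member", "date", "category", "amount"]


def dump(records, stem):
    """Write scraped expense records out as both <stem>.csv and <stem>.json."""
    with open(f"{stem}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
    with open(f"{stem}.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)


def scrape_year(year):
    """Drive a browser through the search results for one year.

    Element IDs and the URL are stand-ins for whatever the Assembly's
    search form actually uses.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get("https://example.org/assembly-expenses/")  # placeholder URL
    driver.find_element(By.ID, "year").send_keys(str(year))
    driver.find_element(By.ID, "search").click()

    records = []
    while True:
        # Collect every result row on the current page of results.
        for row in driver.find_elements(By.CSS_SELECTOR, "table.results tr"):
            cells = [c.text for c in row.find_elements(By.TAG_NAME, "td")]
            if len(cells) == len(FIELDS):
                records.append(dict(zip(FIELDS, cells)))
        # Keep paging until there is no "Next" link left.
        nxt = driver.find_elements(By.LINK_TEXT, "Next")
        if not nxt:
            break
        nxt[0].click()
    driver.quit()
    return records
```

Something like `dump(scrape_year(2016), "expenses-2016")` would then produce both output files. For the real, working version, see the repo linked below.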

All the code is available on Github and it’s under an MIT Licence. Have fun 😉

Filed Under: Blog, Teaching Tagged With: coding, data, foi, investigation, oss, python, scraping

Copyright © 2021 · News Pro Theme on Genesis Framework · WordPress