Computational & Data Journalism @ Cardiff

Reporting. Building. Designing. Informing


Finding donors to Truss leadership campaign, via Datasette

8th October 2022 by Aidan O'Donnell

Liz Truss now has the job of leading the Conservative Party and of running the country. So who gave her the money for her campaign?

MPs in Britain have to declare any money they receive, via donations or second jobs for example, and this is listed in the Register of Members’ Financial Interests.

The website explains that the Register exists to provide information about “any financial interest which a Member has, or any benefit which he or she receives” and the reason that this matters is that, according to the website, “others might reasonably consider” this money could influence the MP.

The Register is updated regularly, but the data is published as plain text on a web page, with only line breaks to distinguish one field from another. This makes it very difficult to scrape the data or interrogate it for trends.

Datasette

This is where we turn to Datasette, a lovely tool for “exploring and publishing data”, built and maintained by Simon Willison, which lets you run SQL queries against the data. Happily, there is already an example in place for the Register of Members’ Interests.

Querying

There are four main tables in this instance of Datasette: categories, items, members, and people. A query of the members table (using “View and edit SQL”) returns two ids we can use to look up Liz Truss in the main items table: a member number (40560) in the “id” column and a person number (24941) in the “person_id” column.

SELECT * FROM members WHERE name = 'Elizabeth Truss'

The table with the crucial information, items, has close to two million entries. But, as Simon Willison explains, its members field seems to stop around 2015, so the person field is a better choice. Querying Truss in items via her “person_id”:

SELECT * FROM items WHERE person_id = 'uk.org.publicwhip/person/24941' ORDER BY date DESC

returns just over 950 entries, from 2010 to 2022.
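As a rough illustration of that two-step lookup (name to person id, then person id to items), here is a minimal sketch using Python's built-in sqlite3 module and a tiny in-memory database. The column names follow the ones mentioned above, but the rows, the ids and the helper names `person_id_for` and `items_for` are invented placeholders, not Register data or part of Datasette.

```python
import sqlite3

# Toy stand-in for the Datasette database. Column names follow the post;
# every row below is an invented placeholder, not real Register data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id TEXT, person_id TEXT, name TEXT);
    CREATE TABLE items (person_id TEXT, date TEXT, item TEXT);
    INSERT INTO members VALUES
        ('member/1', 'person/1', 'Example Member'),
        ('member/2', 'person/2', 'Another Member');
    INSERT INTO items VALUES
        ('person/1', '2022-09-01', 'Placeholder entry A'),
        ('person/1', '2021-03-15', 'Placeholder entry B'),
        ('person/1', '2022-08-20', 'Placeholder entry C'),
        ('person/2', '2022-07-04', 'Placeholder entry D');
""")


def person_id_for(name):
    """Step 1: look up the person_id for a member name (hypothetical helper)."""
    row = conn.execute(
        "SELECT person_id FROM members WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def items_for(person_id):
    """Step 2: fetch that person's items, newest first (hypothetical helper)."""
    return conn.execute(
        "SELECT date, item FROM items WHERE person_id = ? ORDER BY date DESC",
        (person_id,),
    ).fetchall()


pid = person_id_for("Example Member")
all_items = items_for(pid)
for date, item in all_items:
    print(date, item)
```

Passing the values as `?` parameters rather than pasting them into the SQL string keeps quoting problems out of the query.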

But if you just want the 2022 donations:

SELECT * FROM items WHERE person_id = 'uk.org.publicwhip/person/24941' AND date LIKE '2022%' ORDER BY date DESC

or more precisely again, just the donation descriptions that mention the word “campaign”:

SELECT * FROM items WHERE person_id = 'uk.org.publicwhip/person/24941' AND item LIKE '%campaign%'

This last query returns 48 donations, which you can then download as a csv or json from Datasette. Here is that data as a csv, after some further cleaning.

Download: truss_donations2-1
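The same csv or json export can also be requested by URL, because Datasette serves the results of a SQL query at `/<database>.csv?sql=...` and `/<database>.json?sql=...`. The sketch below only builds those URLs. The host `datasette.example.org` and the database name `regmem` are placeholders assumed for illustration, so swap in the real instance before fetching anything.

```python
from urllib.parse import urlencode

# Hypothetical values: replace with the real Datasette host and database name.
BASE_URL = "https://datasette.example.org"
DATABASE = "regmem"

CAMPAIGN_SQL = (
    "SELECT * FROM items "
    "WHERE person_id = 'uk.org.publicwhip/person/24941' "
    "AND item LIKE '%campaign%'"
)


def export_url(sql, fmt="csv"):
    """Build a Datasette export URL for a SQL query in csv or json form."""
    if fmt not in ("csv", "json"):
        raise ValueError("fmt must be 'csv' or 'json'")
    params = {"sql": sql}
    if fmt == "json":
        # _shape=array asks Datasette for a plain list of row objects.
        params["_shape"] = "array"
    return f"{BASE_URL}/{DATABASE}.{fmt}?{urlencode(params)}"


csv_url = export_url(CAMPAIGN_SQL, "csv")
json_url = export_url(CAMPAIGN_SQL, "json")
print(csv_url)
print(json_url)
```

`urlencode` takes care of escaping the spaces, quotes and `%` wildcards in the query, which is easy to get wrong by hand.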

Answers

Some initial observations on the donations are that:

£120,000 came from six companies: Big Bang Films, JC Bamford Excavators, Grolar Developments, SJJ Contracts, Smoked Salmon and Tungsten West. JC Bamford is the only one to have also donated to the wider Tory party in the last two years.

A little over £700,000 came from 13 people: Natasha Barnaba, Linda Edwards, Clara Freeman, Alison Frost, Fitriani Hay, Phillip Jeans, Gary Mond, Jon Moynihan, Sheila Noakes, Gordon Phillips, Howard Shore, Michael Spencer and Barbara Yerolemou.

A further £85,000 appeared to be help with transport, from Graham Edwards, Tony Gallagher, Greville Howard, Andrew Law and Nigel Vinson.

These initial observations however are just a starting point.

Filed Under: Blog Tagged With: data, politics, SQL

Journalism by Numbers — 2019 [Virtual] Summer School

26th June 2020 by Aidan O'Donnell

With Cardiff University buildings closed since March because of the Coronavirus pandemic, the Summer School for the public moved online in June, and included a one-hour session on what data journalists do.

The Summer School comprised a week of workshops that ranged from radiography and earth sciences to building design and writing for business.

In our rapid run-through of the data journalism world, we touched on classic go-to number stories like A&E waiting times and party-political donations, as well as how journalists dig up the data in the first place (FOI, web scraping and so on). We looked at visuals done with colouring pencils, graphing cleaner air in Cardiff during lockdown and the ongoing questions around who keeps an eye on the algorithms.

People appeared online for our Journalism by Numbers workshop from around Wales and the UK, but also from Pakistan, Sweden and Nigeria.

Other workshops during the week covered ethics in Artificial Intelligence, copywriting and Google Analytics. There was also a session on the ever-interesting Pharmabee project (which launched the Spot-a-bee app this year as part of their bee-mapping project).

Filed Under: Blog Tagged With: data, datajounalism, local, talks

SELECT * FROM a day of SQL…

6th March 2020 by Aidan O'Donnell

This month our students survived a full-day workshop on SQL, moving from the very basics of the syntax to querying datasets or working through some of the better tutorials.

First up was the excellent Select Star tutorial by Zi Chong Kao, which is based on a dataset of US prisoners executed since 1976.

We then looked for newslines in a SQLite database of US baby names (via the command line) and wrote queries in Carto to map a dataset of protected Welsh monuments.

There was more SQLite with a database of shooting incidents involving Dallas police officers, this time via a notebook. And we finished with the Knight Lab’s fine SQL-based murder mystery.

Enough there to get you started (or refreshed) with your SQL syntax.

Filed Under: Blog Tagged With: coding, data, education, investigation, SQL, tools

Our Alumni: Nikita Vashisth – cutting edge in India

22nd June 2017 by Martin Chorley

In our new series of posts we’re taking a look at some of our past students, where they’ve gone, and what they’re up to. First up is Nikita Vashisth, one of the graduates from the first year of our course in 2014/2015.

Two years after leaving the course, Nikita is working with a cutting-edge Indian data journalism team. One of the projects she’s involved in measures air particles to help save lives in cities affected by pollution, something she initially proposed in her major coursework. Nikita said:

“I’m working as a data journalist at IndiaSpend, India’s first data journalism initiative. One of the projects I am currently working on is #Breathe, an air quality monitoring network. We’re analyzing PM2.5 and PM10 levels across Indian cities to understand city-wide highs and lows. The vision of the project is to democratize data critical to saving thousands of lives and engage citizens and other stakeholders in a conversation towards solving the life-threatening issue of air pollution.”

And her view of her studies? Nikita added:

“Being a part of the first COMPJ batch in 2014 was a whirlwind! I was introduced to COMSCI and it opened a whole new world of opportunities in journalism for me. The course takes a practical learning approach in digital journalism, data analysis and coding, which made it all the more fun. The Visual Communication & Information Design and Digital Investigation modules were especially engaging and led me to understand the power of data and design in effective storytelling. A big shout out to Glyn and Martin, my course directors/lecturers/mentors whose immense support and knowledge helped me get past the nerve-wracking learning curve by the end of the year.”

We can’t wait to see what Nikita gets up to in the future, and look forward to seeing the projects she comes up with.

Filed Under: Teaching Tagged With: alumni, data

Chatbots in the Classroom: Education Innovation Research

7th June 2017 by Martin Chorley

The Computational and Data Journalism team has recently been awarded research funding from the University Centre for Education Innovation to investigate the use of chatbots in the classroom.

The project “proposes the development of chat bots as part of the teaching and learning team to support learning and automate everyday issues to alleviate staff workload.

“This would essentially create an on-demand classroom assistant who can provide informational support whatever schedule students choose to keep outside of the classroom environment and increase their overall satisfaction levels as a result.”

We’ve just hired a third-year Computer Science student, Stuart Clark, to work with us on the project. He has started swiftly, working to identify sources of data within the university that such a system can plug into, designing system architectures and interfaces, and beginning work on the implementation.

We’ll follow up this development work over the summer with a live trial of the system in the autumn to see how well it works and assess whether this sort of technology can be successfully used by students and lecturers alike to improve information flow and ease administrative pressures.

We’ll continue to blog about the project as it progresses over the next few months.

Filed Under: Blog, Research, Teaching, The Lab Tagged With: ai, chatbot, coding, data, education, education innovation, interaction, oss, students, summer project, tools

Visualising the Creative Industries in Cardiff: CUROP project

5th June 2017 by Martin Chorley

This summer, our team is running a project funded by the CUROP scheme here at Cardiff University. The Creative Cardiff team have collected a large amount of data on the creative industries in Cardiff, and are now looking for new ways to explore and communicate this data. Our summer project is aiming to do just that, bringing in an undergraduate student to gain some experience of the research environment, carry out some exploratory data analysis, and then design and implement visualisations to aid public understanding of the data.

 

Current mapping of Creative Cardiff data

 

We’ve just recruited our student, Samuel Jones, a first-year student in the School of Computer Science and Informatics, and we’ll be getting started on the project soon. As we go, we’ll keep the site updated with progress, and point out the final outcomes once they’re released.

Filed Under: Blog, Research Tagged With: creative cardiff, curop, data, map, student project, summer project, vis, visualisation

Hacking VoterPower with the Bureau Local

31st May 2017 by Martin Chorley

Today we hosted one of several hackdays happening nationwide, organised by The Bureau Local. Journalists from The Bristol Cable joined up with students from the MSc in Computational and Data Journalism to analyse election data, hoping to uncover local data stories around the voters in their local constituencies.

We’re pleased to be able to support one of the first community initiatives from The Bureau Local, which, along with their project examining dark advertising on Facebook, is beginning to show how they will deliver on their mission to build a “network of journalists and tech experts across the country who will work together to find and tell stories that matter to local communities”.

It was also great to meet up again with MSc Computational Journalism 14/15 grad Charles Boutaud, here representing the Bureau Local in his new role as a developer-journalist in their team.

Here's our team in Cardiff about to get stuck into some juicy datasets. We have teams in London, Bournemouth, Glasgow and Birmingham too pic.twitter.com/FFG6JMzurE

— The Bureau Local (@bureaulocal) May 31, 2017

.@bureaulocal is hacking #ge2017 live in 5 cities across the UK: London, Bournemouth, Cardiff, Birmingham and Glasgow! #voterpower pic.twitter.com/Nac7Gjtfo3

— Megan Lucero (@Megan_Lucero) May 31, 2017

We've been at @bureaulocal hack day in Cardiff, digging into #Bristol election data, part of nationwide network. #GE2017 https://t.co/hdRWIm9G6Z

— The Bristol Cable (@TheBristolCable) May 31, 2017

 

Filed Under: Blog Tagged With: bureaulocal, coding, collaboration, data, ge2017, grad, hack, hackday, local, voterpower

Digital Needles in the Data Haystack

24th May 2017 by Martin Chorley

We’re presenting today at “Investigating (with) Big Data”, a one-day symposium being held at Cardiff University by the Digital Culture Network. Our talk, “Digital Needles in the Data Haystack”, examines the use of data by news organisations, focusing on the challenges they face when carrying out investigations with increasingly large volumes of data. We discuss the collaborations that organisations have built to get past such problems, and talk about some of the issues surrounding the use of data within newsrooms.

It looks to be an interesting day of talks on a range of different topics connected to ‘Big Data’, and we’re looking forward to it!

Filed Under: Blog Tagged With: data, engagement, investigation, talks

Scraping the Assembly

2nd November 2016 by Martin Chorley

Glyn is currently teaching the first-semester module on Data Journalism. As part of this, students need to complete a data investigation project. One of the students is looking at the expenses of Welsh Assembly Members. These are all freely available online, but not in an easy-to-manipulate form. According to the Assembly, they’d be happy to give the data out as a spreadsheet if we submitted an FOI request.

To me, this seems quite stupid. The information is all online and freely accessible. You’ve admitted you’re willing to give it out to anyone who submits an FOI request. So why not just make the raw data available to download? This does not sound like a helpful Open Government to me. Anyway, for whatever reason, they’ve chosen not to, and we can’t be bothered to wait around for an FOI response to come back. It’s much quicker and easier to build a scraper! We’ll just use Selenium to drive a web browser, submit a search, page through all the results collecting the details, then dump it all out to csv. Simple.


I built this as a quick hack this morning. It took about an hour or so, and it shows. The code is not robust in any way, but it works. You can ask it for data from any year (or a number of years) and it’ll happily sit there churning its way through the results and spitting them out as both .csv and .json.

All the code is available on Github and it’s under an MIT Licence. Have fun 😉
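For readers who want the shape of the approach without opening the repository, the paging-and-dumping loop can be sketched with the standard library alone. This is not the code from the repository: `fetch_page` is a hypothetical stand-in for whatever the Selenium-driven browser returns for one page of results, and the rows in `FAKE_PAGES` are invented placeholders rather than real expenses data.

```python
import csv
import io
import json


def collect_all(fetch_page):
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty."""
    rows, page = [], 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        rows.extend(batch)
        page += 1
    return rows


def to_csv(rows):
    """Serialise a list of dicts to csv text, using the first row's keys as the header."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialise the same rows to json text."""
    return json.dumps(rows, indent=2)


# Invented placeholder pages standing in for the browser's search results.
FAKE_PAGES = {
    1: [
        {"member": "Member A", "year": "2015", "amount": "10.00"},
        {"member": "Member B", "year": "2015", "amount": "20.00"},
    ],
    2: [{"member": "Member C", "year": "2015", "amount": "30.00"}],
}


def fake_fetch(page):
    """Hypothetical page fetcher: returns an empty list past the last page."""
    return FAKE_PAGES.get(page, [])


rows = collect_all(fake_fetch)
csv_text = to_csv(rows)
json_text = to_json(rows)
print(csv_text)
```

In a real scraper, `fake_fetch` would be replaced by a function that clicks through to the requested results page and reads the table out of the browser.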

Filed Under: Blog, Teaching Tagged With: coding, data, foi, investigation, oss, python, scraping

Updating Empty Properties: Agate vs Pandas

5th November 2015 by Martin Chorley

In the lab session this week we looked again at the Freedom of Information Act and considered a request to Cardiff Council for the list of empty properties in Cardiff. Last year we did a very similar session, but this year I carried out the simple data analysis slightly differently.


Filed Under: Blog, Teaching, The Lab Tagged With: agate, coding, data, foi, pandas, python, tools
