Computational & Data Journalism @ Cardiff

Reporting. Building. Designing. Informing


Finding donors to Truss leadership campaign, via Datasette

8th October 2022 by Aidan O'Donnell

Liz Truss now has the job of leading the Conservative Party and of running the country. So who gave her the money for her campaign?

MPs in Britain have to declare any money they receive, via donations or second jobs for example, and this is listed in the Register of Members’ Financial Interests.

The website explains that the Register exists to provide information about “any financial interest which a Member has, or any benefit which he or she receives”. This matters because, in the website’s words, “others might reasonably consider” that this money could influence the MP.

The Register is updated regularly, but the data is laid out as text on a web page and only the line breaks distinguish one field from another. This makes it very difficult to scrape the data or interrogate it for trends.

Datasette

This is where we turn to Datasette, a lovely tool for “exploring and publishing data”, built and maintained by Simon Willison, which lets you run SQL queries against the data. Happily, there is already an example in place for the Register of Members’ Interests.

Querying

There are four main tables in this instance of Datasette: categories, items, members, and people. A query of the members table (using “View and edit SQL”) returns two ids we can use to look up Liz Truss in the main items table: a member number (40560) in the “id” column and a person number (24941) in the “person_id” column.

SELECT * FROM members WHERE name = 'Elizabeth Truss'

The table with the crucial information, items, has close to two million entries. But, as Simon Willison explains, its members field seems to stop around 2015, so the person field is a better choice. Querying Truss in items via her “person_id”:

SELECT * FROM items WHERE person_id = 'uk.org.publicwhip/person/24941' ORDER BY date DESC

returns just over 950 entries, from 2010 to 2022.

But if you just want the 2022 donations:

SELECT * FROM items WHERE person_id = 'uk.org.publicwhip/person/24941' AND date LIKE '2022%' ORDER BY date DESC

or more precisely again, just the donation descriptions that mention the word “campaign”:

SELECT * FROM items WHERE person_id = 'uk.org.publicwhip/person/24941' AND item LIKE '%campaign%'

This last query returns 48 donations, which you can then download as a csv or json from Datasette. Here is that data as a csv, after some further cleaning.

Download: truss_donations2-1
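If you want to try the same filters offline, here is a minimal sketch in Python. It is not the Datasette instance itself: it builds a tiny in-memory SQLite table with only the three columns the queries above rely on (person_id, date, item) and fills it with made-up rows. The rows, the second person id and the helper names are all invented for illustration.

```python
import sqlite3

TRUSS = "uk.org.publicwhip/person/24941"

# Made-up rows for illustration only; the real items table is far larger
# and has more columns than the three used here.
ROWS = [
    (TRUSS, "2022-09-12", "Donation to support my leadership campaign (hypothetical row)"),
    (TRUSS, "2022-03-01", "Payment for a speech (hypothetical row)"),
    (TRUSS, "2015-06-20", "Donation to my constituency association (hypothetical row)"),
    ("uk.org.publicwhip/person/00000", "2022-08-01", "Donation to a leadership campaign (hypothetical row)"),
]


def build_db():
    """Create an in-memory database with a cut-down items table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (person_id TEXT, date TEXT, item TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", ROWS)
    return conn


def items_for(conn, person_id, date_prefix=None, keyword=None):
    """Return (date, item) rows for one person, optionally filtered, newest first."""
    sql = "SELECT date, item FROM items WHERE person_id = ?"
    params = [person_id]
    if date_prefix is not None:
        sql += " AND date LIKE ?"
        params.append(date_prefix + "%")
    if keyword is not None:
        sql += " AND item LIKE ?"
        params.append("%" + keyword + "%")
    sql += " ORDER BY date DESC"
    return conn.execute(sql, params).fetchall()


conn = build_db()
in_2022 = items_for(conn, TRUSS, date_prefix="2022")
campaign = items_for(conn, TRUSS, keyword="campaign")
print(len(in_2022), "rows from 2022;", len(campaign), "rows mentioning 'campaign'")
```

The values are passed as parameters (the ? placeholders) rather than pasted into the SQL string, which avoids quoting problems when you move from the Datasette query box to a script.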

Answers

Some initial observations on the donations are that:

  • £120,000 came from six companies: Big Bang Films, JC Bamford Excavators, Grolar Developments, SJJ Contracts, Smoked Salmon and Tungsten West. JC Bamford is the only one to have also donated to the wider Tory party in the last two years.
  • A little over £700,000 came from 13 people: Natasha Barnaba, Linda Edwards, Clara Freeman, Alison Frost, Fitriani Hay, Phillip Jeans, Gary Mond, Jon Moynihan, Sheila Noakes, Gordon Phillips, Howard Shore, Michael Spencer and Barbara Yerolemou.
  • A further £85,000 appeared to be help with transport, from Graham Edwards, Tony Gallagher, Greville Howard, Andrew Law and Nigel Vinson.

These initial observations, however, are just a starting point.

Filed Under: Blog Tagged With: data, politics, SQL

Sharing Jupyter notebooks online

20th March 2021 by Aidan O'Donnell

You have a great Jupyter notebook you’ve been working on. If only you could share it with the world: here are some options for getting your notebook online.

If you just want to show a notebook to people without them running the code, nbviewer does the job by showing the cells and their output (beware long dataframes that won’t be cropped). Just put the notebook file (.ipynb) on GitHub and supply the link to nbviewer. If your visitor likes what they see, they can immediately launch a functioning version via a Binder link, or download the .ipynb file. Here’s a simple example of what the user sees.

If your notebook is in a GitHub repo you can skip nbviewer and build a working version of the notebook via https://mybinder.org/. Just supply the repository URL and it will serve up all the .ipynb files, with the notebook cells ready to run.
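Both services are driven by the URL, so the links can be built by hand or in a couple of lines of Python. This is a minimal sketch based on the URL patterns as I understand them (nbviewer's /github/ path and Binder's /v2/gh/ path with a filepath parameter); the user, repo and notebook names are placeholders, and it is worth checking the patterns against each service's own documentation.

```python
from urllib.parse import quote


def nbviewer_url(user, repo, path, ref="main"):
    """Link that renders a notebook stored in a GitHub repo (read-only view)."""
    return f"https://nbviewer.org/github/{user}/{repo}/blob/{ref}/{quote(path)}"


def binder_url(user, repo, path=None, ref="HEAD"):
    """Link that asks mybinder.org to build the repo and open a live session."""
    url = f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"
    if path:
        # Assumed parameter name: opens this notebook once the session starts.
        url += f"?filepath={quote(path)}"
    return url


# Placeholder names, not a real repository.
print(nbviewer_url("someuser", "somerepo", "notebooks/demo.ipynb"))
print(binder_url("someuser", "somerepo", "notebooks/demo.ipynb"))
```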

jupyter{book} lets you build a complete book using notebook elements. Here’s an example with some notebooks.

The Voilà package “turns Jupyter notebooks into standalone web applications” or, if you prefer, it puts only the cell output on the webpage. It gets really useful when you add widgets from ipywidgets to allow user interaction.

A GitHub repo with GitHub Pages enabled can run as a webpage using a package called nbinteract, but I’ve found it has trouble loading widgets, as seen in some of the tutorial pages.

Of course, Jupyter notebooks are not the only option: there are also Kaggle, Google Colab and many more. There’s an episode of the podcast Talk Python To Me about a paper that reviewed 60 (!) different notebooks.

Filed Under: Blog, Research Tagged With: interaction, jupyter, notebooks, python, web dev

What we did in 12 weeks of data journalism

9th February 2021 by Aidan O'Donnell

Now that we’ve finished a first semester of data journalism work, I’ve put the module details online. It’s not an online course but it does have our Reading List, a running list of Interesting Datasets for practice (or work) and outlines of what we did each week in the Great Academic Year Of The Pandemic (some in great detail, others less so).

The course ran over 12 weeks and covered … “everything”.

If you want to see more, there are some catalogues online of what people are doing when they teach data journalism: this one started a few years ago by Dan Nguyen, this from the IJEC and this list by Jonathan Gray.

Filed Under: Blog Tagged With: data journalism, github

Some APIs for journalism

22nd November 2020 by Aidan O'Donnell

This month we find ourselves digging up data with the help of APIs. While there are oodles of APIs for different things (there’s a Star Wars API and an ISS API and many many others), I wondered which endpoints might be interesting for journalists. So here is a list of some of them — we’ll add to it as we find more — starting with government and moving on to business, health and … where you can charge your electric car.

* means an API key is required, ** means an API key plus extra authentication is required
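For anyone new to API keys, this is roughly what the starred entries involve. The sketch below is generic and assumes things that vary from one API to the next: the endpoint (api.example.org), the parameter name for the key and the shape of the JSON are all hypothetical. Some APIs want the key in a request header instead, and the double-starred ones need a further authentication step that is not shown here. It parses a canned response so that it runs without a network connection.

```python
import json
from urllib.parse import urlencode


def build_request_url(base_url, params, api_key=None, key_param="api-key"):
    """Attach query parameters (and an API key, if needed) to an endpoint URL."""
    query = dict(params)
    if api_key is not None:
        query[key_param] = api_key  # hypothetical parameter name
    return base_url + "?" + urlencode(query)


# Canned response standing in for what a (hypothetical) endpoint might return.
SAMPLE_RESPONSE = '{"results": [{"title": "Example headline", "date": "2020-11-01"}]}'


def titles(raw_json):
    """Pull the title out of each result in a JSON response."""
    data = json.loads(raw_json)
    return [result["title"] for result in data.get("results", [])]


url = build_request_url(
    "https://api.example.org/v1/search", {"q": "cardiff"}, api_key="YOUR_KEY"
)
print(url)
print(titles(SAMPLE_RESPONSE))
```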

Government

  • UK government APIs
  • Parliament APIs
  • UK election candidate data by Democracy Club
  • TheyWorkForYou*
  • Parliamentary committees
  • Bristol Open Data hub
  • Historic Hansard

Covid, weather etc.

  • Covid data from UK government
  • UK Met office*
  • UK police
  • UK postcodes
  • Companies House*
  • Land registry*
  • Food hygiene
  • National Chargepoint Registry & Open Charge Map
  • Stats Wales
  • Open Corporates*
  • Facebook ad library**

US

  • US Federal Election Commission (FEC)*

Media

  • The Guardian*
  • Committee to Protect Journalists
  • NY Times*
  • Die Zeit*
  • US Press Freedom Tracker
  • Wikipedia page views
  • Twitter**

Filed Under: Blog Tagged With: api, journalism, JSON

Tim Harford’s lethal bathtub

15th September 2020 by Aidan O'Donnell

Tim Harford’s books are on the reading list for journalism students at Jomec and we are big fans of More or Less. And this month he supplied us all with a great case of numbers going wrong, in a piece for the Financial Times.

You can listen to him explaining it on Radio 4’s The World at One (segment at 17′ 52″).

The thinking — about how dangerous UK life is during the Covid pandemic — goes like this:

  • Every day in the UK about 40 people out of a million get the virus (ONS).
  • How dangerous is it if you’re one of the forty? If you’re aged 60, you have roughly a 1% chance of dying if you catch it.
  • 1% of ‘40 in a million’ gets you to almost a 1 in 2 million chance of dying. So, if you are 60 and live in the UK at the moment (and are exposed to the typical risk in the UK) there’s a 1 in 2 million chance Covid will kill you.
  • Or make that a one in a million chance if you include ‘serious injury’, since another 1% of the ‘40 in a million’ who catch it are left with health problems.
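The arithmetic in the list can be checked in a few lines. This sketch uses only the figures quoted above (40 in a million, 1% and another 1%) and keeps them as exact fractions. The exact product is 1 in 2.5 million, which the piece rounds to “almost 1 in 2 million”; adding the second 1% gives 1 in 1.25 million, or roughly one in a million.

```python
from fractions import Fraction

# Figures as quoted in the list above.
daily_infection = Fraction(40, 1_000_000)       # chance of catching the virus on a given day
death_given_infection = Fraction(1, 100)        # for a 60-year-old
lasting_harm_given_infection = Fraction(1, 100)

daily_death = daily_infection * death_given_infection
daily_death_or_harm = daily_infection * (
    death_given_infection + lasting_harm_given_infection
)


def one_in(probability):
    """Express a probability as the N in '1 in N'."""
    return float(1 / probability)


print(f"Death: 1 in {one_in(daily_death):,.0f} per day")
print(f"Death or lasting harm: 1 in {one_in(daily_death_or_harm):,.0f} per day")
```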

Everything, Tim Harford says, is fine up to here. But then he looked for other things that had a one in a million chance of death / serious injury. One of them, he explained to The World at One, was “taking a bath”.

“So when I discovered this I thought ‘oh, I wonder what else is about that risky?’ […] So when I wrote this all up for the Financial Times I just — as an afterthought, having worked so carefully to get all my Covid maths right — I just said ‘it’s a bit like riding a horse, riding a motorbike, going skiing, or taking a bath'”.

This is the error. The risk of dying in the bath is one in three million every year — not every time you take a bath. As Tim Harford remarks “Covid is no more risky than you thought. And taking a bath is much safer than you thought”.
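To see how much the per-year versus per-bath mix-up matters, here is a rough sketch. The one-in-three-million annual figure and the one-in-a-million daily Covid figure come from the post; the assumption of one bath a day is mine, purely to make the units comparable.

```python
from fractions import Fraction

annual_bath_risk = Fraction(1, 3_000_000)   # risk of dying in the bath, per year (from the post)
BATHS_PER_YEAR = 365                        # assumption: one bath a day

# Spread the annual risk evenly over every bath taken in the year.
per_bath_risk = annual_bath_risk / BATHS_PER_YEAR

covid_daily_risk = Fraction(1, 1_000_000)   # the 'one in a million' daily figure (from the post)
ratio = covid_daily_risk / per_bath_risk

print(f"Per-bath risk: 1 in {float(1 / per_bath_risk):,.0f}")
print(f"The daily Covid figure is about {float(ratio):,.0f} times larger")
```

On that assumption a single bath carries a risk of about one in a billion, so the comparison was out by a factor of roughly a thousand.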

Nonetheless, “That is the most shared thing I’ve ever said because it’s the most interesting thing I’ve ever said […] because it happens to be wrong”.

It is, as he observed, an instructive case of how mistakes happen and what newspeople pick up on.

His full account of it is on Twitter.

Filed Under: Blog

Our course after 6 months of Covid-19

28th August 2020 by Aidan O'Donnell

The Covid-19 pandemic shut down our schools at the end of March and sent staff and students alike home to work on their laptops. This meant MSc students finished their group projects using online platforms and started dissertation projects while trying to get back to their home countries, or while stuck in Cardiff.

Then again, the summer months are probably the right time to be stranded here: we get more sun than usual, if still less than in hotter parts of the world.

A new cohort of students will be arriving in Cardiff next month. Our course this year will run both online and in classrooms for the first semester. The computer science courses will be taught online, while most of the journalism work will take place in classrooms.

There is of course a huge amount of data and data-related stories that have been published in recent months because of the pandemic. And, it appears, the data and the effects of the pandemic on societies around the world will keep coming for a while yet. So it is a good time to be working on this kind of material.

And the Americans are planning an election, which should keep us busy in November and the weeks before.

Filed Under: Blog Tagged With: Cardiff University, Covid-19

The Clwstwr news projects — update

20th July 2020 by Aidan O'Donnell

Clwstwr is a five-year programme in south Wales — run from Cardiff — that was started to encourage the development of original screen-related projects. ‘Screen’ here means anything that involves creative or technological industries in a broad sense. Since it was set up in early 2019, it has allocated funding and development support to 23 different projects to allow for original research and development.

Many of the projects have been underway for close to a year at this stage (a full list of the projects is here) and a few of them are of particular interest to us since they are working on news:

Artificial Intelligence in the newsroom

This project is investigating how to put the resources of the deep web at the disposal of working journalists, by using artificial intelligence. It’s run by the Cardiff team of Amplyfi, a company that uses tech for business intelligence, and the project aims to develop technology that will identify new entities that are emerging in the deep web, and especially new relationships between those entities.

Extracting court information for the press

The team behind the Caerphilly Observer are running this project, which will deal with court information (who’s appeared in court, who’s due to appear) that is often either unwieldy or downright inaccessible for journalists. The plan is to gather all this information for Magistrates Courts in Wales and make it available to journalists through a searchable database, which would greatly aid press coverage of local courts.

New ways of telling news stories

What’s the best way to tell a news story? This project is trying to answer this question by looking firstly at how people understand and respond to stories in general, and then by designing new journalism techniques that will allow the press to tell stories in the most effective way possible. It’s a radical re-evaluation of a journalistic storytelling tradition that has long worked just on the basis of ‘that’s how we’ve always done it!’.

News in school

This project will design “a pilot for regular news service delivered to pupils within school hours”. The idea is that teachers can use this service to complement their teaching and that a new generation of young people will be introduced to the idea of staying informed.

Filed Under: Blog Tagged With: collaboration, creative cardiff, engagement, local, screen

Journalism by Numbers — 2019 [Virtual] Summer School

26th June 2020 by Aidan O'Donnell

With Cardiff University buildings closed since March because of the Coronavirus pandemic, the Summer School for the public moved online in June, and included a one-hour session on what data journalists do.

The Summer School comprised a week of workshops that ranged from radiography and earth sciences to building design and writing for business.

In our rapid run-through of the data journalism world, we touched on classic go-to number stories like A&E waiting times and party-political donations as well as how journalists dig up the data in the first place (FOI, web scraping and so on). We looked at visuals done with colouring pencils, graphing cleaner air in Cardiff during lockdown and the ongoing questions around who keeps an eye on the algorithms.

People appeared online for our Journalism by Numbers workshop from around Wales and the UK, but also from Pakistan, Sweden and Nigeria.

Other workshops during the week covered ethics in Artificial Intelligence, copywriting and Google analytics. There was also a session on the ever-interesting Pharmabee project (which launched the Spot-a-bee app this year as part of their bee-mapping project).

Filed Under: Blog Tagged With: data, datajounalism, local, talks

Capturing OSINT flags with Cardiff’s Cybersoc

3rd May 2020 by Aidan O'Donnell

Cardiff University’s Cyber Society gave us all its Capture the Flag challenge earlier this year and now has over a thousand players on its leaderboard, many of them sitting on the maximum score of 15,000.

The challenges are organised into three streams: ten introductory questions to get you warmed up, 18 tasks for online intelligence gathering and finally a dozen challenges centred around some fictional characters and their online life.

There are no pre-requisites for attempting it — it starts with a “What is OSINT?” question, so beginners are welcome — but it should test most players’ “resilience” (i.e. can you keep playing even though you’ve run out of ideas, patience and any sense that you once knew anything about online intelligence gathering?). At least one of our Computational Journalism students has made it successfully through all the challenges.

The challenges were featured by We Are OSINTCurious on its webcast in March.

Filed Under: Blog Tagged With: education, investigation, OSINT, students

SELECT * FROM a day of SQL…

6th March 2020 by Aidan O'Donnell

This month our students survived a full-day workshop on SQL, moving from the very basics of the syntax to querying datasets or working through some of the better tutorials.

First up was the excellent Select Star tutorial by Zi Chong Kao, which is based on a dataset of US prisoners executed since 1976.

We then looked for newslines in a sqlite database of US babynames (via the command line) and wrote queries in Carto to map a dataset of protected Welsh monuments.

There was more sqlite with a database of shooting incidents involving Dallas police officers, this time via a notebook. And we finished with the Knight Center’s fine SQL-based murder mystery.
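For a flavour of the notebook approach, here is a minimal sketch using Python's built-in sqlite3 module. It does not reproduce the workshop datasets: the babynames table, its column names and every number in it are made up, just to show a GROUP BY query of the kind you might write when hunting for a newsline.

```python
import sqlite3

# In-memory database with a hypothetical, made-up babynames table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE babynames (year INTEGER, name TEXT, sex TEXT, births INTEGER)")
conn.executemany(
    "INSERT INTO babynames VALUES (?, ?, ?, ?)",
    [
        (2000, "Alex", "M", 120),
        (2000, "Alex", "F", 30),
        (2000, "Sam", "M", 90),
        (2010, "Alex", "M", 80),
        (2010, "Sam", "F", 140),
    ],
)

# Total births per name in one year, most popular first.
query = """
SELECT name, SUM(births) AS total
FROM babynames
WHERE year = ?
GROUP BY name
ORDER BY total DESC
"""
top_2000 = conn.execute(query, (2000,)).fetchall()
print(top_2000)
```

In a notebook the same query string can be handed to pandas.read_sql_query(query, conn, params=(2000,)) if you would rather have a dataframe than a list of tuples.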

Enough there to get you started (or refreshed) with your SQL syntax.

Filed Under: Blog Tagged With: coding, data, education, investigation, SQL, tools
