Computational & Data Journalism @ Cardiff

Reporting. Building. Designing. Informing

The Importance of Owning your Toolset

19th November 2014 by Martin Chorley

This morning, upon logging in to ScraperWiki, Glyn found the following message:

ScraperWiki loses access to Twitter API

ScraperWiki can no longer access the Twitter API, meaning that tools based on Twitter data on ScraperWiki will no longer work. As it happens, I have just written a really simple Twitter API wrapper in Python, so we thought it would be worth presenting it here as a how-to. If you understand and own the code that is gathering your data, you can obviously remove your reliance on third parties.

QUICK DISCLAIMER: this is a quick and dirty solution to a problem, so may not represent best coding practice, and has absolutely no error checking or handling. Use with caution…

The code presented here will allow you to make any API request to Twitter that uses a GET request, so it is really only useful for getting data *from* Twitter, not sending it *to* Twitter. It is also only for use with the REST API, not the Streaming API, so if you’re looking for real-time monitoring, this is not the API wrapper you’re looking for. This API wrapper also uses a single user’s authentication (yours), so it is not set up to allow other users to use Twitter through your application.

The first step is to get some access credentials from Twitter. Head over to https://apps.twitter.com/ and register a new application. Once the application is created, you’ll be able to access its details. Under ‘Keys and Access Tokens’ are four values we’re going to need for the API: the Consumer Key and Consumer Secret, and the Access Token and Access Token Secret. Copy all four values into a new Python file and save it as ‘_credentials.py’. The images below walk through the process. Also, don’t try to use the credentials from these images: this app has already been deleted, so they won’t work!
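As a sketch of what that file might contain: the wrapper code below refers to four module-level names, client_id, client_secret, access_token and access_secret, so the file needs to define exactly those. Which Twitter value belongs in which name follows from how the code uses them (client_id is sent as the consumer key, the two secrets form the signing key); the placeholder strings are of course not real credentials.

```python
# _credentials.py (a minimal sketch: placeholder values only)
# The variable names match those the Twitter_API code refers to.

# 'Consumer Key' and 'Consumer Secret' from the app's Keys and Access Tokens page
client_id = "YOUR_CONSUMER_KEY"
client_secret = "YOUR_CONSUMER_SECRET"

# 'Access Token' and 'Access Token Secret' from the same page
access_token = "YOUR_ACCESS_TOKEN"
access_secret = "YOUR_ACCESS_TOKEN_SECRET"
```

Keeping these values in their own file makes it easy to leave them out of version control (for example by listing the file in .gitignore) while still sharing the rest of the code.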

Create a new Twitter Application
Give it a name, description and website
Check the App details

Get your Consumer Key and Secret
… and your Access Token and Secret
Store your credentials in _credentials.py

Once we have the credentials, we can write some code to make some API requests.

First, we define a Twitter API object that will carry out our API requests. We need to store the API URL, plus some details that let us throttle our requests so that they stay within Twitter’s rate limits.

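import base64
import hmac
import json
import threading
import time
import urllib.error
import urllib.parse
import urllib.request
import uuid
from hashlib import sha1

# the four values saved in _credentials.py
from _credentials import client_id, client_secret, access_token, access_secret


class Twitter_API:

    def __init__(self):

        # URL for accessing API
        scheme = "https://"
        api_url = "api.twitter.com"
        version = "1.1"

        self.api_base = scheme + api_url + "/" + version

        #
        # seconds between queries
        # queries in this project are limited to 180 per 15 minutes,
        # so we aim for 175 to leave a small safety margin
        query_interval = float(15 * 60) / 175

        #
        # rate limiting timer, shared by all queries
        self.__monitor = {'wait': query_interval,
                          'earliest': None,
                          'timer': None}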
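To make the throttling numbers concrete: spacing 175 requests evenly across a 15-minute window puts just over five seconds between consecutive calls. The short calculation below reproduces the value of query_interval; the constant names are ours, not part of the wrapper.

```python
# length of one rate-limit window, in seconds
WINDOW_SECONDS = 15 * 60
# requests we aim to make per window (below the limit of 180)
TARGET_REQUESTS = 175

# same calculation as query_interval in Twitter_API.__init__
query_interval = float(WINDOW_SECONDS) / TARGET_REQUESTS
print(round(query_interval, 2))  # 5.14

# fastest possible rate at that spacing, in requests per window
max_per_window = WINDOW_SECONDS / query_interval
print(round(max_per_window))  # 175
```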

We add a rate-limiting method that puts the thread to sleep if we are requesting things from Twitter too quickly:

    #
    # __rate_controller puts the thread to sleep
    # if we're hitting the API too fast
    def __rate_controller(self, monitor_dict):

        #
        # join the timer thread started by the previous call
        # (this blocks until that timer has expired)
        if monitor_dict['timer'] is not None:
            monitor_dict['timer'].join()

        # sleep if necessary (skipped on the very first call,
        # when no 'earliest' time has been set yet)
        while monitor_dict['earliest'] is not None and time.time() < monitor_dict['earliest']:
            time.sleep(max(0, monitor_dict['earliest'] - time.time()))

        # work out when the next API call can be made
        earliest = time.time() + monitor_dict['wait']
        timer = threading.Timer(earliest - time.time(), lambda: None)
        monitor_dict['earliest'] = earliest
        monitor_dict['timer'] = timer
        monitor_dict['timer'].start()
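The timer thread is one way to enforce the gap between requests. For comparison, here is a minimal single-threaded sketch of the same idea (a minimum interval between calls) that simply sleeps until the next allowed time. The class name MinIntervalThrottle is hypothetical, and it uses time.monotonic() so that adjustments to the system clock cannot shorten or lengthen the wait.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum gap between successive calls."""

    def __init__(self, interval_seconds):
        self.interval = interval_seconds
        # monotonic time before which the next call must wait; None until first call
        self.earliest = None

    def wait(self):
        # on every call after the first, sleep until the earliest allowed time
        if self.earliest is not None:
            remaining = self.earliest - time.monotonic()
            if remaining > 0:
                time.sleep(remaining)
        # the following call may not proceed until one interval from now
        self.earliest = time.monotonic() + self.interval


# demo with a short interval: three calls need at least two gaps of 0.05 s
throttle = MinIntervalThrottle(0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()
elapsed = time.monotonic() - start
print("elapsed: {:.2f} s".format(elapsed))
```

Both approaches block the caller until the interval has passed; this version just avoids creating a new Timer thread for every request.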

The Twitter API requires us to supply OAuth authentication details in an Authorization header on each request. One of these values is a signature, created by signing a string built from the details of the request with HMAC-SHA1. We can write a method that takes in all the details of the request (method, URL, parameters) and creates the signature:

    #
    # make the signature for the API request
    # (OAuth 1.0a, HMAC-SHA1)
    def get_signature(self, method, url, params):

        # percent-encode every parameter key and value;
        # quote(..., safe='') writes a space as %20, as OAuth requires
        # (quote_plus would write it as '+' and break the signature)
        encoded_params = {}
        for k, v in params.items():
            encoded_k = urllib.parse.quote(str(k), safe='')
            encoded_v = urllib.parse.quote(str(v), safe='')
            encoded_params[encoded_k] = encoded_v

        # sort the parameters alphabetically by key and join them
        # into a single key=value&key=value string
        signing_string = "&".join(
            key + "=" + encoded_params[key] for key in sorted(encoded_params))

        # construct the base string: METHOD&encoded_url&encoded_params
        base_string = "&".join([
            method.upper(),
            urllib.parse.quote(url, safe=''),
            urllib.parse.quote(signing_string, safe=''),
        ])

        # construct the key from the consumer secret and access token secret
        signing_key = (urllib.parse.quote(client_secret, safe='') + "&" +
                       urllib.parse.quote(access_secret, safe=''))

        # sign the base string with HMAC-SHA1, and base64 encode the result
        hashed = hmac.new(signing_key.encode(), base_string.encode(), sha1)
        signature = base64.b64encode(hashed.digest())
        return signature.decode("utf-8")
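To see the pieces that get_signature assembles, the standalone sketch below separates the base-string step from the HMAC step. The function names are hypothetical, and the secrets are passed in as arguments rather than read from _credentials.py. Encoding uses urllib.parse.quote with safe='', which writes a space as %20 as OAuth 1.0a (RFC 5849) requires; quote_plus would write it as +.

```python
import base64
import hmac
import urllib.parse
from hashlib import sha1


def percent_encode(value):
    # RFC 3986 style: only unreserved characters are left as-is
    return urllib.parse.quote(str(value), safe='')


def signature_base_string(method, url, params):
    # encode every key and value, then sort by encoded key (then value)
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(k + "=" + v for k, v in encoded)
    # METHOD & encoded URL & encoded parameter string
    return "&".join([method.upper(), percent_encode(url), percent_encode(param_string)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret):
    # the signing key is the two encoded secrets joined by '&'
    key = percent_encode(consumer_secret) + "&" + percent_encode(token_secret)
    digest = hmac.new(key.encode(), base_string.encode(), sha1).digest()
    return base64.b64encode(digest).decode("utf-8")


base = signature_base_string(
    "get",
    "https://api.twitter.com/1.1/statuses/user_timeline.json",
    {"screen_name": "martinjc", "count": 2},
)
print(base)
# GET&https%3A%2F%2Fapi.twitter.com%2F1.1%2Fstatuses%2Fuser_timeline.json&count%3D2%26screen_name%3Dmartinjc

# placeholder secrets, for illustration only
signature = hmac_sha1_signature(base, "consumer-secret", "token-secret")
```

Because the parameter string is encoded a second time when it goes into the base string, a space in a parameter value ends up as %2520 in the base string.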

Finally, we can write a method to actually *make* the API request:

    def query_get(self, endpoint, aspect, get_params=None):

        #
        # rate limiting
        self.__rate_controller(self.__monitor)

        # ensure we're dealing with strings as parameters
        # (a None default avoids sharing one mutable dict between calls)
        str_param_data = {}
        for k, v in (get_params or {}).items():
            str_param_data[str(k)] = str(v)

        # construct the query url
        url = self.api_base + "/" + endpoint + "/" + aspect + ".json"

        # add the header parameters for authorisation
        # (the timestamp must be whole seconds, and all values strings)
        header_parameters = {
            "oauth_consumer_key": client_id,
            "oauth_nonce": uuid.uuid4().hex,
            "oauth_signature_method": "HMAC-SHA1",
            "oauth_timestamp": str(int(time.time())),
            "oauth_token": access_token,
            "oauth_version": "1.0"
        }

        # collect all the parameters together for creating the signature
        signing_parameters = {}
        signing_parameters.update(header_parameters)
        signing_parameters.update(str_param_data)

        # create the signature and add it to the header parameters
        header_parameters["oauth_signature"] = self.get_signature("GET", url, signing_parameters)

        # build the OAuth header: key="value" pairs separated by ", "
        header_string = "OAuth " + ", ".join(
            urllib.parse.quote(str(k), safe='') + '="' +
            urllib.parse.quote(str(v), safe='') + '"'
            for k, v in header_parameters.items())

        headers = {
            "Authorization": header_string
        }

        # create the full url including parameters, encoding spaces
        # as %20 (not '+') to match the signature
        url = url + "?" + urllib.parse.urlencode(str_param_data, quote_via=urllib.parse.quote)
        request = urllib.request.Request(url, headers=headers)

        # make the API request, printing and re-raising any error
        try:
            response = urllib.request.urlopen(request)
        except urllib.error.HTTPError as e:
            print(e)
            raise
        except urllib.error.URLError as e:
            print(e)
            raise

        # read the response and return the json
        raw_data = response.read().decode("utf-8")
        return json.loads(raw_data)
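The Authorization header that query_get sends is the word OAuth followed by a comma-separated list of key="value" pairs. The hypothetical helper below isolates that formatting step so it can be checked offline; the parameter values are placeholders. Note that characters such as +, / and =, which often appear in a base64 signature, have to be percent-encoded inside the header.

```python
import urllib.parse


def format_oauth_header(oauth_params):
    """Render OAuth parameters as an Authorization header value (hypothetical helper)."""
    pairs = []
    for key, value in oauth_params.items():
        # percent-encode both key and value, and double-quote the value
        encoded_key = urllib.parse.quote(str(key), safe='')
        encoded_value = urllib.parse.quote(str(value), safe='')
        pairs.append(encoded_key + '="' + encoded_value + '"')
    return "OAuth " + ", ".join(pairs)


header = format_oauth_header({
    "oauth_consumer_key": "CONSUMER_KEY",
    "oauth_nonce": "abc123",
    "oauth_signature": "SIGNATURE+/=",
})
print(header)
# OAuth oauth_consumer_key="CONSUMER_KEY", oauth_nonce="abc123", oauth_signature="SIGNATURE%2B%2F%3D"
```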

Putting this all together, we have a simple Python class that acts as an API wrapper for GET requests to the Twitter REST API, including the signing and authentication of those requests. Using it is as simple as:

ta = Twitter_API()

# retrieve tweets for a user
params = {
   "screen_name": "martinjc",
}

user_tweets = ta.query_get("statuses", "user_timeline", params)
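A single user_timeline call returns only one page of tweets. Twitter’s timeline documentation describes paging backwards by passing a max_id one less than the lowest tweet ID already received; the hypothetical helper below computes that value. It assumes each tweet in the returned JSON list is a dict with an integer "id" field, and the sample IDs are made up purely for illustration.

```python
def next_max_id(tweets):
    """Return the max_id for the next, older page, or None when paging is done.

    Hypothetical helper: assumes each tweet is a dict with an integer "id".
    """
    if not tweets:
        # an empty page means there are no older tweets to fetch
        return None
    # subtract 1 so the oldest tweet already seen is not returned again
    return min(tweet["id"] for tweet in tweets) - 1


# made-up IDs, standing in for one page of user_timeline results
sample_page = [{"id": 105}, {"id": 101}, {"id": 103}]
print(next_max_id(sample_page))  # 100
print(next_max_id([]))  # None
```

With the wrapper, the result would go into the request parameters (params["max_id"] = next_max_id(user_tweets)) before calling ta.query_get("statuses", "user_timeline", params) again, stopping once it returns None. The wrapper’s rate controller spaces out those repeated calls automatically.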

The full code is online on GitHub, and is released under an Apache 2.0 Licence.

Filed Under: Blog Tagged With: api, coding, compj, python, tools, twitter

Computational Journalism: the manifesto

26th September 2014 by Martin Chorley

While discussing the new MSc course between ourselves and with others, we have come up with the same issues and themes again and again. As a planning exercise earlier in the summer, we gathered some of these together into a ‘manifesto’. It is primarily useful to us as a way to ensure we’re thinking consistently, but we’re also making it public to show others what we’re talking about when we say ‘Computational Journalism’.

People may agree or disagree with it, and may want to have their own input into it, and that’s part of what it’s there for. The other reason for its existence is to show how much further this goes than just journalists using computers to analyse data. This is not an aggressive ‘this is exactly where the boundaries lie’ sort of exercise, more of a ‘we think there’s something around about these parts’ kind of statement. We expect this to be an organic document that will change as time goes on.

These are the key themes we’ve used to talk about what’s needed in Computational Journalism:

  • Understand Community

Understand that it’s not an audience that you’re dealing with, but a community of which you are an integral part. For the computational journalist this means understanding not only what content the community wants to receive and interact with, but also how to build that content, allow others to build upon it, and help the community as a whole to understand it.

  • Be Creative

This is a key computational thinking principle – the ability to take existing principles and techniques and apply them creatively to solve a new problem. If that’s not possible, then invent new tools, techniques and principles to solve the problem.

  • Be Playful

Another key computational thinking principle that relates closely to the previous theme. What new use can you put that technology to? If you apply this technique in a new way, what happens? It’s important to try and think outside of the usual patterns and workflows to discover new processes and facts.

  • Create your environment

This is about learning to be a developer; creating the workflows and processes that allow you to create efficiently and respond quickly. Know the tools and techniques you use to go from idea to research to analysis to creation to output and dissemination. Improve and refine this environment constantly.

  • Learn by doing

Learning to code is easier when you learn by doing. Dive into projects and use them as the motivation for learning new languages, frameworks and libraries. Examine other projects and see how they’ve solved problems, adapt what you find, and apply it in your code.

  • Be lazy when needed

There’s a reason “Don’t Repeat Yourself” is a popular programming mantra – because it makes sense. Reuse code, reuse tools. Never re-invent the wheel unless it’s absolutely necessary. Make use of others’ libraries if you can.

  • Be understood

When communicating with your community it’s obviously important to make sure you’re being understood. Often you’ve been immersed in a project for a long time, you know the details inside and out. Others don’t, and it’s your job to communicate those details as succinctly and clearly as possible. Don’t overlook what seems ‘simple’ to you and assume everyone else will pick up on it. At the same time, don’t overwhelm others with vast oceans of content that drown your message. There’s a balance – find it.

  • Learn to write clearly

As above, your community needs you to communicate well. But don’t forget that the code you write is also a part of the conversation. It’s not enough for your output alone to be clear; your intermediate processes must be understood too. Best practices and coding principles exist for a reason. It’s not just humans that interact with your outputs: machines need to understand your code too. Sloppy coding and design lead to bugs, errors and inaccuracy.

  • See past the numbers

It’s not just about analysing data. Data is important, and without the statistics to prove your statements you have nothing. But never forget the story and the message. Don’t neglect context just because you’ve got a significant p-value. Remember that the tools you’re creating have value and are part of the story too.

  • Share everything

Whatever you can share, share. Help others to build upon your work. Licence everything. It’s no good releasing code and data without releasing the terms under which others can use it – a codebase with no licence or terms of reuse is no use to anybody. If you’re building on the work of others, it’s only fair to give back. Communities are stronger when everyone can contribute and receive value.

Filed Under: Blog, The Lab Tagged With: compj, issues, manifesto, themes

Copyright © 2023