
Packaging the TheyWorkForYou API

[This article was first published on R – CONJUGATEPRIOR, and kindly contributed to R-bloggers].

TheyWorkForYou is a great website for keeping up with British politics and one of the many fine things mySociety does to make democracy in the UK more transparent.

There’s also an API, accessible via HTTP and wrapped up for a few languages. However, R is not amongst them, so I wrote twfy.

If you’re interested in using it (and you’ve got devtools installed) you can install it with

devtools::install_github("conjugateprior/twfy")

It was my first proper API package and a bit of a learning experience. If you want to hear more about that, read on.

APIs

First, a recap for those just joining us.

The TheyWorkForYou API works with parameterized GETs to URLs with a common base:

http://theyworkforyou.com/api/

and different endpoints, depending on what you want. First you sign up for an API key and then you make the calls.

For example, if you want information about a UK parliamentary constituency then your endpoint is getConstituency, which takes either a name or a postcode, plus your API key (as key) and an output format (as output), and returns a structured constituency object.

In a browser, the complete call looks like

https://www.theyworkforyou.com/api/getConstituency?name=Keighley&output=js&key=adsfiddscsdlsdlk

where of course adsfiddscsdlsdlk isn’t really an API key. It just plays one on the web.

The server returns a JSON object:

{
  "bbc_constituency_id" : "344",
  "guardian_election_results" : "http://www.guardian.co.uk/politics/constituency/1050/keighley",
  "guardian_id" : "1050",
  "guardian_name" : "Keighley",
  "pa_id" : "338",
  "name" : "Keighley"
}

Except that’s only sort of true. The server claims to return a JavaScript object, as we can tell from its MIME type, "text/javascript; charset=iso-8859-1". We’ll just treat it like JSON, though.

Making this call and decoding the result programmatically is straightforward with the right packages:

library(httr)
library(jsonlite)

q <- list(output="js", key="adsfiddscsdlsdlk", name="Keighley")
url <- "https://www.theyworkforyou.com/api/getConstituency"

resp <- GET(url, query=q)                   # make the call
stop_for_status(resp)                       # turn HTTP errors into R errors
json <- fromJSON(content(resp, as="text"))  # parse the response text as JSON

where we’ve let the GET function deal with all the URL escaping, and just asked fromJSON to do the right thing. By default, jsonlite generally believes the right thing looks like a data.frame and most of the time that’s fine.

So far so straightforward.
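
To see what fromJSON makes of a response without needing a key or a network connection, here is a small sketch that parses the example getConstituency response above from a string. A single JSON object comes back as a named list, while an array of objects (the toy one at the end has made-up values, not real API output) is simplified into a data.frame. Every field arrives as a character string, because that is how the server quotes them.

```r
library(jsonlite)

# The getConstituency response shown above, pasted in as a string
txt <- '{
  "bbc_constituency_id" : "344",
  "guardian_election_results" : "http://www.guardian.co.uk/politics/constituency/1050/keighley",
  "guardian_id" : "1050",
  "guardian_name" : "Keighley",
  "pa_id" : "338",
  "name" : "Keighley"
}'

con <- fromJSON(txt)
class(con)   # "list": a single JSON object becomes a named list
con$name     # "Keighley"
con$pa_id    # "338": still a string, as in the JSON

# A toy array of objects (made-up values) is simplified to a data.frame
toy <- fromJSON('[{"name": "A", "pa_id": "1"}, {"name": "B", "pa_id": "2"}]')
class(toy)   # "data.frame"
```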

Creating a general purpose function to call the API

It’s pretty easy to generalize this code to create a generic API calling function

call_api(endpoint, ...)

Inside that function we’ll just grab the arguments as list(...), add our preferred output format and the API key, and hand the whole thing to GET.
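
Here is one way call_api might look. It’s a sketch under some assumptions, not the code in twfy: the URL building is split out into a hypothetical twfy_url helper so it can be checked offline, the key is read straight from the TWFY_API_KEY environment variable, and output is always "js".

```r
library(httr)
library(jsonlite)

# Hypothetical helper: build the full request URL for one endpoint.
# `params` is a named list of API parameters; NULL entries are allowed.
twfy_url <- function(endpoint, params, key = Sys.getenv("TWFY_API_KEY")) {
  params$output <- "js"  # ask for the JSON-style output
  params$key <- key      # the key travels as an ordinary query parameter
  # modify_url() URL-escapes the values and drops any NULL parameters
  modify_url(paste0("https://www.theyworkforyou.com/api/", endpoint),
             query = params)
}

# Sketch of call_api(endpoint, ...): collect the arguments, build, fetch, parse
call_api <- function(endpoint, ...) {
  resp <- GET(twfy_url(endpoint, list(...)))
  stop_for_status(resp)                 # turn HTTP errors into R errors
  fromJSON(content(resp, as = "text"))  # body is text/javascript; parse as JSON
}
```

Letting httr assemble the query has a side benefit: parameters left at NULL simply disappear from the URL, so endpoint functions can pass all their optional arguments through unconditionally.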

With call_api in hand it’s possible to make all the API functions look pretty much the same, e.g.

getConstituencies <- function(date=NULL, search=NULL){
  call_api("getConstituencies", date=date, search=search)
}

but you know, why write ‘getConstituencies’ twice?

Let’s be a bit more general, and use the fact that R functions know what their names are, and that all function parameters have a name, to make a completely general function body.

The actual function in twfy is

getConstituencies <- function(date=NULL, search=NULL){
  params <- params_from_call(match.call())
  do.call("call_api", params)
}

which has exactly the same body as

getMPInfo <- function(id, fields=NULL){
  params <- params_from_call(match.call())
  do.call("call_api", params)
}

Cute.

If the API adds another endpoint, I’ll create a new function with this body, give it the name of the endpoint, and write the help in roxygen above it.

So how does this work?

Well, inside any function, as.list(match.call()) is a list whose first, unlabeled element is the name of the function, and whose subsequent elements are labeled with the names of the arguments actually supplied. If we call the getConstituencies function above with search="Keigh", that list is

[[1]]
getConstituencies

$search
[1] "Keigh"

All the package’s params_from_call does is remove the first element from the list and re-add it (as.character, because it’s actually an R symbol) under the new label endpoint, so that params is

$search
[1] "Keigh"

$endpoint
[1] "getConstituencies"

I then use do.call to call call_api with these arguments. This works because call_api is looking for an argument called endpoint and zero or more named arguments, and params gives it one of each.
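
Here’s a sketch of params_from_call together with an offline stand-in for call_api that just reports what it receives. One deliberate difference from the description above, which is my assumption rather than what twfy does: match.call() hands back unevaluated expressions, so calling getConstituencies(search=s) from inside another function would leave do.call looking for s in the wrong place. This version instead looks up each supplied argument’s value in the calling function’s frame with mget().

```r
# A sketch of params_from_call(); the real twfy internals may differ.
params_from_call <- function(cl, env = parent.frame()) {
  args <- as.list(cl)[-1]                               # drop the function name
  vals <- mget(as.character(names(args)), envir = env)  # force supplied arguments
  c(vals, list(endpoint = as.character(cl[[1]])))       # symbol -> string
}

# Offline stand-in for call_api() that just reports what it was given
call_api <- function(endpoint, ...) list(endpoint = endpoint, params = list(...))

getConstituencies <- function(date = NULL, search = NULL) {
  params <- params_from_call(match.call())
  do.call("call_api", params)
}

str(getConstituencies(search = "Keigh"))

# Also works when the value comes from a variable local to another function
outer <- function() { s <- "Keigh"; getConstituencies(search = s) }
str(outer())
```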

This leads to the question: why even have a separate function for each endpoint offered by the API? There are two answers:

First, an important effect of wrapping an API is to have the documentation near to hand. This requires separate R functions to write the roxygen above.

Speaking of documentation, TheyWorkForYou is a little bit vague about what each of its endpoints returns, so if you’re listening, a pointer to some more documentation would be great.

Second, it is sometimes useful to pre- or post-process the arguments to do.call. Here’s an example of how documentation and pre-processing interact:

getDebates <- function(type=c("commons", "westminsterhall", "lords",
                 "scotland", "northernireland"),
                 date=NULL, search=NULL, person=NULL, gid=NULL, 
                 order=c("d", "r"), page=NULL, num=NULL){
  params <- params_from_call(match.call())
  params$type <- match.arg(type)
  params$order <- match.arg(order)
  do.call("call_api", params)
}

The API needs a legislative body to search, given by the type argument (the R function defaults to "commons"), and the user can specify a results ordering with the order argument. The function definition is a good place to put the small number of argument possibilities, not least because they will get picked up by command completion.

In the code above I process the function’s arguments as usual, but then step in and fix the values of type and order using match.arg in the normal way, before making the call.
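
The behaviour of match.arg that getDebates leans on is easy to check in isolation. The hypothetical debate_options function below keeps just the two restricted arguments from the signature above: a missing argument falls back to the first choice, unambiguous abbreviations are expanded, and anything else is an error.

```r
# Hypothetical cut-down signature keeping only the two restricted arguments
debate_options <- function(type = c("commons", "westminsterhall", "lords",
                                    "scotland", "northernireland"),
                           order = c("d", "r")) {
  list(type = match.arg(type), order = match.arg(order))
}

debate_options()                      # type "commons", order "d": the first choices
debate_options("lords", "r")          # type "lords", order "r"
debate_options(type = "west")         # partial matching gives "westminsterhall"
try(debate_options(type = "senate"))  # error: not one of the allowed choices
```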

Where did I leave my keys?

Like most APIs, TheyWorkForYou requires a key. Here I follow Hadley Wickham’s very useful guidelines (see the links at the end) and store it as an environment variable.

In twfy there’s an internal function that prompts for a key as necessary

get_api_key <- function(){
  key <- Sys.getenv("TWFY_API_KEY")  # "" if the variable is not set
  if (key == ""){
    key <- ask_for_key()             # prompt the user
    if (key != ""){
      Sys.setenv(TWFY_API_KEY=key)   # available for the rest of this session
      add_key_to_renviron(key)       # and set up for next time
    } else {
      stop("Hint: you can request an API key from http://theyworkforyou.com/api/key")
    }
  }
  key
}

The first time it’s needed, this prompts the user for the key, sets its value in the local environment, and writes a line into the user’s .Renviron file so it’s available in later sessions.

There is a set_api_key, but this is only really needed to reset an existing key.
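
get_api_key leans on two internal helpers, ask_for_key and add_key_to_renviron, whose code isn’t shown here. The following are hypothetical versions of them, plus a set_api_key built from them, assuming the key lives on a single TWFY_API_KEY=... line of the user’s .Renviron, which R reads at startup. The path argument is my addition so the functions can be pointed at a temporary file.

```r
# Hypothetical versions of the helpers get_api_key() relies on;
# the real twfy internals may differ.

# Prompt for a key; readline() returns "" when R is not interactive
ask_for_key <- function() {
  trimws(readline("Enter your TheyWorkForYou API key: "))
}

# Record the key in a .Renviron file, replacing any earlier TWFY_API_KEY line
add_key_to_renviron <- function(key, path = path.expand("~/.Renviron")) {
  lines <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  lines <- lines[!grepl("^TWFY_API_KEY=", lines)]
  writeLines(c(lines, paste0("TWFY_API_KEY=", key)), path)
  invisible(path)
}

# Reset the key for this session and for future ones
set_api_key <- function(key, path = path.expand("~/.Renviron")) {
  Sys.setenv(TWFY_API_KEY = key)
  add_key_to_renviron(key, path)
}
```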

Testing with keys

If you’re a fan of continuous integration, then the next challenge is to set things up so as not to expose the API key in the server logs or hardcode it into the R source. twfy uses Travis, and for Travis the solution is to set the API key as an environment variable in the repository settings.

By default these variables do not appear in the build logs, and that’s the way we like it.

The current .travis.yml for twfy looks like this:

language: R
sudo: false
cache: packages

before_install:
 - echo "TWFY_API_KEY=${TWFY_API_KEY}" > ~/.Renviron

Actually I’m not sure whether it’s even necessary to drop Travis’s copy of the API key into the .Renviron to get picked up by the package functions. It’s possible that R picks up local environment variables more reliably on Ubuntu than on my OS X box.

Still, this works. The package builds and no third parties (or forks) see the API key.

Further reading

If you find yourself wrapping an API I’d thoroughly recommend reading

Somebody always got there first

It turns out that Jack Blumenau saw TheyWorkForYou’s API and thought the same thing. Great minds, and all that. You can see his take on the problem here. He likes XML quite a bit more than me, apparently.

In any case, as Chairman Mao once said: “Let a hundred flowers blossom, let a hundred schools of thought contend. And let the winner drive the losers onto a small volcanic island shaped like a sweet potato”. Or something like that.
