Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
TheyWorkForYou is a great website for keeping up with British politics and one of the many fine things mySociety does to make democracy in the UK more transparent.
There’s also an API, accessible via HTTP and wrapped up for a few languages. However, R is not amongst them, so I wrote twfy.
If you’re interested in using it (and you’ve got devtools installed) you can install it with
devtools::install_github("conjugateprior/twfy")
It was my first proper API package and a bit of a learning experience. If you want to hear more about that, read on.
APIs
First some recap, for those just joining us.
The TheyWorkForYou API works with parameterized GETs to URLs with a common base:
http://theyworkforyou.com/api/
and different endpoints, depending on what you want. First you sign up for an API key and then you make the calls.
For example, if you want information about a UK parliamentary constituency then your endpoint is getConstituency, which takes either a name or a postcode, plus your API key as key and an output specification, and returns a structured constituency object.
In a browser, the complete call looks like
https://www.theyworkforyou.com/api/getConstituency?name=Keighley&output=js&key=adsfiddscsdlsdlk
where of course adsfiddscsdlsdlk isn’t really an API key. It just plays one on the web.
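To see how the endpoint and its parameters map onto that URL, here is a minimal base R sketch. The helper name build_url is made up for illustration and is not part of twfy; it only assembles the string and makes no request.

```r
# Hypothetical helper (not part of twfy): assemble an API URL from an
# endpoint name and a named list of query parameters.
build_url <- function(endpoint, query,
                      base = "https://www.theyworkforyou.com/api/") {
  # Percent-encode each value so spaces and reserved characters are safe
  values <- vapply(query,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  pairs <- paste0(names(query), "=", values)
  paste0(base, endpoint, "?", paste(pairs, collapse = "&"))
}

# The key below is the same fake one used in the text
u <- build_url("getConstituency",
               list(name = "Keighley", output = "js", key = "adsfiddscsdlsdlk"))
print(u)
```

In practice httr does this escaping for you, so a helper like this is only useful for seeing what goes over the wire.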
The server returns a JSON object:
{
  "bbc_constituency_id" : "344",
  "guardian_election_results" : "http://www.guardian.co.uk/politics/constituency/1050/keighley",
  "guardian_id" : "1050",
  "guardian_name" : "Keighley",
  "pa_id" : "338",
  "name" : "Keighley"
}
Except that’s only sort of true. The server claims to return a Javascript object, as we can tell from its MIME type: "text/javascript; charset=iso-8859-1". We’ll just treat it like JSON though.
Making this call and decoding the result programmatically is straightforward with the right packages
library(httr)
library(jsonlite)

q <- list(output="js", key="adsfiddscsdlsdlk", name="Keighley")
url <- "https://www.theyworkforyou.com/api/getConstituency"
resp <- GET(url, query=q)                       # make the call
json <- fromJSON(content(resp, as="text"))      # read the body as text and parse it
where we’ve let the GET function deal with all the URL escaping, and just asked fromJSON to do the right thing. By default, jsonlite generally believes the right thing looks like a data.frame and most of the time that’s fine.
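A small sketch of what that default means in practice, assuming jsonlite is installed. The two-element array is made-up data for illustration, not real API output: a single JSON object comes back as a named list, while an array of similar objects is simplified to a data.frame.

```r
library(jsonlite)

# A single JSON object becomes a named list
one <- fromJSON('{"name": "Keighley", "pa_id": "338"}')
print(class(one))
print(one$name)

# An array of objects with the same fields becomes a data.frame
# (illustrative data, not a real API response)
many <- fromJSON('[{"name": "Keighley"}, {"name": "Shipley"}]')
print(class(many))
print(many$name)

# Turn the simplification off to get a plain list of lists instead
raw <- fromJSON('[{"name": "Keighley"}, {"name": "Shipley"}]',
                simplifyVector = FALSE)
print(length(raw))
```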
So far so straightforward.
Creating a general purpose function to call the API
It’s pretty easy to generalize this code to create a generic API calling function
call_api(endpoint, ...)
Inside that function we’ll just grab the arguments as list(...), add our preferred output format and the API key, and hand the whole thing to GET.
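A minimal sketch of such a function, under a few assumptions: this is not the package’s actual source, build_query is a made-up helper, and the key is read straight from a TWFY_API_KEY environment variable. The httr and jsonlite calls are namespaced, so the definitions load without either package attached; they are only needed when call_api is actually run.

```r
# Hypothetical helper: collect the caller's arguments, drop the ones
# left as NULL, and add the output format and API key.
build_query <- function(..., key, output = "js") {
  q <- list(...)
  q <- q[!vapply(q, is.null, logical(1))]
  q$output <- output
  q$key <- key
  q
}

# Sketch of a generic caller (assumed shape, not twfy's real code)
call_api <- function(endpoint, ...) {
  q <- build_query(..., key = Sys.getenv("TWFY_API_KEY"))
  url <- paste0("https://www.theyworkforyou.com/api/", endpoint)
  resp <- httr::GET(url, query = q)      # httr handles the URL escaping
  httr::stop_for_status(resp)            # fail loudly on HTTP errors
  jsonlite::fromJSON(httr::content(resp, as = "text"))
}

# The query can be inspected without touching the network
q <- build_query(name = "Keighley", postcode = NULL, key = "adsfiddscsdlsdlk")
str(q)
```

Dropping NULL arguments up front means optional parameters can simply default to NULL in the wrapper functions.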
With call_api in hand it’s possible to make all the API functions look pretty much the same, e.g.
getConstituencies <- function(date=NULL, search=NULL){
  # pass the arguments on by name so they arrive in call_api's ...
  call_api("getConstituencies", date=date, search=search)
}
but you know, why write ‘getConstituencies’ twice?
Let’s be a bit more general, and use the fact that R functions know what their names are, and that all function parameters have a name, to make a completely general function body.
The actual function in twfy is
getConstituencies <- function(date=NULL, search=NULL){
  params <- params_from_call(match.call())
  do.call("call_api", params)
}
which has exactly the same body as
getMPInfo <- function(id, fields=NULL){
  params <- params_from_call(match.call())
  do.call("call_api", params)
}
Cute.
If the API adds another endpoint, I’ll create a new function with this body, give it the name of the endpoint, and write the help in roxygen above it.
So how does this work?
Well, inside (any) function as.list(match.call()) is a list with an unlabeled first element that is the name of the function, and subsequent labeled components that are the arguments it was called with. If we call the getConstituencies function above with search="Keigh" that means
[[1]]
getConstituencies
$search
[1] "Keigh"
All the package’s params_from_call does is remove the first element from the list and re-add it (via as.character, because it’s actually an R symbol) under the new label endpoint, so that params is
$search
[1] "Keigh"
$endpoint
[1] "getConstituencies"
I then use do.call to call call_api with these arguments. This works because call_api is looking for an argument called endpoint and zero or more named arguments, and params gives it one of each.
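The mechanism can be tried out without the network. What follows is a sketch only: this params_from_call is my reconstruction from the description above rather than the package source, and call_api is replaced by a stub that just echoes what it receives.

```r
# Reconstruction for illustration (not the package's actual code):
# move the function name from the front of the call into `endpoint`.
params_from_call <- function(cl) {
  lst <- as.list(cl)
  params <- lst[-1]                             # the named arguments
  params$endpoint <- as.character(lst[[1]])     # symbol -> string
  params
}

# Stub standing in for the real call_api: returns what it was given
call_api <- function(endpoint, ...) {
  list(endpoint = endpoint, args = list(...))
}

getConstituencies <- function(date = NULL, search = NULL) {
  params <- params_from_call(match.call())
  do.call("call_api", params)
}

res <- getConstituencies(search = "Keigh")
str(res)
```

One caveat with this simple version: match.call() records arguments unevaluated, so a variable passed by the caller arrives as a symbol rather than a value. Literal arguments as above are fine, but a fuller implementation would probably want to evaluate the arguments in the caller’s frame first.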
This leads to the question: why even have a separate function for each endpoint offered by the API? There are two answers:
First, an important effect of wrapping an API is to have the documentation near to hand. This requires separate R functions to write the roxygen above.
Speaking of documentation, TheyWorkForYou is a little bit vague about what each of its endpoints returns, so if you’re listening, a pointer to some more documentation would be great.
Second, it is sometimes useful to pre- or post-process the arguments to do.call. Here’s an example of how documentation and pre-processing interact:
getDebates <- function(type=c("commons", "westminsterhall", "lords",
                              "scotland", "northernireland"),
                       date=NULL, search=NULL, person=NULL, gid=NULL,
                       order=c("d", "r"), page=NULL, num=NULL){
  params <- params_from_call(match.call())
  params$type <- match.arg(type)
  params$order <- match.arg(order)
  do.call("call_api", params)
}
The user must specify a legislative body to search with the type argument, and can specify a results ordering with the order argument. The function definition is a good place to put the small number of argument possibilities, not least because they will get picked up by command completion.
In the code above I process the function’s arguments as usual, but then step in and fix the values of type and order using match.arg in the normal way, before making the call.
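For readers who haven’t met match.arg, here is a short self-contained sketch of what it does with a vector of choices. The function debate_options is made up for illustration and is not in twfy. The pre-processing step matters because match.call() only records arguments the user actually supplied, so without it an unspecified type or order would never reach the API call.

```r
# Hypothetical function, for illustration only
debate_options <- function(type = c("commons", "westminsterhall", "lords",
                                    "scotland", "northernireland"),
                           order = c("d", "r")) {
  # match.arg picks the first choice when the argument is left at its
  # default, accepts unambiguous partial matches, and errors otherwise
  list(type = match.arg(type), order = match.arg(order))
}

defaults <- debate_options()                  # first choice of each
lords    <- debate_options(type = "lords")    # exact match
partial  <- debate_options(type = "west")     # unique partial match
bad      <- tryCatch(debate_options(order = "x"),
                     error = function(e) "rejected")

str(defaults)
str(partial)
print(bad)
```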
Where did I leave my keys?
Like most APIs, TheyWorkForYou requires a key to use. Here I follow Hadley Wickham’s very useful guidelines (see the links at the end) and store it as an environment variable.
In twfy there’s an internal function that prompts for a key as necessary
get_api_key <- function(){
  key <- Sys.getenv("TWFY_API_KEY")
  if (key == ""){
    key <- ask_for_key()
    if (key != ""){
      Sys.setenv(TWFY_API_KEY=key)
      add_key_to_renviron(key) # and set up for next time
    } else
      stop("Hint: you can request an API key from http://theyworkforyou.com/api/key")
  }
  key
}
The first time it’s needed, this prompts the user for the key, sets its value as an environment variable for the current session, and writes a line into the user’s .Renviron file so it’s available in later sessions.
There is also a set_api_key function, but this is only really needed to reset an existing key.
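The helper that writes to .Renviron isn’t shown above, so here is a rough sketch of how such a function might look. It is an assumption about the shape, not the package’s code, and the path argument is added here so it can be tried against a temporary file instead of a real .Renviron.

```r
# Sketch of a possible add_key_to_renviron (assumed, not twfy's source).
# `path` is an extra argument introduced for safe experimentation.
add_key_to_renviron <- function(key,
                                path = file.path(Sys.getenv("HOME"), ".Renviron")) {
  old <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  old <- old[!grepl("^TWFY_API_KEY=", old)]    # drop any previous entry
  writeLines(c(old, paste0("TWFY_API_KEY=", key)), path)
  invisible(path)
}

# Try it on a throwaway file that already has another variable in it
tmp <- tempfile(fileext = ".Renviron")
writeLines("SOME_OTHER_VAR=1", tmp)
add_key_to_renviron("adsfiddscsdlsdlk", path = tmp)
add_key_to_renviron("anotherfakekey", path = tmp)   # replaces, not appends
renviron_lines <- readLines(tmp)
print(renviron_lines)
```

Replacing an existing line rather than appending keeps the file tidy when the key is reset.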
Testing with keys
If you’re a fan of continuous integration, then the next challenge is to set things up in such a way as not to expose the API key in the server logs or hardcode it into the R source. twfy uses Travis, and for Travis the solution is to set the API key as an environment variable in the repository settings.
By default these variables do not appear in the build logs, and that’s the way we like it.
The current .travis.yml for twfy looks like
language: R
sudo: false
cache: packages
before_install:
  - echo "TWFY_API_KEY=${TWFY_API_KEY}" > ~/.Renviron
Actually I’m not sure whether it’s even necessary to drop Travis’s copy of the API key into the .Renviron to get picked up by the package functions. It’s possible that R picks up local environment variables more reliably on Ubuntu than on my OS X box.
Still, this works. The package builds and no third parties (or forks) see the API key.
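One related precaution, sketched here as a suggestion and not something twfy necessarily does: builds that have no access to the key (forks, for example) can check for it before running anything that needs the network. The helper name has_api_key is made up.

```r
# Hypothetical helper: TRUE when a non-empty key is visible to R
has_api_key <- function() {
  nzchar(Sys.getenv("TWFY_API_KEY"))
}

# Demonstrate both states, restoring whatever was set beforehand
old <- Sys.getenv("TWFY_API_KEY", unset = NA)

Sys.unsetenv("TWFY_API_KEY")
without_key <- has_api_key()

Sys.setenv(TWFY_API_KEY = "adsfiddscsdlsdlk")   # fake key from the text
with_key <- has_api_key()

if (is.na(old)) Sys.unsetenv("TWFY_API_KEY") else Sys.setenv(TWFY_API_KEY = old)

print(c(without_key = without_key, with_key = with_key))
```

A test file could then return early when has_api_key() is FALSE, so the build passes without ever seeing the secret.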
Further reading
If you find yourself wrapping an API I’d thoroughly recommend reading
- The httr vignette on API packages.
- The jsonlite vignette on REST APIs.
Somebody always got there first
It turns out that Jack Blumenau saw TheyWorkForYou’s API and thought the same thing. Great minds, and all that. You can see his take on the problem here. He likes XML quite a bit more than me, apparently.
In any case, as Chairman Mao once said: “Let a hundred flowers blossom, let a hundred schools of thought contend. And let the winner drive the losers onto a small volcanic island shaped like a sweet potato”. Or something like that.