Noob package development: job ads data in R with Adzuna
[This article was first published on Adventures in Data, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Part of our mission at The Data Lab is to create jobs in Data Science. This got us thinking that perhaps we should find a systematic way of quantifying the job market, in Scotland and more generally. We stumbled upon Adzuna. Never heard of it? It’s quite cool: it looks and feels like a jobs board but is actually a very data-enabled service. For example, you can upload your CV and it will leverage its database of job ads to predict your salary worth. I still have not had the guts to upload my own 🙂
Anyway, the good news for us is that Adzuna has a nice REST API that we can query to get results for Scotland or any other geographical area. One of my short-term goals was to learn more about packages in R, so I decided this was a good opportunity to develop a lightweight R package. First of all, go and read this blog by Hilary Parker, which is an absolutely perfect distillation of the process. In fact, this page is a shameless re-hash of Hilary’s page with my own function. Unless you’re interested in getting job data, or data from other REST APIs, go and read her page instead.
Before you start you need to install a few packages.
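# devtools provides create(), document() and install_github()
install.packages("devtools")
library("devtools")
# roxygen2 builds the help files from specially formatted comments
devtools::install_github("klutometis/roxygen")
library(roxygen2)
# Move to the directory that will contain the package, then create its skeleton
setwd("parent_directory")
create("adzunar")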
The API is queried with a URL of the following form:
http://api.adzuna.com/v1/api/property/gb/search/1?app_id={YOUR_APP_ID}&app_key={YOUR_APP_KEY}
It consists of the host api.adzuna.com, the geographic location of interest gb and, after the final /, some further arguments: at a minimum your app id and app key (obtained from the same place as the API documentation).
So all we would need to do in R is construct this URL, pasting in the values of our arguments of choice. Once we have this we can use a very handy function from library(jsonlite), fromJSON(), that will submit the API call, retrieve the JSON data object and convert it to a data.frame. Wow! Tip of the hat to the author of library(jsonlite), Jeroen Ooms.
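As a rough illustration of what fromJSON() does with a response, here is a small offline sketch that parses a JSON string instead of a URL. The JSON and its field names are made up for illustration and are an assumption, not the documented Adzuna schema. Note that the top-level object comes back as a list, and it is the array of records inside it that is simplified to a data.frame.

```r
library(jsonlite)

# A made-up response; the field names are illustrative assumptions,
# not the documented Adzuna schema
json_text <- '{
  "count": 2,
  "results": [
    {"title": "Data Scientist", "salary_min": 40000},
    {"title": "Data Analyst", "salary_min": 30000}
  ]
}'

# fromJSON() accepts a JSON string as well as a URL
parsed <- fromJSON(json_text)

# The array of records is simplified to a data.frame, one row per record
jobs <- parsed$results
class(jobs)
nrow(jobs)
jobs$title
```

The same call with a URL in place of the string is what does the work inside the function in this post.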
I wrote the function to do this as follows:
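get_country_page <- function(
  keyword,
  country,
  app_id,
  app_key,
  page
) {
  # Build the request URL from the search arguments
  this_url <- paste0(
    "http://api.adzuna.com:80/v1/api/jobs/",
    country,
    "/search/",
    page, "?",
    "app_id=", app_id,
    "&app_key=", app_key,
    "&results_per_page=50",
    # Encode spaces and other reserved characters in the search term
    "&what=", utils::URLencode(keyword, reserved = TRUE)
  )
  # Submit the API call and parse the JSON response
  dat <- jsonlite::fromJSON(this_url)
  return(dat)
}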
All the function does is construct the URL, pass it to fromJSON() and then return the result. In this case I’m allowing for a search term, a geographic location, the app key and id and finally the page. If you imagine that the web front end returns results in pages that you navigate through, the API is the same: you need to request a particular page of the results. In reality I have added some more functionality to this, which you can examine at the GitHub repo for adzunar.
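To make the paging idea concrete, here is a minimal sketch of how successive pages could be requested and stacked into one data.frame. This is not the code from adzunar; collect_pages() and fake_fetch() are hypothetical names, and the fetcher is a stand-in so that the sketch runs without an API key.

```r
# Hypothetical helper: gather n_results rows by requesting successive pages.
# fetch_page is any function(page) that returns a data.frame of results.
collect_pages <- function(fetch_page, n_results = 50, per_page = 50) {
  # Number of pages needed to cover the requested number of results
  n_pages <- ceiling(n_results / per_page)
  pages <- lapply(seq_len(n_pages), fetch_page)
  # Stack the pages and trim any surplus rows from the last page
  out <- do.call(rbind, pages)
  head(out, n_results)
}

# Stand-in for a real API call, returning 50 fake rows per page
fake_fetch <- function(page) {
  data.frame(page = page, id = seq_len(50) + (page - 1) * 50)
}

res <- collect_pages(fake_fetch, n_results = 120)
nrow(res)
unique(res$page)
```

With the real API, the fetcher could be something like function(p) get_country_page("data science", "gb", id, key, p)$results, assuming the records sit in a results element of the response.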
Once you have your function, it’s then really easy to write all of the documentation. What you do is write special comments at the top of the file for the function, then library(roxygen2) will compile them into the help file for that function. Below is an example for the function I wrote.
#' Function to query the API by keyword country and results page.
#'
#' This function allows you to query the Adzuna API, specifying a keyword, a country code and the number of results that you want. The API limit is 50 per page but if you specify more than that this function will continue to run your query, request successive pages of the results and return the aggregate data object as a `data.frame`. You can request results that exceed the maximum returned by the API.
#' @param keyword A search string (required)
#' @param country A two letter country code. Any one of "gb", "au", "br", "ca", "de", "fr", "in", "nl", "pl", "ru", "za". Defaults to "gb".
#' @param app_id Your app id provided by Adzuna (required)
#' @param app_key Your app key provided by Adzuna (required)
#' @param n_results The number of results requested. Defaults to 50.
#' @keywords adzuna, API, data download, job adverts
#' @export
#' @examples
#' # (not run)
#' # id <- [Your app id]
#' # key <- [Your app key]
#' # get_country_page("data science", "gb", id, key)
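For completeness, here is a small sketch of how such a comment block sits directly above the function it documents in the .R file. The function build_search_url() is a hypothetical helper invented for this illustration and is not part of adzunar; the tags used (@param, @return, @export) are standard roxygen2 tags.

```r
#' Build a search URL for the jobs endpoint.
#'
#' Pastes the arguments into the URL pattern used in this post.
#' @param country A two letter country code, e.g. "gb".
#' @param page The page of results to request.
#' @param app_id Your app id provided by Adzuna.
#' @param app_key Your app key provided by Adzuna.
#' @return A length-one character vector containing the URL.
#' @export
build_search_url <- function(country, page, app_id, app_key) {
  paste0(
    "http://api.adzuna.com:80/v1/api/jobs/", country,
    "/search/", page,
    "?app_id=", app_id,
    "&app_key=", app_key
  )
}

# The roxygen lines are ordinary comments, so the function runs as usual
build_search_url("gb", 1, "my_id", "my_key")
```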
When you run the document() function from your console, these comments will be compiled into the help file for that function. The next step is to share your package. This can be done easily with GitHub. I’m not going to provide much detail about this, as it can be found elsewhere, but for the basics you need four commands to share your package on GitHub.
git init
git add
git commit
git push
Once the package is on GitHub, anyone can install and load it with:
devtools::install_github("rmnppt/adzunar")
library(adzunar)