
Web data acquisition: the structure of RCurl request (Part 2)

[This article was first published on R-posts.com, and kindly contributed to R-bloggers.]

The acquisition of data in JSON format presented in part 1 showed how the client-server connection works and that the data of interest can be collected. However, the output appears as raw data in a JSON string that needs to be structured and stored in a suitable form for data processing and statistical analysis.

For this reason, it makes sense to develop the entire process in #R, so that the data are queried, collected, parsed, structured and made usable in a single environment. This is also the environment used for the “last mile” of the process, i.e. data analysis.

The curl library used in the command-line process described in the previous post has its counterpart in the RCurl package. Together with jsonlite for ‘R-JSON translation’, these are the packages necessary to build the request, as presented in the following code.

# before loading the libraries, remember to install them - install.packages('library here')
library(RCurl)
library(jsonlite)

# save the url of the request in an object (same as -X POST in the curl request)

url <- 'https://www.googleapis.com/qpxExpress/v1/trips/search?key={SERVER_KEY}&alt=json'
# headers (same as -H)
headers <- list('Accept' = 'application/json', 'Content-Type' = 'application/json', 'charset' = 'UTF-8')

# R structure of the input for the request (same as -d + JSON)
x <- list(
  request = list(
    slice = list(
      list(origin = 'FCO', destination = 'LHR', date = '2017-06-30')),
    passengers = list(adultCount = 1, infantInLapCount = 0, infantInSeatCount = 0, childCount = 0, seniorCount = 0),
    solutions = 500,
    refundable = FALSE))

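# url, headers and x are the parameters used by the R functions that send the request
# and save the output data in the datajson object
# postForm is the RCurl function that sends the request using the POST method
# toJSON is the jsonlite function that converts the R structure of the request into JSON input
# auto_unbox = TRUE writes length-one vectors as JSON scalars ("FCO") rather than
# one-element arrays (["FCO"]), which is the form a hand-written JSON request would use

datajson <- postForm(url, .opts=list(postfields=toJSON(x, auto_unbox = TRUE), httpheader=headers))
datajson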
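
Before sending anything, it can help to look at the JSON body that toJSON builds from the R list. The following is a minimal sketch, not part of the original request code: it rebuilds the same x list and compares jsonlite's default output with the auto_unbox = TRUE variant. The names body_boxed, body_unboxed and back are introduced here only for illustration.

```r
library(jsonlite)

# Same R structure as in the request code
x <- list(
  request = list(
    slice = list(
      list(origin = 'FCO', destination = 'LHR', date = '2017-06-30')),
    passengers = list(adultCount = 1, infantInLapCount = 0, infantInSeatCount = 0,
                      childCount = 0, seniorCount = 0),
    solutions = 500,
    refundable = FALSE))

# jsonlite default: every length-one vector is written as a one-element JSON array
body_boxed <- toJSON(x)

# auto_unbox = TRUE: length-one vectors are written as plain JSON scalars
body_unboxed <- toJSON(x, auto_unbox = TRUE)

# Print both bodies in indented form to compare them by eye
prettify(body_boxed)
prettify(body_unboxed)

# Round trip: read the unboxed body back into a nested R list
back <- fromJSON(body_unboxed, simplifyVector = FALSE)
back$request$slice[[1]]$origin
```

Which of the two forms the server accepts depends on the API; comparing the printed body with the JSON used in the command-line request of part 1 is a quick way to check.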

A few seconds after the POST request is sent and the response collected, all the information on flights from FCO (Fiumicino – Rome) to LHR (London Heathrow) is stored in the datajson object, as in the command-line procedure. The JSON string holds, and hides, all the observations and variables of interest for the statistical analysis, including the most important one: the flight prices.
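
As a quick check that a price can be recovered from such a string, the sketch below parses a small sample with jsonlite's fromJSON. The sample is hypothetical and heavily shortened, made up for illustration only: the field names (kind, trips, tripOption, saleTotal), the ids and the amounts are assumptions, not output of the request above. In a real session the string to parse would be datajson.

```r
library(jsonlite)

# Hypothetical, shortened response: field names and values are assumptions
sample_json <- '{
  "kind": "qpxExpress#tripsSearch",
  "trips": {
    "tripOption": [
      {"id": "opt1", "saleTotal": "EUR100.00"},
      {"id": "opt2", "saleTotal": "EUR120.50"}
    ]
  }
}'

# fromJSON simplifies an array of similar objects into a data frame
parsed <- fromJSON(sample_json)
trip_options <- parsed$trips$tripOption

# Split the assumed "EUR100.00" format into a currency code and a numeric price
trip_options$currency <- substr(trip_options$saleTotal, 1, 3)
trip_options$price <- as.numeric(substring(trip_options$saleTotal, 4))

trip_options
```

This is only a first look at the string; it does not replace a systematic parsing of the full response.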

The next post will explain how to parse the json object and structure the information in a suitable dataframe for analysis using the powerful library #tidyjson.

#R #rstats #maRche #json #curl #qpxexpress #Rbloggers

This post is also shared in www.r-bloggers.com and LinkedIn
