
Streaming Cloud Data to R

[This article was first published on R Language in Datazar on Medium, and kindly contributed to R-bloggers.]

Saving your data in the cloud ensures that when you send your scripts to your colleagues, you don’t have to send them your data or any additional files along with them. When the data lives at a URL rather than a local path such as “C://…” or “/home/…”, your script always points to the same address. In this post, we’ll go through how to use a cloud dataset in our R scripts. We’re going to be using a dataset containing the population of Earth from 5000 BC until 2016.

There are several ways and packages to access a URL from R. The Datazar SDK uses the “httr” package to make the request and “jsonlite” to parse the response. Let’s go ahead and grab the Datazar SDK for R. I’ve also included the R code here so you can just copy and paste it into your script.

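datazar <- function(username, token, objectType, objectId, option) {
    # httr performs the HTTP request; jsonlite parses the JSON response
    library(httr)
    library(jsonlite)
    url <- paste0("https://api.datazar.com/", objectType, "/", objectId, "/", option)
    response <- GET(url, authenticate(username, token, type = "basic"))
    # Stop with an error on a failed request (e.g. wrong token or file ID)
    stop_for_status(response)
    # Read the body as text so fromJSON() always receives a JSON string
    body <- content(response, as = "text", encoding = "UTF-8")
    return(fromJSON(body, flatten = TRUE))
}
datazarData <- function(username, token, fileId) {
    return(datazar(username, token, "files", fileId, "data"))
}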
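
To make the two steps inside the SDK more concrete, here is a minimal offline sketch of building the request URL and turning a JSON payload into a data frame. The helper name buildDatazarUrl, the placeholder file ID and the sample JSON string are assumptions made for illustration; they are not part of the SDK and not real Datazar output. Only the URL pattern is taken from the code above, and the sketch assumes the API returns an array of records.

```r
library(jsonlite)

# Hypothetical helper (not part of the SDK): mirrors how datazar() builds its URL
buildDatazarUrl <- function(objectType, objectId, option) {
    paste0("https://api.datazar.com/", objectType, "/", objectId, "/", option)
}

# "<your-file-id>" is a placeholder, not a real file ID
exampleUrl <- buildDatazarUrl("files", "<your-file-id>", "data")
print(exampleUrl)

# Made-up payload with illustrative values (assumption: an array of records)
sampleJson <- '[{"Year": 1, "Population": 10}, {"Year": 2, "Population": 20}]'

# fromJSON() simplifies an array of records into a data frame
sampleData <- fromJSON(sampleJson, flatten = TRUE)
str(sampleData)
```

If the real response has a different shape (for example, nested objects), flatten=TRUE is what spreads the nested fields into ordinary columns.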

We’ll be using the datazarData function.

Here are the parameters we need:

myUsername<-"aman"
myToken<-"mysupersecrettokenthaticantshow"
fileId<-"f7cb0a20c-2f1c-4ad5-9d05-900d7af97a9c"
data<-datazarData(myUsername,myToken,fileId)
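
Since the whole point is to share scripts with colleagues, you may prefer not to hard-code the token in the script itself. One possible approach, sketched below, is to read the credentials from environment variables. The variable names DATAZAR_USERNAME and DATAZAR_TOKEN and the helper readSetting are hypothetical choices for this example, not something Datazar defines.

```r
# Hypothetical helper: read a required setting from an environment variable
readSetting <- function(name) {
    value <- Sys.getenv(name, unset = NA)
    if (is.na(value) || !nzchar(value)) {
        stop("Environment variable ", name, " is not set")
    }
    value
}

# Set here only so the sketch runs on its own; in practice you would define
# these outside the script (for example in your ~/.Renviron file)
Sys.setenv(DATAZAR_USERNAME = "aman", DATAZAR_TOKEN = "placeholder-token")

myUsername <- readSetting("DATAZAR_USERNAME")
myToken <- readSetting("DATAZAR_TOKEN")
```

The script can then call datazarData(myUsername, myToken, fileId) as before, and each colleague supplies their own credentials without editing the code.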

That’s it! All done. There’s no need to parse the JSON since the datazarData function takes care of it. Let’s go ahead and plot it so we can see what it looks like.

# Assumes the dataset has columns named "Year" and "Population"
plot(data$Year, data$Population, xlab="Year", ylab="Population")
R Plot of the streamed dataset.

Conclusion

We went over how to stream datasets directly from the cloud. This method uses HTTP “Basic Authentication” to identify you to the Datazar API, and because the request goes over HTTPS, your credentials and data are encrypted while you’re streaming your datasets.
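
As a rough illustration of why the https:// part of the URL matters, the sketch below builds the header that Basic Authentication sends and then decodes it again. The helper basicAuthHeader and the placeholder token are assumptions for this example; httr’s authenticate() builds the header for you, so you would not normally write this yourself.

```r
library(jsonlite)

# Hypothetical helper: the header value Basic Authentication sends is
# "Basic " followed by base64("username:token")
basicAuthHeader <- function(username, token) {
    paste("Basic", base64_enc(paste0(username, ":", token)))
}

header <- basicAuthHeader("aman", "placeholder-token")

# base64 is an encoding, not encryption: the header can be reversed by anyone
decoded <- rawToChar(base64_dec(sub("^Basic ", "", header)))
print(decoded)
```

Because the header is only encoded, the protection comes from the encrypted HTTPS connection, so the token should never be sent to a plain http:// address.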

I have included the R script in a project, so you can use that one if you want to.

R Script link.

Just change the parameters to your own Datazar username and token. Following this practice keeps your data in one location, so neither you nor your colleagues will ever have to change dataset paths in your scripts.

Hope you enjoyed this! Feel free to ask questions if you’re stuck somewhere.

Note: there’s a related post on how to do the exact same thing with Mathematica.


Streaming Cloud Data to R was originally published in Datazar on Medium, where people are continuing the conversation by highlighting and responding to this story.
