source_GitHubData: a simple function for downloading data from GitHub into R

[This article was first published on Christopher Gandrud (간드루드 크리스토파), and kindly contributed to R-bloggers].

Update 31 January: I’ve folded source_GitHubData into the repmis package. See this post.


Update 7 January 2012: I updated the internal workings of source_GitHubData so that it now relies on httr rather than RCurl. It is also more directly descended from devtools’ source_url command.

This has two advantages.

The post has been rewritten to reflect these changes.


In previous posts I’ve discussed how to download data stored in plain-text data files (e.g. CSV, TSV) on GitHub directly into R.

Not sure why it took me so long to get around to this, but I’ve finally created a little function that simplifies the process of downloading plain-text data from GitHub. It’s called source_GitHubData. (The name mimics the devtools syntax for functions like source_gist and source_url. The function’s syntax is actually just a modified version of source_url.)

The function is stored in a GitHub Gist HERE (it’s also at the end of this post). You can load it directly into R with devtools’ source_gist command.
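For readers who want a feel for the general approach before loading the Gist, here is a minimal sketch of the idea: fetch the raw text with httr, then parse it as a delimited table. This is not the Gist's code. The helper names parse_plain_text and fetch_plain_text_data are hypothetical, the example values are made up, and the download helper assumes the httr package is installed.

```r
# Hypothetical helper: parse delimited plain text into a data frame
parse_plain_text <- function(text, sep = ",", header = TRUE) {
  read.table(text = text, sep = sep, header = header)
}

# Hypothetical helper: download a raw plain-text file and parse it
# Assumes the httr package is installed
fetch_plain_text_data <- function(url, sep = ",", header = TRUE) {
  request <- httr::GET(url)
  # Stop with an error if the request failed (e.g. a 404)
  httr::stop_for_status(request)
  raw_text <- httr::content(request, as = "text", encoding = "UTF-8")
  parse_plain_text(raw_text, sep = sep, header = header)
}

# Offline check of the parsing step, using made-up values
Example <- parse_plain_text("country,year\nA,2000\nB,2001")
names(Example)
```

Keeping the parsing step separate from the download makes it possible to try the parsing on a small string without a network connection.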

Here is an example of how to use the function to download the electoral disproportionality data I discussed in an earlier post.

# Load source_GitHubData
library(devtools)

# The function's gist ID is 4466237
source_gist("4466237")

# Create Disproportionality data UrlAddress object
# Make sure the URL is for the "raw" version of the file
# The URL was shortened using bitly
UrlAddress <- "http://bit.ly/Ss6zDO"

# Download data
Data <- source_GitHubData(url = UrlAddress)

# Show Data variable names
names(Data)


## [1] "country"            "year"               "disproportionality"

There you go.

Note that the function is set by default to load comma-separated data (CSV). This can easily be changed with the sep argument.
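To illustrate what changing the separator does, here is a small sketch that parses the same table from comma-separated and tab-separated text with base R's read.table. The in-memory strings stand in for downloaded files and their values are made up. The commented call at the end assumes a hypothetical TsvUrlAddress object pointing at a raw tab-separated file.

```r
# Two in-memory versions of the same small, made-up table
CsvText <- "country,year\nA,2000\nB,2001"
TsvText <- "country\tyear\nA\t2000\nB\t2001"

# sep = "," suits comma-separated data; sep = "\t" is needed for tab-separated data
CsvData <- read.table(text = CsvText, sep = ",", header = TRUE)
TsvData <- read.table(text = TsvText, sep = "\t", header = TRUE)

# Compare the variable names from the two versions
names(CsvData)
names(TsvData)

# With source_GitHubData, the equivalent call would look something like this
# (TsvUrlAddress is a hypothetical URL for a raw tab-separated file):
# Data <- source_GitHubData(url = TsvUrlAddress, sep = "\t")
```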
