source_https(): Sourcing an R Script from GitHub over HTTPS
The Objective
I wanted to source R scripts hosted on my GitHub repository for use in my blog (i.e. a GitHub version of ?source). This would make it easier for anyone wishing to test my code snippets on their own computers, without having to go to my GitHub repo manually and retrieve a series of R scripts to make them run.
The Problem
The base R function source() fails with HTTPS links on Windows 7. There may be a way around this by starting R with the --internet2 flag from the command line (search for CMD in Windows), but that would just be another inconvenience, like having to download an R script through your browser in the first place.
An easier approach is to use RCurl::getURL(), setting either ssl.verifypeer = FALSE or cainfo to an SSL certificates file. That’s easy enough to achieve, but I wanted to wrap the code in a function for convenience, as follows:
    source_github <- function(u) {
      # load package
      require(RCurl)

      # read script lines from website
      script <- getURL(u, ssl.verifypeer = FALSE)

      # parse lines and evaluate them (intended for the global environment)
      eval(parse(text = script))
    }

    source_github("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
The problem with the code above was that the functions sourced from the desired R script file only existed locally in source_github() and not globally to the rest of the R session. Sadface.
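The scoping issue can be reproduced without any network access. The following is only a minimal sketch: demo_script, eval_in_function and sq_demo are made-up names, and a literal string stands in for whatever getURL() would return.

```r
# Hypothetical stand-in for the text that getURL() would return
demo_script <- "sq_demo <- function(x) x^2"

eval_in_function <- function(txt) {
  # eval() defaults to the calling frame, i.e. this function's own environment
  eval(parse(text = txt))
  # report whether sq_demo was created inside this function
  exists("sq_demo", envir = environment(), inherits = FALSE)
}

defined_locally  <- eval_in_function(demo_script)
defined_globally <- exists("sq_demo", envir = globalenv(), inherits = FALSE)

print(defined_locally)
print(defined_globally)
```

In a fresh R session, defined_locally should come out TRUE and defined_globally FALSE: the function is created, but it disappears as soon as eval_in_function() returns.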
The Solution
Asking on Stack Overflow produced an answer from the mighty Spacedman who added envir=.GlobalEnv as a parameter to eval. This means that the evaluation is done in the global environment and thus all the contents of the R script are available for the entire R session.
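The effect of envir = .GlobalEnv can be checked offline with a small sketch. The names demo_script_text, eval_in_global and cube_demo are hypothetical, and the string is just a placeholder for a downloaded script.

```r
# Hypothetical stand-in for the text of a downloaded script
demo_script_text <- "cube_demo <- function(x) x^3"

eval_in_global <- function(txt) {
  # evaluate in the global environment instead of this function's frame
  eval(parse(text = txt), envir = .GlobalEnv)
  invisible(NULL)
}

eval_in_global(demo_script_text)

# cube_demo is now visible outside the function that evaluated it
result <- cube_demo(2)
print(result)  # 8
```

Because the assignment inside the script text runs in the global environment, cube_demo survives after eval_in_global() has returned.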
Furthermore, it occurred to me that I could make the function generic, so that it works with any R script hosted over an HTTPS connection. To this end, I added a couple of lines of code to download a security certificates text file from the curl website.
    source_https <- function(u, unlink.tmp.certs = FALSE) {
      # load package
      require(RCurl)

      # read script lines from website using a security certificate
      if (!file.exists("cacert.pem")) download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")
      script <- getURL(u, followlocation = TRUE, cainfo = "cacert.pem")
      if (unlink.tmp.certs) unlink("cacert.pem")

      # parse lines and evaluate in the global environment
      eval(parse(text = script), envir = .GlobalEnv)
    }

    source_https("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
    source_https("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/htmlToText/htmlToText.R", unlink.tmp.certs = TRUE)
The optional parameter unlink.tmp.certs = TRUE deletes the security certificates text file that source_https downloads. It is probably best to use it only on the final call of source_https, to avoid downloading the same certificates file multiple times.
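Why the final call is the right place for it can be sketched without touching the network. Everything below is a stand-in: use_certs_demo is a hypothetical function that imitates the file.exists()/download.file() step by writing a dummy file into tempdir(), and n_downloads simply counts how often that happens.

```r
n_downloads <- 0
cert_path <- file.path(tempdir(), "cacert_demo.pem")
if (file.exists(cert_path)) unlink(cert_path)  # start from a clean state

use_certs_demo <- function(path, unlink.tmp.certs = FALSE) {
  if (!file.exists(path)) {
    # placeholder for download.file(); this is not real certificate data
    writeLines("dummy certificate data", path)
    n_downloads <<- n_downloads + 1
  }
  # ... the real function would call getURL(u, cainfo = path) here ...
  if (unlink.tmp.certs) unlink(path)
  invisible(NULL)
}

# Deleting only on the final call: the file is fetched once for three calls
use_certs_demo(cert_path)
use_certs_demo(cert_path)
use_certs_demo(cert_path, unlink.tmp.certs = TRUE)
downloads_delete_last <- n_downloads

# Deleting on every call: the file has to be fetched again each time
n_downloads <- 0
for (i in 1:3) use_certs_demo(cert_path, unlink.tmp.certs = TRUE)
downloads_delete_always <- n_downloads

print(c(downloads_delete_last, downloads_delete_always))
```

The first pattern triggers a single simulated download, the second triggers one per call.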
UPDATE
Based on Kay’s comments, here’s a vectorised version with cross-platform SSL certificates:
    source_https <- function(url, ...) {
      # load package
      require(RCurl)

      # parse and evaluate each .R script
      sapply(c(url, ...), function(u) {
        eval(parse(text = getURL(u, followlocation = TRUE,
                                 cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))),
             envir = .GlobalEnv)
      })
    }

    # Example
    source_https("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R",
                 "https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/htmlToText/htmlToText.R")