HTTPS for CRAN: how and why
R gained some basic support for https in version 3.2.0 (see NEWS) via the method = "libcurl" argument in the base functions download.file and url. The global option download.file.method is used to make this the default.
Unfortunately the implementation has a few limitations: there is no way to set request options (authentication, proxy, headers, TLS options, etc.) and the functions do not expose an http status code or response headers. Unlike the other download methods, they also do not raise an error when the request fails with an http error, which leaves you guessing whether the retrieved content is what you were expecting or an error page.
# Raises an error
download.file("http://httpbin.org/status/418", tempfile(), method = "internal")

# Does not raise an error
download.file("http://httpbin.org/status/418", tempfile(), method = "libcurl")

# What it should do
library(curl)
curl_download("http://httpbin.org/status/418", tempfile())
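If you need the status code itself, one option is to fetch the response with the curl package and check it by hand. The following is only a minimal sketch: the helper names stop_for_http_error and fetch_checked are made up for illustration, and it assumes the response is the list returned by curl::curl_fetch_memory, which carries status_code and url fields.

```r
# Hypothetical helper: turn an HTTP error status into an R error.
# `res` is expected to look like the list returned by
# curl::curl_fetch_memory(), which includes `status_code` and `url`.
stop_for_http_error <- function(res) {
  if (res$status_code >= 400) {
    stop(sprintf("HTTP error %d for %s", res$status_code, res$url),
         call. = FALSE)
  }
  invisible(res)
}

# Hypothetical wrapper: fetch a URL into memory and fail on 4xx/5xx.
# Requires the curl package; nothing is downloaded until it is called.
fetch_checked <- function(url) {
  res <- curl::curl_fetch_memory(url)
  stop_for_http_error(res)
}
```

With these definitions, a call such as fetch_checked("http://httpbin.org/status/418") should stop with an error instead of silently returning the error page.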
Anyway, it is good enough for downloading static files from public servers, which is all we need for now.
CRAN and libcurl
Because install.packages and friends wrap around download.file, we can use this new feature to download R packages from CRAN via https. None of the currently available CRAN servers seem to support https, so I created a demo server at https://cran.opencpu.org. This is not a real mirror; it is just an https proxy to the US mirror.
# Install a package over https
install.packages("ggplot2", repos = "https://cran.opencpu.org", method = "libcurl")
Use a script like this to opt-in globally on machines where libcurl is available:
# Enable CRAN https everywhere
if (capabilities("libcurl")) {
  options(repos = "https://cran.opencpu.org", download.file.method = "libcurl")
}
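If you want to be able to undo the change later, a slightly more defensive variant might look like the sketch below. The helper name cran_https_options is hypothetical; it only builds the option values, so you can inspect them before applying anything, and it relies on options() returning the previous values.

```r
# Hypothetical helper: build the option values without applying them.
# Returns an empty list when libcurl support is not available.
cran_https_options <- function(has_libcurl = capabilities("libcurl"),
                               repo = "https://cran.opencpu.org") {
  if (!isTRUE(unname(has_libcurl))) {
    return(list())  # leave the session defaults untouched
  }
  list(repos = c(CRAN = repo), download.file.method = "libcurl")
}

# Apply the options only when there is something to set. options()
# returns the previous values, so options(old) restores them later.
opts <- cran_https_options()
if (length(opts) > 0) {
  old <- options(opts)
}
```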
Hopefully the admins in Vienna will at some point enable https for the main CRAN server in the same way they have done for r-forge (which is literally the neighboring ip address).
Why CRAN and https?
Using https can stop some, but not all, MITM attacks. Encrypting the connection with the CRAN server prevents intermediate parties such as your ISP, (anti)virus, or any other user on your network from snooping on or tampering with the connection. When it comes to CRAN, security is probably more of a concern than privacy, especially when using public networks in, for example, airports, coffee shops or campuses. It is easy for hackers or viruses to hijack wifi connections and inject malicious code or executables into unencrypted traffic. Using https guarantees that at least the connection between you and your CRAN mirror is secure.
Of course this does not fully guarantee the integrity of your download. You are basically putting your faith in the hands of your CRAN mirror (or the owner of the domain, to be more specific). If the mirror server gets hacked, or somebody manages to tamper with the mirroring process itself (which is done using rsync without any encryption), packages can still get infected.
Linux distributions solve this problem by making package authors sign the checksum of the package with a private key. This signature is used to automatically verify the integrity of a download against the author’s public key before installation, regardless of how the package was obtained. Simon has implemented some of this for R in PKI, but unfortunately this was never adopted by CRAN. But at least with https we can somewhat safely install R packages from within a coffee shop now, which solves the most urgent problem.
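To make the checksum half of that scheme concrete, here is a small sketch using only base R. It does not do any signing: the helper verify_checksum is hypothetical, the temporary file merely stands in for a downloaded package, and MD5 is used only because tools::md5sum ships with R. A real scheme would sign a stronger hash with the author's private key.

```r
# Hypothetical helper: compare a file's MD5 checksum with an expected value.
verify_checksum <- function(path, expected_md5) {
  actual <- unname(tools::md5sum(path))
  identical(tolower(actual), tolower(expected_md5))
}

# A temporary file stands in for a downloaded package tarball.
pkg <- tempfile(fileext = ".tar.gz")
writeLines("pretend package contents", pkg)

# The checksum the author would sign and publish alongside the package.
published <- unname(tools::md5sum(pkg))

# Unchanged file: the checksum matches the published value.
verify_checksum(pkg, published)

# Simulate tampering somewhere between the author and the user.
cat("tampered\n", file = pkg, append = TRUE)

# Modified file: the checksum no longer matches.
verify_checksum(pkg, published)
```

The signature is what makes the published checksum itself trustworthy; without it, anyone who can replace the package can replace the checksum as well.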