Dockerise and deploy your own R Archive Repo
There are several reasons you might want to deploy your own R archive repo: you don’t want to rely on GitHub for your dev packages, you want a more “confidential” way to share them, or maybe (and that’s a good enough reason) you’re a nerd and you like the idea of hosting your own repo. So, here’s how to do it.
What’s a repo?
An R archive network / repo is a URL (uniform resource locator) from which you can download packages. For example, when you do:
install.packages("attempt")
There is an argument called “repos”, which defines the spot on the internet where R should go and get the package. You don’t have to specify that argument, as it defaults to getOption("repos"). For example, right now, on my laptop, I have:
getOption("repos")
##                        CRAN
## "https://cran.rstudio.com/"
## attr(,"RStudio")
## [1] TRUE
This indicates that when I try to install a package, R will go and look on the CRAN mirror hosted at RStudio. But I could specify any other endpoint:
install.packages(pkgs = "attempt", repos = "http://mirror.fcaglp.unlp.edu.ar/CRAN/", type = "source")
Here, I’m installing {attempt} from a CRAN mirror in Argentina.
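If you’d rather not pass repos on every call, the option itself can be extended. Here is a minimal sketch, assuming a hypothetical repository at http://my-ran.example/ (both the name MYRAN and the URL are placeholders):

```r
# Keep the current repositories and add a custom one in front of them.
# "MYRAN" and its URL are placeholders for illustration only.
old_repos <- getOption("repos")
new_repos <- c(MYRAN = "http://my-ran.example/", old_repos)
options(repos = new_repos)

# install.packages() will now consider MYRAN along with the previous repos
getOption("repos")
```

Calling options(repos = old_repos) restores the previous setting.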
What’s in a RAN?
About install.packages
So, how does this work? What does install.packages do when it is called?
We will not dive into the precise details, but let’s sum up:
- install.packages goes to the URL, and looks for “url/src/contrib”
- in this folder, R looks for a file called PACKAGES
- R parses this file, isolates the pkgs elements, and adds the necessary elements for the download (version number and other things…)
- R downloads and installs the package
It’s “that” simple: if your endpoint has a “src/contrib” folder, if inside this folder there is a properly filled PACKAGES file, and if all the tar.gz files are there too, you can run install.packages(pkgs = "mypkg", repos = "myrepo", type = "source").
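The first step can be checked from R itself with contrib.url(), the utils function that turns a repo URL into the folder to query. A small sketch, where http://myrepo.example is a placeholder and not a real repository:

```r
# contrib.url() builds the folder R will query for a given repo and type.
# "http://myrepo.example" is a placeholder, not a real repository.
index_dir <- contrib.url("http://myrepo.example", type = "source")
index_dir

# The index file that install.packages() reads lives inside that folder
packages_index <- paste(index_dir, "PACKAGES", sep = "/")
packages_index
```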
The PACKAGES file
In this file, you’ll need to have an entry for each package in your repo. Each one should be described as:
Package: craneur # The name of your package
Version: 0.0.0.9000 # The version
Imports: attempt, desc, glue, R6, tools # The Imports
Suggests: testthat # The Suggests
License: MIT + file LICENSE # The licence
MD5sum: e3ef1ff3d829c040c9bafb960fb8630b # The MD5sum
NeedsCompilation: no # Whether or not your package needs compilation
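To see how R reads such an entry, here is a small sketch using base R’s read.dcf(). It writes the entry above, without the explanatory comments, to a temporary file (the temporary path is only for illustration) and parses it back:

```r
# Write a one-package PACKAGES index to a temporary file.
# Entries are plain "Field: value" lines; several packages
# would be separated by a blank line.
packages_file <- tempfile("PACKAGES")
writeLines(c(
  "Package: craneur",
  "Version: 0.0.0.9000",
  "Imports: attempt, desc, glue, R6, tools",
  "Suggests: testthat",
  "License: MIT + file LICENSE",
  "MD5sum: e3ef1ff3d829c040c9bafb960fb8630b",
  "NeedsCompilation: no"
), packages_file)

# read.dcf() returns a character matrix:
# one row per package, one column per field
db <- read.dcf(packages_file)
db[, c("Package", "Version", "Imports")]
```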
With {craneur}
Doing this by hand can be cumbersome, so I’ve developed a little package called {craneur} to do it automatically. You can get it with:
remotes::install_github("ColinFay/craneur")
Here’s how to use it:
library(craneur)
colin <- Craneur$new("Colin")
colin$add_package("../craneur_0.0.0.9000.tar.gz")
colin$add_package("../jekyllthat_0.0.0.9000.tar.gz")
colin$add_package("../tidystringdist_0.1.2.tar.gz")
colin$add_package("../attempt_0.2.1.tar.gz")
colin$add_package("../rpinterest_0.4.0.tar.gz")
colin$add_package("../rgeoapi_1.2.0.tar.gz")
colin$add_package("../proustr_0.3.0.9000.tar.gz")
colin$add_package("../languagelayeR_1.2.3.tar.gz")
colin$add_package("../fryingpane_0.0.0.9000.tar.gz")
colin$add_package("../dockerfiler_0.1.1.tar.gz")
colin$add_package("../devaddins_0.0.0.9000.tar.gz")
colin
##    package        path
## 1  craneur        ../craneur_0.0.0.9000.tar.gz
## 2  jekyllthat     ../jekyllthat_0.0.0.9000.tar.gz
## 3  tidystringdist ../tidystringdist_0.1.2.tar.gz
## 4  attempt        ../attempt_0.2.1.tar.gz
## 5  rpinterest     ../rpinterest_0.4.0.tar.gz
## 6  rgeoapi        ../rgeoapi_1.2.0.tar.gz
## 7  proustr        ../proustr_0.3.0.9000.tar.gz
## 8  languagelayeR  ../languagelayeR_1.2.3.tar.gz
## 9  fryingpane     ../fryingpane_0.0.0.9000.tar.gz
## 10 dockerfiler    ../dockerfiler_0.1.1.tar.gz
## 11 devaddins      ../devaddins_0.0.0.9000.tar.gz
You can then save it with:
colin$write()
You now have a folder you can copy and paste on your server. This server can be your own ftp, a university server, a git repo… anywhere you can point to with a url!
Note: other packages can do this too, notably {drat}, {cranlike} or {packrat}.
Creating a server
With Digital Ocean
For the sake of this article, I’ll use a server deployed on Digital Ocean. If you want to try DO, here’s a $10 coupon (full disclosure: it’s an affiliate link, and I’ll get a $10 credit if you ever spend $25 there).
As this is not a DO deployment tutorial, I’ll skip this part and assume you managed to set up a server (roughly, it’s just “create a droplet with Ubuntu”, and access it with ssh using the password you receive by mail). You can still refer to the doc if you need more info about how to deploy a droplet.
So, I’ve connected to my DO server through ssh (with the password received via email), and installed Docker, following this tutorial.
I now have a Digital Ocean machine with Docker on it.
The Dockerfile
Let’s write the Dockerfile for our RAN. Basically, we’ll need:
- a webserver, which will be launched with the {servr} package (let’s keep the project R-only)
- the RAN repo I created earlier
This simple Dockerfile would create a RAN:
library(dockerfiler)
dock <- Dockerfile$new()
dock$RUN("mkdir usr/ran/src/contrib/ -p")
dock$COPY("src/contrib", "usr/ran/src/contrib")
dock$RUN("Rscript -e 'install.packages(\"httpuv\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"jsonlite\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"servr\", repos = \"https://cran.rstudio.com/\")'")
dock$EXPOSE(8000)
dock$CMD("Rscript -e 'servr::httd(\"usr/ran/src/contrib\", host = \"0.0.0.0\", port = 8000)'")
dock

FROM rocker/r-base
RUN mkdir usr/ran/src/contrib/ -p
COPY src/contrib usr/ran/src/contrib
RUN Rscript -e 'install.packages("httpuv", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("jsonlite", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("servr", repos = "https://cran.rstudio.com/")'
EXPOSE 8000
CMD Rscript -e 'servr::httd("usr/ran/src/contrib", host = "0.0.0.0", port = 8000)'
But one thing is missing: what if I want to regenerate the RAN every time I have a new package? Well, let’s write a different Dockerfile to do that.
An updatable Dockerfile
- First of all, I’ll copy all the package sources in a pkg folder
pkg <- list.files("../", pattern = "tar.gz", full.names = TRUE)
# Make sure the destination folder exists before copying
dir.create("pkg", showWarnings = FALSE)
file.copy(pkg, "pkg")
list.files("pkg")
##  [1] "attempt_0.2.1.tar.gz"         "craneur_0.0.0.9000.tar.gz"
##  [3] "devaddins_0.0.0.9000.tar.gz"  "dockerfiler_0.1.1.tar.gz"
##  [5] "fryingpane_0.0.0.9000.tar.gz" "jekyllthat_0.0.0.9000.tar.gz"
##  [7] "languagelayeR_1.2.3.tar.gz"   "prenoms_0.1.0.tar.gz"
##  [9] "proustr_0.3.0.9000.tar.gz"    "rgeoapi_1.2.0.tar.gz"
## [11] "rpinterest_0.4.0.tar.gz"      "tidystringdist_0.1.2.tar.gz"
- I’ll then create a craneur.R (file.create("craneur.R")) to automatically launch and write with {craneur} from a folder. It will contain the following code:
library(craneur)
colin <- Craneur$new("Colin")
lapply(
  list.files("usr/pkg", pattern = "tar.gz", full.names = TRUE),
  function(x) colin$add_package(x)
)
colin$write(path = "usr/ran")
- As I want the user to be able to use http://url only, and as my RAN index is in src/contrib, I’ll create an HTML file that simply does the redirection:
file.create("index.html")
with, in it, a redirection to src/contrib.
- And here is the new Dockerfile:
dock <- Dockerfile$new()
# Install the packages
dock$RUN("Rscript -e 'install.packages(\"httpuv\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"jsonlite\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"servr\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'install.packages(\"remotes\", repos = \"https://cran.rstudio.com/\")'")
dock$RUN("Rscript -e 'remotes::install_github(\"ColinFay/craneur\")'")
# Create the dir
dock$RUN("mkdir usr/ran -p")
dock$RUN("mkdir usr/pkg -p")
# Move some stuffs
dock$COPY("craneur.R", "usr/pkg/craneur.R")
dock$COPY("pkg", "usr/pkg")
# Copy the index.html
dock$COPY("index.html", "usr/ran/index.html")
# Create the folders
dock$RUN("Rscript usr/pkg/craneur.R")
# Open port
dock$EXPOSE(8000)
# Launch server
dock$CMD("Rscript -e 'servr::httd(\"usr/ran/\", host = \"0.0.0.0\", port = 8000)'")
dock

FROM rocker/r-base
RUN Rscript -e 'install.packages("httpuv", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("jsonlite", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("servr", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'install.packages("remotes", repos = "https://cran.rstudio.com/")'
RUN Rscript -e 'remotes::install_github("ColinFay/craneur")'
RUN mkdir usr/ran -p
RUN mkdir usr/pkg -p
COPY craneur.R usr/pkg/craneur.R
COPY pkg usr/pkg
COPY index.html usr/ran/index.html
RUN Rscript usr/pkg/craneur.R
EXPOSE 8000
CMD Rscript -e 'servr::httd("usr/ran/", host = "0.0.0.0", port = 8000)'

dock$write()
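The index.html copied into the image only needs to send visitors to src/contrib. Here is a minimal sketch that writes such a file from R, assuming a plain meta-refresh redirect is acceptable. The exact markup is an assumption, and the file goes to a temporary folder here rather than the project folder:

```r
# Build a tiny HTML page that redirects the browser to src/contrib/.
# The meta-refresh approach is an assumption; any redirect would do.
html <- c(
  "<!DOCTYPE html>",
  "<html>",
  "  <head>",
  "    <meta http-equiv=\"refresh\" content=\"0; url=src/contrib/\">",
  "  </head>",
  "  <body>",
  "    <a href=\"src/contrib/\">src/contrib</a>",
  "  </body>",
  "</html>"
)

# Written to a temporary folder for illustration;
# use "index.html" to write it next to the Dockerfile instead.
index_file <- file.path(tempdir(), "index.html")
writeLines(html, index_file)
```

The plain link in the body is a fallback for browsers that ignore the refresh.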
So here, if I build it:
docker build -t ran .
And:
docker run -d -p 80:8000 ran
I can go to http://127.0.0.1/ in my browser, and I’ll get the index of all available packages.
I can now try:
install.packages("attempt", repos = "http://127.0.0.1/", type = "source")
And that works as expected.
To the server and beyond
Let’s copy everything to our server, in our ran folder:
scp torun.R [email protected]:/usr/ran/
scp craneur.R [email protected]:/usr/ran/
scp Dockerfile [email protected]:/usr/ran/
scp -r pkg/ [email protected]:/usr/ran/
scp index.html [email protected]:/usr/ran/
Let’s go to our virtual machine, build the image from /usr/ran with the docker build command we’ve just seen, and run it:
docker run -d -p 80:8000 ran
And tadaaa: http://206.189.28.254.
So you can now install from your server:
install.packages("attempt", repos = "http://206.189.28.254", type = "source")
Update your server
So now, the good thing here is that I can update my package server whenever I add or remove a tar.gz: I’ll just have to rebuild my Docker image.
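To decide whether a rebuild is needed, the tarballs can be compared with the current index. Here is a minimal sketch in base R. needs_rebuild() is a hypothetical helper, not part of {craneur}, and it assumes tarballs are named package_version.tar.gz:

```r
# Hypothetical helper: TRUE when the tarballs in pkg_dir no longer
# match the Package/Version pairs listed in a PACKAGES file.
needs_rebuild <- function(pkg_dir, packages_file) {
  tarballs <- list.files(pkg_dir, pattern = "\\.tar\\.gz$")
  if (!file.exists(packages_file)) {
    return(length(tarballs) > 0)
  }
  db <- read.dcf(packages_file, fields = c("Package", "Version"))
  indexed <- if (nrow(db) > 0) {
    paste0(db[, "Package"], "_", db[, "Version"], ".tar.gz")
  } else {
    character(0)
  }
  !setequal(tarballs, indexed)
}

# Toy demonstration in a temporary folder (empty files stand in for tarballs)
demo_dir <- tempfile("pkg")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, "attempt_0.2.1.tar.gz")))
index <- file.path(demo_dir, "PACKAGES")
writeLines(c("Package: attempt", "Version: 0.2.1"), index)

# Index and folder agree
before <- needs_rebuild(demo_dir, index)

# A new tarball appears, so the index is now out of date
invisible(file.create(file.path(demo_dir, "rgeoapi_1.2.0.tar.gz")))
after <- needs_rebuild(demo_dir, index)
```

Comparing file names only is a deliberate shortcut: a tarball rebuilt under the same version number would go unnoticed, and catching it would require comparing MD5 sums as well.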
Further work
Efficient update
Here, to be really efficient, I should split my Docker image in two: one with all the packages, and one with the {craneur} generation. That way, I wouldn’t have to rebuild my Docker image from scratch every time the package list changes.
DNS
http://206.189.28.254 is not that nice an address to share or remember, so we could buy a domain and point it to our server. But… that’s for another day.
The post Dockerise and deploy your own R Archive Repo appeared first on The R Task Force.