How to develop inside a Docker container to ease collaboration?

[This article was first published on Rtask, and kindly contributed to R-bloggers.]

You can read the original post in its original format on Rtask website by ThinkR here: How to develop inside a Docker container to ease collaboration?

To ensure the reproducibility of your projects, you can develop inside the Docker container that you will use to share your work. What happens when your data analyses, publications, or models are run by users with different operating systems and different versions of R and R packages? How do you make sure that the Dockerfile delivered with your package or project actually works? Wouldn't you like to develop in the very Docker container you deliver for the production release of your Shiny application? Wouldn't you like to reproduce locally the environment of your continuous integration (CI) when it fails for no apparent reason?

Docker is now natively available to Windows 10 Home users, and no longer only to Windows Pro users, thanks to WSL2.
Hence, it is time to present our package {devindocker}, along with our reasons for working inside Docker containers.

Use {devindocker} to start a Docker container with persistent package installations

{devindocker} was built to reduce the time spent setting up a Docker container for each of our projects, with the right setup to persist both the packages installed inside the container and the RStudio preferences.
This requires a Docker image with RStudio Server inside.
We recommend the Rocker images, starting with rocker/rstudio.
The following lines of code need to be run from outside the project itself.
Indeed, opening your project directly on your own computer while using {renv} may compromise the installation inside the Docker container.
The project can be a simple directory, an RStudio project, a package, a golem…

A reproducible example for a simple project

Let’s try with a reproducible example as presented in the {devindocker} README:

We create a random directory with a file inside.

# Temporary project
tempdir <- tempdir()
project_path <- file.path(tempdir, "myproject")
dir.create(project_path)
# Add a file inside
cat("# my R file", file = file.path(project_path, "my-file.R"))
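Before starting a container, it can help to confirm that the project directory and its file really exist. The check below is not part of the {devindocker} README; it is a small self-contained sketch in base R that recreates the same temporary project and lists its content.

```r
# Recreate the temporary project from the example above (base R only)
project_path <- file.path(tempdir(), "myproject")
dir.create(project_path, showWarnings = FALSE)
cat("# my R file", file = file.path(project_path, "my-file.R"))

# Confirm that the project directory and its file are in place
stopifnot(dir.exists(project_path))
project_files <- list.files(project_path)
print(project_files)
```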

Launch a Docker container with your directory mounted inside. The image must include RStudio Server.
Note that you start outside your project, which means you will have to open a new RStudio project inside the container if this is your way of working.
Note that in this example, the packages you install will not be kept after you stop the container, but RStudio preferences will.

library(devindocker)
# Path to your working directory / project
project_path <- file.path(tempdir, "myproject")
# Which container (with RStudio Server inside)? ----
# Rocker images are listed on Docker Hub, e.g. https://hub.docker.com/r/rocker/verse
container <- "rocker/geospatial:4.0.1"
# Which port? ----
# Useful if several RStudio Servers are launched at the same time
port <- 8788
# Start Docker project ----
launch_proj_docker(
  project_path = project_path,
  container = container,
  port = port)

When you’re done, do not forget to stop RStudio Server properly: click the quit button at the top right, or run q() in the console.

Then, end the container.

# Stop Docker properly
stop_proj_docker(project_path = project_path)
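To make the three parameters above more concrete, here is a rough sketch of the kind of `docker run` command they could map to. This is not the {devindocker} implementation, which may differ; the helper name `build_docker_cmd()` and the container name are hypothetical. The sketch relies only on standard Docker CLI flags and on two Rocker conventions: RStudio Server listens on port 8787 inside the container, and the default user's home is /home/rstudio. The command is only built as a string, not executed.

```r
# Hypothetical helper: builds (but does not run) a `docker run` command.
# This is an illustration, not what launch_proj_docker() actually does.
build_docker_cmd <- function(project_path, container, port,
                             name = "devindocker_sketch") {
  # Host directory mounted into the home of the default 'rstudio' user
  volume <- sprintf(
    "%s:/home/rstudio/%s",
    normalizePath(project_path, mustWork = FALSE),
    basename(project_path)
  )
  args <- c(
    "run", "--rm", "-d",
    "--name", name,
    # Map the chosen host port to RStudio Server's port 8787
    "-p", sprintf("%d:8787", as.integer(port)),
    # Rocker images accept DISABLE_AUTH to skip the login page
    "-e", "DISABLE_AUTH=true",
    "-v", shQuote(volume),
    container
  )
  paste(c("docker", args), collapse = " ")
}

cmd <- build_docker_cmd(
  project_path = file.path(tempdir(), "myproject"),
  container = "rocker/geospatial:4.0.1",
  port = 8788
)
cat(cmd, "\n")
```

With such a mapping, RStudio Server would be reachable in the browser at the chosen host port (8788 here), which is why a distinct port is needed for each container running at the same time.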

Use {renv} inside Docker and keep installation of packages

Note that you need to launch your project with {devindocker} from outside your project. Never open it again locally (outside a Docker container) if you want to avoid problems caused by an incompatible local {renv} setup. We recommend creating a separate project dedicated to launching your {devindocker} projects.

Launch a Docker container with your directory mounted inside. The image must include RStudio Server.
Note that you start outside your project, which means you will have to open a new RStudio project inside the container if this is your way of working.
Note also that, this time, the packages you install will be kept after you stop the container, as well as RStudio preferences.

Follow instructions in the "renv_instructions.Rmd" file that is created inside your project.

# Path to your working directory / project
project_path <- file.path(tempdir, "myproject")
# Which container (with RStudio Server inside)? ----
# Rocker images are listed on Docker Hub, e.g. https://hub.docker.com/r/rocker/verse
container <- "rocker/geospatial:4.0.1"
# Which port? ----
# Useful if several RStudio Servers are launched at the same time
port <- 8788
# The renv cache directory on my local computer
# Used as a persistent drive for all your Docker containers started with {devindocker}
renv_cache <- "~/renv_cache"
# Start Docker project ----
devindocker::launch_proj_docker(
  project_path = project_path,
  container = container,
  port = port,
  renv_cache = renv_cache,
  renv_inst = TRUE # Add an Rmd with instructions inside your project
)

When you’re done, do not forget to stop RStudio Server properly: click the quit button at the top right, or run q() in the console.

Then, end the container.

# Stop Docker properly
stop_proj_docker(project_path = project_path)
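Because the renv cache lives on your own computer and is shared between containers, it is worth making sure the directory exists before launching. Whether {devindocker} creates it for you is not stated here, so the sketch below does it defensively. The helper name `ensure_renv_cache()` is hypothetical, and the demonstration uses a temporary location instead of "~/renv_cache" so that it does not touch your home directory.

```r
# Hypothetical helper: make sure the host-side cache directory exists
# and return its absolute path
ensure_renv_cache <- function(path) {
  path <- path.expand(path)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  normalizePath(path)
}

# Demonstrated on a temporary location;
# in practice you would pass "~/renv_cache" as in the example above
renv_cache <- ensure_renv_cache(file.path(tempdir(), "renv_cache_demo"))
print(dir.exists(renv_cache))
```

Calling the helper a second time on the same path simply returns the path again, so it is safe to keep it at the top of your launch script.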

There is also an experimental option to connect to a local Docker container that hosts a MySQL database. It may require an RStudio image that includes the database system dependencies. See the parameter with_mysql.

We are also continuously improving the built-in documentation about setting up {renv} inside the Docker container.
You can open it as follows:

file <- system.file("renv/renv_instructions.Rmd", package = "devindocker")
file.edit(file)

Note that it is still in development.

Develop with multiple developers with multiple OS and R versions

In projects involving multiple developers, whether they produce data analysis HTML pages, reports or Shiny applications, you can easily run into problems with package versions, but also with OS specificities that compromise reproducibility.
At ThinkR, each developer has a computer with a different architecture, a different R version and, of course, different package versions. For packages, we can use {renv}, although different R versions can still be a problem. After a few bad adventures with encoding (UTF-8, latin, …), system dependencies and line-ending characters (LF, CR/LF), you will be tempted to avoid mixing Windows, macOS and Linux on the same project.
Hence, we decided to develop all our private projects inside identical Docker containers, which gives us a common architecture and a common R version, with {renv} providing a common set of R packages. As we regularly develop Shiny applications for our clients, we also put them in production, either with RStudio Connect, as we are Certified RStudio Partners, or with ShinyProxy + Docker. As always, we develop with our package {golem}, and when production relies on Docker, we deliver a Dockerfile with all the specifics. What could be more convenient than developing inside the Docker container that will be delivered?
Likewise, when what we deliver is an HTML book or a data analysis report, it is convenient to develop in a fixed and common environment. It is our practice to create R packages for all types of analyses to ensure robustness and complete documentation. Developing in a Docker environment ensures the reproducibility of our analysis reports.
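To see how much two machines actually differ, each developer can print a small "fingerprint" of their environment and compare it with their colleagues'. The sketch below uses base R only; the function name `env_fingerprint()` and the choice of fields are ours, for illustration. Inside identical Docker containers, all developers should get the same result.

```r
# Hypothetical helper: summarise the parts of the environment that most
# often differ between developers' machines
env_fingerprint <- function() {
  list(
    r_version = R.version.string,
    os        = Sys.info()[["sysname"]],
    platform  = R.version$platform,
    # Is the current locale a UTF-8 one?
    utf8      = isTRUE(l10n_info()[["UTF-8"]]),
    # Native line-ending convention of the OS
    eol       = if (.Platform$OS.type == "windows") "CRLF" else "LF"
  )
}

fp <- env_fingerprint()
str(fp)
```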

Reduce problems with line-endings characters with git between OS

A little note for our beloved R package developers, in case you do not develop inside a Docker container.
If you accept contributions from multiple developers, you cannot be sure that everyone has correctly configured the line-ending setting used when their files are created.
Hence, when rebuilding the documentation of your functions, git may suddenly show a large set of modified files to add to the commit.
This is because Linux, macOS and Windows do not use the same invisible line-ending characters.
To reduce the risk, you can add a hidden file named ".gitattributes" to your project with this content:

# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto
# Declare files that will always have LF line endings on checkout.
*.Rd text eol=lf
*.R text eol=lf
*.Rmd text eol=lf
*.md text eol=lf
NAMESPACE text eol=lf
DESCRIPTION text eol=lf
# Denote all files that are truly binary and should not be modified.
*.png binary
*.jpg binary
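To check whether a given file already contains Windows-style line endings, you can read its raw bytes and look for a carriage return (0x0d) immediately followed by a line feed (0x0a). This is a minimal sketch in base R; the helper name `has_crlf()` is ours and is not part of git or of any package.

```r
# Hypothetical helper: TRUE if the file contains at least one CRLF pair
has_crlf <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  n <- length(bytes)
  if (n < 2) {
    return(FALSE)
  }
  # Compare each byte with the one that follows it
  any(bytes[-n] == as.raw(0x0d) & bytes[-1] == as.raw(0x0a))
}

# Demonstration on two temporary files written byte by byte,
# so that the OS does not translate line endings for us
lf_file <- tempfile(fileext = ".R")
crlf_file <- tempfile(fileext = ".R")
writeBin(charToRaw("a <- 1\nb <- 2\n"), lf_file)
writeBin(charToRaw("a <- 1\r\nb <- 2\r\n"), crlf_file)

lf_result <- has_crlf(lf_file)
crlf_result <- has_crlf(crlf_file)
print(c(lf = lf_result, crlf = crlf_result))
```

Running such a check over the R/ and man/ directories before a commit can tell you which files would be rewritten once the ".gitattributes" rules apply.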

Re-create conditions of your CI locally to test success or failure of your workflow

If you have never had to push a hundred commits to get your CI to pass, then you do not need to read this section…
Sometimes you want to reproduce the environment of your CI to be sure all tests will pass at the next commit, instead of trying one modification after another and waiting again and again for the CI output.
In this case, you can open your package inside a Docker container without persistent packages, so that you always start from a fresh container.
However, note that you should not run your tests in the RStudio Console, because it comes with specific configurations that will not be available to your CI.
Instead, open the “Terminal” tab and start R inside it.
Then, you can run the lines of code of your CI yml file there.
Note that if you want to send code from an R/Rmd script, you can use Ctrl/Cmd + Alt + Enter and the lines of code will be sent directly to the Terminal,
as opposed to Ctrl/Cmd + Enter, which sends code to the Console.
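Another way to stay out of the RStudio Console is to run each CI command in a fresh, non-interactive R session through Rscript. The sketch below assumes a Unix-like system such as the Linux container, and assumes your CI also calls Rscript; the helper name `run_clean()` is hypothetical. The `--vanilla` flag skips user and project startup files, which may or may not match what your CI does, so drop it if your CI relies on a .Rprofile.

```r
# Path to the Rscript binary that matches the current R installation
rscript <- file.path(R.home("bin"), "Rscript")

# Hypothetical helper: run one R expression in a fresh, non-interactive
# session and return what it printed, one element per line
run_clean <- function(expr_text) {
  system2(rscript, c("--vanilla", "-e", shQuote(expr_text)), stdout = TRUE)
}

# A trivial command standing in for a line of your CI yml file
out <- run_clean("cat(1 + 1)")
print(out)
```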

If you use GitLab and GitLab CI, note that your CI container may run as the root user.
In that case, you will need to launch your Docker container with {devindocker} with the same rights, using is_root = TRUE.

# Start Docker project ----
launch_proj_docker(
  project_path = project_path,
  container = container,
  port = port,
  is_root = TRUE)

Then, inside the “Terminal”, you will be able to run sudo R directly.

Once you have run your tests, you can stop the container.
Installed packages will not be kept outside the container, so you can re-open a fresh container later to test a different CI implementation.

Install Docker on your OS

Docker provides official installation instructions for each operating system; follow the ones that match your OS.

Try {devindocker} and give us your feedback

{devindocker} is still under development.
We use it daily for production.
We have been using it for several months now on Linux, macOS and Windows, which makes us confident about its stability.
If you want to report bugs or propose features, you are welcome to do so on the {devindocker} GitHub repository.
