Data scientists prefer using the latest R packages to analyze their data. To ensure a good user experience, you will need a recent version of R running on a modern operating system. If you run R on a production server – and especially if you use RStudio Connect – plan to support multiple versions of R side by side so that your code, reports, and apps remain stable over time. You can support multiple versions of R concurrently by building R from source. Plan to install a new version of R at least once per year on your servers.
A solid foundation for R
Administering R on the desktop is relatively easy, because desktops are designed for a single user at a specific time. Desktop users upgrade R versions and R packages as new software becomes available, leaving old versions and packages behind. Servers, on the other hand, are designed to support multiple people who want to access content across time. Servers are increasingly used for building data science labs in R, deploying R in production, and running R in the cloud. You may find that the same strategies you use to administer R on your desktop do not work as well on a server. In particular, upgrading your version of R must be handled differently.
If you upgrade R on your server as you do on your desktop, you could easily break some apps and disrupt your teams. Administrators should exercise caution when upgrading to a new version of R on a Linux server. Consider the following situations:
- You have been hosting apps on RStudio Connect and Shiny Server for more than a year. When you upgrade R, you break many of your older apps.
- Your team is developing code on a shared instance of RStudio Server. When you upgrade R, you disrupt people’s work and break their code.
Instead of upgrading your existing version of R, a better solution to these problems is to run multiple versions of R side by side. This strategy preserves past versions of R so you can manage upgrades and keep your code, apps, and reports stable over time.
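As a rough illustration of what side-by-side installations look like in practice, the sketch below lists the R versions found under a common root directory. The /opt/R root and the list_r_versions helper are assumptions made for this example, not part of any RStudio tooling; adjust the path to match your own layout.

```shell
#!/usr/bin/env bash
# Sketch: list R versions installed side by side under one root directory.
# Assumption: each version lives in <root>/<version>/ and has its own bin/R.

list_r_versions() {
  local root="${1:-/opt/R}"
  local dir
  for dir in "$root"/*/; do
    # Only report directories that actually contain an executable R binary
    if [ -x "${dir}bin/R" ]; then
      basename "$dir"
    fi
  done | sort -V   # version-aware sort (GNU coreutils)
}

# Example usage:
#   list_r_versions /opt/R
#   /opt/R/3.4.3/bin/R --version   # run one specific version directly
```

Because every version has its own directory and its own bin/R, installing a new version never overwrites an older one.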
Building R from source
The best way to run multiple versions of R side by side is to build R from source. If you are running R on a Linux server – and particularly in the enterprise – you should always build R from source, because it will help you:
- Run multiple versions of R side by side
- Guarantee that R will work on your unique server configuration
- Potentially speed up certain low-level computations used by R
- Build technical expertise that will help you administer R at scale
Most enterprise IT departments will be comfortable building software from source. If you have never built R from source, it is very straightforward. First, you need the build dependencies for R. If you’ve already installed R from a binary source like CRAN or EPEL, you may already have these dependencies installed; otherwise, you can run sudo yum-builddep R on RedHat or sudo apt-get build-dep r-base on Ubuntu. Second, you should obtain and unpack the source tarball for the version of R you want to install from CRAN. Third, from within the extracted source directory, build R from source using the configure, make, and make install commands. For example:
    # BUILD R FROM SOURCE ON REDHAT LINUX
    # R-3.4.3

    # Install Linux dependencies
    $ sudo yum-builddep R

    # Download and extract source code
    $ wget https://cran.r-project.org/src/base/R-3/R-3.4.3.tar.gz
    $ tar -xzvf R-3.4.3.tar.gz
    $ cd R-3.4.3

    # Build R from source
    $ ./configure --prefix=/opt/R/$(cat VERSION) --enable-R-shlib --with-blas --with-lapack
    $ make
    $ sudo make install
This script installs R version 3.4.3 into /opt/R/3.4.3, but you can install R into any of the recommended directories. The --enable-R-shlib option builds R as a shared library, which RStudio requires. The --with-blas and --with-lapack options are not required, but are commonly included. They tell R to link against the system BLAS and LAPACK libraries, which can speed up certain low-level math computations (e.g., multiplying and inverting matrices). These libraries will not speed up the R interpreter itself, but they can significantly speed up the linear algebra routines that R calls.
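To install several versions side by side, the same steps can be wrapped in a function and called once per version. The following is a minimal sketch, not an official installer: the r_source_url and build_r helpers and the R_ROOT variable are hypothetical names introduced here, and the URL pattern is simply generalized from the download link in the script above.

```shell
#!/usr/bin/env bash
# Sketch: build more than one R version, each into its own prefix.
# Assumptions: build dependencies are already installed, and R_ROOT
# (default /opt/R) is the directory that holds every version.

# Derive the CRAN source URL from a version string such as 3.4.3
r_source_url() {
  local version="$1"
  local major="${version%%.*}"   # text before the first dot
  echo "https://cran.r-project.org/src/base/R-${major}/R-${version}.tar.gz"
}

build_r() {
  local version="$1"
  local prefix="${R_ROOT:-/opt/R}/${version}"

  # Never rebuild over an existing installation
  if [ -x "${prefix}/bin/R" ]; then
    echo "R ${version} already installed at ${prefix}; skipping"
    return 0
  fi

  wget "$(r_source_url "$version")" || return 1
  tar -xzf "R-${version}.tar.gz" || return 1
  (
    # Subshell keeps the caller's working directory unchanged
    cd "R-${version}" &&
      ./configure --prefix="$prefix" --enable-R-shlib --with-blas --with-lapack &&
      make &&
      sudo make install
  )
}

# Example usage:
#   for v in 3.3.3 3.4.3; do build_r "$v"; done
```

Skipping versions that already exist keeps the script safe to rerun when you add a new release each year.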
If you run into problems installing R from source, you can always remove the installation directory and start over. However, once the installation succeeds, you should never move the installation directory – in other words, always install into the final destination directory. If you run into problems with dependencies, make sure you are able to identify and install all of the required Linux libraries (e.g., the X11 library is commonly overlooked). Building R from source will be much easier with a modern operating system that is connected to the Internet.
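A quick check after make install can confirm that a build is usable before anyone depends on it. The sketch below is one possible check, under the assumption that the shared library lands at lib/R/lib/libR.so or lib64/R/lib/libR.so beneath the install prefix (the usual layout on Linux); check_r_install is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check one R installation directory.
# Assumption: libR.so is found under <prefix>/lib or <prefix>/lib64.

check_r_install() {
  local prefix="$1"
  local libdir

  # The R launcher script must exist and be executable
  if [ ! -x "${prefix}/bin/R" ]; then
    echo "missing executable: ${prefix}/bin/R" >&2
    return 1
  fi

  # The shared library is only built when --enable-R-shlib was used
  for libdir in lib lib64; do
    if [ -f "${prefix}/${libdir}/R/lib/libR.so" ]; then
      echo "OK: ${prefix}/${libdir}/R/lib/libR.so"
      return 0
    fi
  done

  echo "libR.so not found under ${prefix}; was --enable-R-shlib used?" >&2
  return 1
}

# Example usage:
#   check_r_install /opt/R/3.4.3
```

If the check fails, remove the installation directory and rebuild into the same final location rather than moving files around.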
For further details about building R from source, see the RStudio Server Admin Guide.
RStudio professional products
RStudio professional products automatically support multiple versions of R and provide additional features, such as letting administrators control access to multiple versions or allowing users to choose for themselves. RStudio Connect automatically provides R version matching. Running multiple versions of R side by side with RStudio Connect will ensure that your content persists over time.