An efficient way to install and load R packages
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
What is a R package and how to use it?
Unlike other programs, only fundamental functionalities come by default with R. You will thus often need to install some “extensions” to perform the analyses you want. These extensions which are are collections of functions and datasets developed and published by R users are called packages. They extend existing base R functionalities by adding new ones. R is open source so everyone can write code and publish it as a package, and everyone can install a package and start using the functions or datasets built inside the package, all this for free.
In order to use a package, it needs to be installed on your computer by running install.packages("name_of_package")
(do not forget ""
around the name of the package, otherwise R will look for an object saved under that name!). Once the package is installed, you must load the package and only after it has been loaded you can use all the functions and datasets it contains. To load a package, run library(name_of_package)
(this time ""
around the name of the package are optional, but can still be used if you wish).
Inefficient way to install and load R packages
Depending on how long you have been using R, you may use a limited amount of packages or, on the contrary, a large amount of them. As you use more and more packages you will soon start to have (too) many lines of code just for installing and loading them.
Here is a preview of the code from my PhD thesis showing how the installation and loading of R packages looked like when I started working on R (only a fraction of them are displayed to shorten the code):
# Installation of required packages install.packages("tidyverse") install.packages("ggplot2") install.packages("readxl") install.packages("dplyr") install.packages("tidyr") install.packages("ggfortify") install.packages("DT") install.packages("reshape2") install.packages("knitr") install.packages("lubridate") # Load packages library("tidyverse") library("ggplot2") library("readxl") library("dplyr") library("tidyr") library("ggfortify") library("DT") library("reshape2") library("knitr") library("lubridate")
As you can guess the code became longer and longer as I needed more and more packages for my analyses. Moreover, I tended to reinstall all packages as I was working on 4 different computers and I could not remember which packages were already installed on which machine. Reinstalling all packages every time I opened my script or R Markdown document was a waste of time.
More efficient way
Then one day, a colleague of mine shared some of his code with me. I am glad he did as he introduced me to a much more efficient way to install and load R packages. He gave me the permission to share the tip, so here is the code I now use to perform the task of installing and loading R packages:
# Package names packages <- c("ggplot2", "readxl", "dplyr", "tidyr", "ggfortify", "DT", "reshape2", "knitr", "lubridate", "pwr", "psy", "car", "doBy", "imputeMissings", "RcmdrMisc", "questionr", "vcd", "multcomp", "KappaGUI", "rcompanion", "FactoMineR", "factoextra", "corrplot", "ltm", "goeveg", "corrplot", "FSA", "MASS", "scales", "nlme", "psych", "ordinal", "lmtest", "ggpubr", "dslabs", "stringr", "assist", "ggstatsplot", "forcats", "styler", "remedy", "snakecaser", "addinslist", "esquisse", "here", "summarytools", "magrittr", "tidyverse", "funModeling", "pander", "cluster", "abind") # Install packages not yet installed installed_packages <- packages %in% rownames(installed.packages()) if (any(installed_packages == FALSE)) { install.packages(packages[!installed_packages]) } # Packages loading invisible(lapply(packages, library, character.only = TRUE))
This code for installing and loading R packages is more efficient in several ways:
- The function
install.packages()
accepts a vector as argument, so one line of code for each package in the past is now one line including all packages - In the second part of the code, it checks whether a package is already installed or not, and then install only the missing ones
- Regarding the packages loading (the last part of the code), the
lapply()
function is used to call thelibrary()
function on all packages at once, which makes the code more condense. - The output when loading a package is rarely useful. The
invisible()
function removes this output.
From that day on, every time I need to use a new package, I simply add it to the vector packages
at the top of the code, which is located at the top of my scripts and R Markdown documents. No matter on which computer I am working on, running the entire code will install only the missing packages and will load all of them. This greatly reduced the running time for the installation and loading of my R packages.
Most efficient way
{pacman}
package
After this article was published, a reader informed me about the {packman}
package. After having read the documentation and try it out myself, I learned that the function p_load()
from {pacman}
checks to see if a package is installed, if not it attempts to install the package and then loads it. It can also be applied to several packages at once, all this in a very condensed way:
install.packages("pacman") pacman::p_load(ggplot2, tidyr, dplyr)
Find more about this package on CRAN.
{librarian}
package
Like {pacman}
, the shelf()
function from the {librarian}
package automatically installs, updates, and loads R packages that are not yet installed in a single function. The function accepts packages from CRAN, GitHub, and Bioconductor (only if Bioconductor’s Biobase
package is installed). The function also accepts multiple package entries, provided as a comma-separated list of unquoted names (so no ""
around package names).
Last but not least, the {librarian}
package allows to load packages automatically at the start of every R session (thanks to the lib_startup()
function) and search for new packages on CRAN by keywords or regular expressions (thanks to the browse_cran()
function).
Here is an example of how to install missing packages and load them with the shelf()
function:
# From CRAN: install.packages("librarian") librarian::shelf(ggplot2, DesiQuintans / desiderata, pander)
For CRAN packages, provide the package name as normal without ""
and for GitHub packages, provide the username and package name separated by /
(i.e., UserName/RepoName
as shown for the desiderata
package).
Find more about this package on CRAN.
Thanks for reading. I hope the article helped you to install and load R packages in a more efficient way.
As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.
A special thanks Danilo and James for informing me about the {pacman}
and {librarian}
packages.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.