Using cpp11 (R package) and llvm on Ubuntu
Motivation
A large part of my research requires estimating computationally intensive models, such as the General Equilibrium Poisson Pseudo Maximum Likelihood (GEPPML) estimator, which is derived from the equilibrium conditions introduced by Anderson and Van Wincoop (2004) and is used for estimation and inference.
The GEPPML estimator requires solving a system of non-linear equations, and for this task we might be better off using a compiled language such as C++. The good news is that we can use C++ code within R and Python, and this is what this blog post is about.
I do not pretend to be an expert on C++, nor do I want to debate whether R is better than Python; I use both from Visual Studio Code. I do want to share my experience of using C++ code within R.
Honest disclaimer
This blog post is a summary, written for my future self, of what worked after hours of failed attempts. I hope it helps you too.
I am a Statistician and Political Scientist, not a Computer Scientist!
Setup
Because I had already been learning C++11, I decided to install llvm-11 on my laptop, which runs Linux Mint (based on Ubuntu 22.04).
Ubuntu and its derived distributions use gcc as the default compiler, and clang is not installed by default. Several resources mention that clang provides more informative error messages when compilation fails and when we debug code.
Informative error messages are very useful when we are learning C++, or when our code fails in one of two ways: either it does not compile, or it compiles but crashes the R session when we call a function from RStudio (or VSCode).
We need to install the R packages cpp11 and usethis.
install.packages(c("cpp11", "usethis"))
To install llvm-11, I downloaded the installation script from the official LLVM repository, which also installed clang-11.
cd Downloads
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 11
To avoid errors of the form fatal error: 'cstdio' file not found when compiling C++ code, we need to install additional packages. It took me a few hours of searching on the Internet to figure this out.
sudo apt install g++-11 libc++-11-dev libc++abi-11-dev
To be sure that devtools::install() uses the correct version of clang++, I created the file ~/.R/Makevars by running mkdir -p ~/.R && nano ~/.R/Makevars from bash. The contents of the file are the following.
CLANGVER=-11
CLANGLIB=-stdlib=libc++
CXX=$(CCACHE) clang++$(CLANGVER) $(CLANGLIB)
CXX11=$(CCACHE) clang++$(CLANGVER) $(CLANGLIB)
CC=$(CCACHE) clang$(CLANGVER)
SHLIB_CXXLD=clang++$(CLANGVER) $(CLANGLIB)
CXXFLAGS=-Wall -O0 -pedantic
CXX11FLAGS=-Wall -O0 -pedantic
Note that for both CXXFLAGS and CXX11FLAGS I am using -O0 to disable optimization, which is useful for debugging. Once the code is working, we can change it to -O3 to optimize the compiled code.
If later on we need to compile with gcc again, we can open the file and comment out all the lines.
After closing and reopening RStudio (or VSCode), you can check that the changes took effect by running pkgbuild::check_build_tools(debug = TRUE), which should return output similar to the following.
Trying to compile a simple C file
Running /usr/lib/R/bin/R CMD SHLIB foo.c
using C compiler: ‘Ubuntu clang version 11.1.0-6’
clang-11 -I"/usr/share/R/include" -DNDEBUG -fpic -g -O2 -ffile-prefix-map=/build/r-base-JhpCKt/r-base-4.3.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c foo.c -o foo.o
clang-11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -o foo.so foo.o -L/usr/lib/R/lib -lR
Creating a dummy package
From RStudio (or VSCode) we can create a new package by running usethis::create_package("~/cpp11dummypackage"). This will create a new folder named cpp11dummypackage.
Then run usethis::use_cpp11() to add the files required to use C++ code within R.
At this point you are almost ready to start working and be productive, but there is one more thing to do.
Run usethis::use_r("cpp11dummypackage-package") to create a new R script named cpp11dummypackage-package.R within the R folder, and add the following code to it.
#' @useDynLib cpp11dummypackage, .registration = TRUE
NULL
The usethis skeleton also created the file src/code.cpp for us. We can add a simple function that transposes a matrix by replacing the file contents with the following lines.
#include <cpp11.hpp>
#include <cpp11/doubles.hpp>

using namespace cpp11;

// Return the transpose of a numeric matrix
[[cpp11::register]] doubles_matrix<> Xt(doubles_matrix<> X) {
  int NX = X.nrow();
  int MX = X.ncol();

  // The result has the dimensions of X swapped
  writable::doubles_matrix<> R(MX, NX);

  for (int i = 0; i < MX; i++) {
    for (int j = 0; j < NX; j++) {
      R(i, j) = X(j, i);
    }
  }

  return R;
}
To export the function, we need to add the following lines to cpp11dummypackage-package.R.
#' Transpose a matrix
#' @export
#' @rdname Xt
#' @param X numeric matrix
#' @return numeric matrix
#' @examples
#' set.seed(1234)
#' X <- matrix(rnorm(4), nrow = 2, ncol = 2)
#' X
#' cpp11_Xt(X)
cpp11_Xt <- function(X) {
  Xt(X)
}
With cpp11::cpp_register() and devtools::load_all() we can test our function.
> set.seed(1234)
> X <- matrix(rnorm(4), nrow = 2, ncol = 2)
> X
           [,1]      [,2]
[1,] -1.2070657  1.084441
[2,]  0.2774292 -2.345698
> cpp11_Xt(X)
          [,1]       [,2]
[1,] -1.207066  0.2774292
[2,]  1.084441 -2.3456977
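To experiment with the index swap outside of R, here is a minimal stand-alone sketch of the same transposition that uses only the C++ standard library. It assumes the matrix is stored column by column in a flat vector, which is how R lays out matrices in memory. The name transpose_colmajor is hypothetical and is not part of cpp11 or of the dummy package.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of cpp11: transpose an nrow x ncol matrix
// stored in column-major order, i.e. element (i, j) sits at index i + j * nrow.
std::vector<double> transpose_colmajor(const std::vector<double>& X,
                                       std::size_t nrow, std::size_t ncol) {
  assert(X.size() == nrow * ncol);  // the flat vector must match the dimensions

  // The result is ncol x nrow, so its columns have length ncol
  std::vector<double> R(X.size());

  for (std::size_t i = 0; i < ncol; i++) {
    for (std::size_t j = 0; j < nrow; j++) {
      R[i + j * ncol] = X[j + i * nrow];  // R(i, j) = X(j, i)
    }
  }

  return R;
}
```

Calling transpose_colmajor() on the six values 1, 2, 3, 4, 5, 6 with nrow = 2 and ncol = 3 returns 1, 3, 5, 2, 4, 6, which is the column-major layout of the 3 x 2 transpose. The cpp11 matrix classes hide this index arithmetic behind the R(i, j) and X(j, i) accessors.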
If we had passed 1:4 instead of rnorm(4) to matrix(), we would have obtained the following error message.
> cpp11_Xt(X)
Error: Invalid input type, expected 'double' actual 'integer'
This is because we declared the function to accept a doubles_matrix<> as input, not an integers_matrix<>.
To install the recently created package, run the following lines in the R console.
devtools::clean_dll()
cpp11::cpp_register()
devtools::document()
devtools::install()
Debugging the package
To access debugging symbols, we need to create a new Makevars file within the src folder and add the following lines.
CXX_STD = CXX11
PKG_CPPFLAGS = -UDEBUG -g
Then we need to reinstall our package. From bash we can run R -d lldb-11 and follow this excellent guide to debug R and C++ code.
A more complex example
I created a package containing a set of simple functions, including the Gauss-Jordan method to invert a matrix, that allow the user to obtain the Ordinary Least Squares (OLS) estimator by calling a C++ function that calls other C++ functions. This implementation is extremely naive, but it is enough to show how to use C++ code within R. It is available from my GitHub profile.
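To give an idea of what "a C++ function that calls other C++ functions" can look like, here is a minimal stand-alone sketch. It is not the code from the package: it uses plain std::vector instead of cpp11 types, the names Matrix, gauss_jordan_inverse and ols are hypothetical, and the partial pivoting step is my own assumption about how to keep the inversion from failing on a zero pivot. It computes the OLS coefficients as (X'X)^{-1} X'y.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical type: a matrix stored as a vector of rows
using Matrix = std::vector<std::vector<double>>;

// Invert a square matrix with Gauss-Jordan elimination and partial pivoting
Matrix gauss_jordan_inverse(Matrix A) {
  const std::size_t n = A.size();
  Matrix I(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; i++) I[i][i] = 1.0;

  for (std::size_t c = 0; c < n; c++) {
    // Pick the row with the largest absolute value in column c
    std::size_t p = c;
    for (std::size_t r = c + 1; r < n; r++) {
      if (std::fabs(A[r][c]) > std::fabs(A[p][c])) p = r;
    }
    if (std::fabs(A[p][c]) < 1e-12) throw std::runtime_error("singular matrix");
    std::swap(A[c], A[p]);
    std::swap(I[c], I[p]);

    // Scale the pivot row so that the pivot equals one
    const double d = A[c][c];
    for (std::size_t k = 0; k < n; k++) {
      A[c][k] /= d;
      I[c][k] /= d;
    }

    // Eliminate column c from every other row
    for (std::size_t r = 0; r < n; r++) {
      if (r == c) continue;
      const double f = A[r][c];
      for (std::size_t k = 0; k < n; k++) {
        A[r][k] -= f * A[c][k];
        I[r][k] -= f * I[c][k];
      }
    }
  }

  return I;  // A is now the identity and I holds the inverse
}

// Naive OLS: beta = (X'X)^{-1} X'y, with X given as n rows of p regressors
std::vector<double> ols(const Matrix& X, const std::vector<double>& y) {
  const std::size_t n = X.size();
  assert(n > 0 && y.size() == n);
  const std::size_t p = X[0].size();

  // Accumulate X'X and X'y observation by observation
  Matrix XtX(p, std::vector<double>(p, 0.0));
  std::vector<double> Xty(p, 0.0);
  for (std::size_t i = 0; i < n; i++) {
    for (std::size_t a = 0; a < p; a++) {
      Xty[a] += X[i][a] * y[i];
      for (std::size_t b = 0; b < p; b++) XtX[a][b] += X[i][a] * X[i][b];
    }
  }

  const Matrix inv = gauss_jordan_inverse(XtX);

  std::vector<double> beta(p, 0.0);
  for (std::size_t a = 0; a < p; a++) {
    for (std::size_t b = 0; b < p; b++) beta[a] += inv[a][b] * Xty[b];
  }
  return beta;
}
```

For example, with the rows (1, 1), (1, 2), (1, 3) and y = (5, 8, 11), ols() should return coefficients close to 2 and 3, since y = 2 + 3x holds exactly. Inverting X'X explicitly is the naive part: it is easy to follow but numerically weaker than a decomposition-based approach.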
A good challenge would be to implement the QR decomposition used by the lm() function in R and use it to obtain the OLS estimator in C++. This would require some effort, but here you can find a good starting point.
In any case, it would be extremely hard to beat the performance of the lm() function in R, which has some internals written in C. The numerical robustness of lm() is another feature that is hard to beat.
Bonus
Optionally, create a file ~/.Rprofile containing the following lines.
library(devtools)
library(usethis)
library(cpp11)
Then forget about the devtools::, cpp11:: and usethis:: prefixes and use clean_dll(), cpp_register(), document(), install(), create_package(), use_cpp11() and use_r() directly every time you open RStudio (or VSCode).
References
- Debugging in R with a single command
- Debugging an R package with C++
- Clang++ missing C++ header?
- How to I tell RStudio not to ignore the indication to use clang in Makevars?
- R’s Makevars: PKG_CXXFLAGS vs. PKG_CXX11FLAGS
- Debugging memory errors with valgrind and gdb
- A Deep Dive Into How R Fits a Linear Model