user2013: The Rcpp tutorial
I’m at useR! 2013, and this morning I attended Hadley Wickham and Romain Francois’s tutorial on the Rcpp package for calling C++ code from R. I’ve spent the last eight years avoiding C++ after having nightmares about obscure pointer bugs, so I went into the room slightly skeptical about this package.
I think the most important takeaway from the tutorial was a clear sense of when and why you might want to use C++.
The main selling point for using C++ with R is, in Hadley’s words, that R is optimised for making programmers efficient whereas C++ is made for making machines efficient, so the languages are complementary. That is, most of the time the slow part of doing statistics is you. Occasionally, however, the slow part will be running your code, and in those instances C++ is better than R.
In order to write fast R code, it needs to be vectorised, and that often means using different functions to a scalar version. A classic example is using ifelse instead of separate if and else blocks, or using pmax instead of max.
Knowing how to vectorise R code thus requires quite a large vocabulary of functions. In C++ there is no vectorisation – you just write a for loop.
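To make the "just write a for loop" point concrete, here is a minimal sketch of what an element-wise maximum (the job pmax does in R) might look like as an explicit loop. It is plain standard C++ rather than Rcpp, the name parallel_max is made up for illustration, and it stops at the shorter input rather than recycling values the way R does.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Element-wise maximum of two vectors, written as an explicit loop.
// parallel_max is a hypothetical name; this is a sketch, not Rcpp's API.
std::vector<double> parallel_max(const std::vector<double>& a,
                                 const std::vector<double>& b) {
    // Assumption: stop at the shorter input instead of recycling like R.
    const std::size_t n = std::min(a.size(), b.size());
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::max(a[i], b[i]);  // scalar max, one element at a time
    }
    return out;
}
```

The scalar std::max is all you need here; there is no separate "vectorised" function to remember.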
There are three things in particular that C++ does much faster than R: the above-mentioned looping, resizing vectors, and calling functions. (For the last point, Hadley quoted an overhead of 2ns to call a function in C++ versus 200ns in R.)
This means that C++ is useful for the following restricted use cases:
- When vectorisation is difficult or impossible. This is common when one element of a vector depends upon previous elements. MCMC is a classic example.
- When you are changing the size of a vector in a loop. Run length encoding was the example given.
- When you need to make millions of function calls. Recursive functions and some optimisation and simulation problems fit this category.
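The first two cases can be sketched in plain standard C++ (not Rcpp). Both function names are hypothetical, and the exponentially weighted moving average is my own stand-in for "each element depends on the previous one", not an example from the tutorial. The run-length encoder grows its output inside the loop, which is the pattern that is slow in R.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Each output depends on the previous output, so the loop cannot be
// replaced by a single element-wise operation. (ewma is a hypothetical
// example; alpha is the smoothing weight.)
std::vector<double> ewma(const std::vector<double>& x, double alpha) {
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = (i == 0) ? x[0] : alpha * x[i] + (1.0 - alpha) * y[i - 1];
    }
    return y;
}

// Run-length encoding: returns (value, run length) pairs. The result
// vector is resized as new runs are discovered.
std::vector<std::pair<int, std::size_t>> run_lengths(const std::vector<int>& x) {
    std::vector<std::pair<int, std::size_t>> runs;
    for (int value : x) {
        if (!runs.empty() && runs.back().first == value) {
            ++runs.back().second;                      // extend the current run
        } else {
            runs.emplace_back(value, std::size_t{1});  // start a new run
        }
    }
    return runs;
}
```

In an Rcpp version you would presumably swap std::vector for Rcpp's vector types, but the loops themselves would look much the same.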
Typically C++ can give you a speedup of one or two orders of magnitude over an R equivalent, but this is wildly problem-dependent, and many of the built-in functions call C code, which will run at (more or less) the same speed as a C++ version. It’s also important to consider how often the code will be run. Even with a thousand-fold speedup, if the R function runs in 0.01s, you need to run it about 60000 times just to get back the 10 minutes it took you to rewrite it in C++.
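The break-even arithmetic is easy to get wrong in your head, so here is a small sketch of it. The function name and parameters are hypothetical, and it assumes the only cost is the one-off rewrite time (no compile time, no debugging).

```cpp
#include <cmath>

// Number of runs needed before the time saved per run repays the
// one-off cost of the rewrite. (break_even_runs is a hypothetical helper.)
double break_even_runs(double rewrite_seconds, double r_seconds, double speedup) {
    const double cpp_seconds = r_seconds / speedup;       // running time after the rewrite
    const double saved_per_run = r_seconds - cpp_seconds;  // time recovered on each run
    return std::ceil(rewrite_seconds / saved_per_run);
}
```

Calling break_even_runs(600.0, 0.01, 1000.0), that is, 10 minutes of rewriting, a 0.01s R function and a thousand-fold speedup, gives a little over 60000 runs.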
Anyway, using Rcpp makes it surprisingly simple to call C++ code. You need to install Rtools on Windows, and of course the Rcpp package.
    install.packages(c("installr", "Rcpp"))
    library(installr)
    install.Rtools()
Check that Rcpp is working by seeing if the following expression returns 2.
    library(Rcpp)
    evalCpp("1 + 1")
Then you can create a C++ function using the cppFunction function. Here’s a reimplementation of the any function. (Although it doesn’t deal with missing values.)
    cppFunction('
      bool Any(LogicalVector x) {
        for(int i = 0; i < x.size(); ++i) {
          if(x[i]) {
            return true;
          }
        }
        return false;
      }
    ')
Notice that in C++ you must be explicit about the types of the variables that are passed into and returned from a function. How to write C++ is beyond the scope of this post, so I’ll say no more.
You can now call the Any function like this.
    Any(runif(10) > 0.5) # almost certainly TRUE: FALSE only if all ten draws are at most 0.5
    Any(runif(10) > 1.5) # always FALSE, since runif() never exceeds 1
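On the missing-values caveat: R stores logical values as integers and marks NA with the smallest int, so as far as I can tell a plain if(x[i]) test treats NA as true, because it is non-zero. Here is a sketch of the three-valued logic that R's own any follows, written in plain standard C++ so it stands alone. The names any_with_na and kNaLogical are hypothetical; in Rcpp you would presumably take a LogicalVector and compare against NA_LOGICAL instead.

```cpp
#include <climits>
#include <vector>

// Assumption: this mirrors R's convention of storing logicals as ints,
// with INT_MIN as the NA marker. kNaLogical is a hypothetical stand-in
// for R's NA_LOGICAL.
const int kNaLogical = INT_MIN;

// Three-valued "any": 1 if some element is TRUE, otherwise NA if some
// element is missing, otherwise 0. (any_with_na is a hypothetical name.)
int any_with_na(const std::vector<int>& x) {
    bool seen_na = false;
    for (int value : x) {
        if (value == kNaLogical) {
            seen_na = true;   // remember it, but keep looking for a TRUE
        } else if (value != 0) {
            return 1;         // a definite TRUE settles the answer
        }
    }
    return seen_na ? kNaLogical : 0;
}
```

The early return on the first TRUE keeps the short-circuit behaviour of the original Any; the NA flag only matters when no TRUE is found.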
Tagged: c++, r, rcpp, user2013