Accelerating path-dependent loops: A quick Rcpp case study
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
By the time I saw that question yesterday evening, Josh Ulrich had already posted a nice answer suggesting
to switch from ifelse
to if
to escape the overhead of a vectorised expression on simple scalars. I meekly added a comment
suggesting that Rcpp would likely do well on this and that someone should volunteer
such an answer. Well, it is still August and Mr. Someone is on holiday, so this morning I followed up with such an answer. And as it turns out to
work quite well indeed, we will repost it here.
Let’s start with the general setup, and the two functions supplied by Josh. We also byte-compile these using the compiler
package which
is the culmination of several years of work by R core member Luke Tierney. This package was added to R
during the last major release, and we assessed it
in this earlier blog post as well as several Rcpp-related
follow-ups. We also load the packages for on-the-fly ‘inline’ compilation and easy benchmarking:
library(inline) library(rbenchmark) library(compiler) fun1 <- function(z) { for(i in 2:NROW(z)) { z[i] <- ifelse(z[i-1]==1, 1, 0) } z } fun1c <- cmpfun(fun1) fun2 <- function(z) { for(i in 2:NROW(z)) { z[i] <- if(z[i-1]==1) 1 else 0 } z } fun2c <- cmpfun(fun2)
We see that basic worker just assign to the i-th element based on the preceding element. Function two uses the aforementioned
if
statement, and both are also byte-compiled.
Writing the same code in C++ using both Rcpp (for the R/C++ integration) and inline for the on-the-fly compilation, linking and loading of C++ code into R is pretty straightforward too:
funRcpp <- cxxfunction(signature(zs="numeric"), plugin="Rcpp", body=" Rcpp::NumericVector z = Rcpp::NumericVector(zs); int n = z.size(); for (int i=1; i<n; i++) { z[i] = (z[i-1]==1.0 ? 1.0 : 0.0); } return(z); ")
This single R function call takes the code embedded in the argument to the body
variable, builds a complete function in temporary file,
and then compiles, links and loads it. We can now call funRcpp()
just like the other four functions and do so via the
benchmark()
function of the rbenchmark package.
R> z <- rep(c(1,1,0,0,0,0), 100) R> identical(fun1(z),fun2(z),fun1c(z),fun2c(z),funRcpp(z)) [1] TRUE R> R> res <- benchmark(fun1(z), fun2(z), + fun1c(z), fun2c(z), + funRcpp(z), + columns=c("test", "replications", "elapsed", "relative", "user.self", "sys.self"), + order="relative", + replications=1000) R> print(res) test replications elapsed relative user.self sys.self 5 funRcpp(z) 1000 0.005 1.0 0.01 0 4 fun2c(z) 1000 0.482 96.4 0.48 0 2 fun2(z) 1000 1.989 397.8 1.98 0 3 fun1c(z) 1000 11.365 2273.0 11.37 0 1 fun1(z) 1000 13.210 2642.0 13.21 0
We can focus on the columns labelled elapsed and relative. The C++ version is faster by a factor of almost one-hundred compared to the byte-compiled version of funtion2, and almost four-hundred times faster than the plain-R variant of function2. And function1 is even worse, coming at well over twenty-two-hundred times the run-time of the C++ version. Byte-compilation also helps little here.
For comparison, we also ran the original example of a very short vector, called more frequently:
The qualitative ranking is unchanged: the Rcpp version dominates. Function2 usingR> z <- c(1,1,0,0,0,0) R> res2 <- benchmark(fun1(z), fun2(z), + fun1c(z), fun2c(z), + funRcpp(z), + columns=c("test", "replications", "elapsed", "relative", "user.self", "sys.self"), + order="relative", + replications=10000) R> print(res2) test replications elapsed relative user.self sys.self 5 funRcpp(z) 10000 0.047 1.00000 0.04 0 4 fun2c(z) 10000 0.134 2.85106 0.13 0 2 fun2(z) 10000 0.328 6.97872 0.32 0 3 fun1c(z) 10000 1.080 22.97872 1.08 0 1 fun1(z) 10000 1.243 26.44681 1.24 0
if
instead of the vectorised ifelse
is second-best with the byte-compiled version being about twice as fast that the plain R
variant, but still almost three times slower than the C++ version. And the relative differences are less pronounced: relatively speaking, the
function call overhead matters less here and the actual looping matters more: C++ gets a bigger advantage on the actual loop operations in the longer
vectors. That it is an important result as it suggests that on more real-life sized data, the compiled version may reap a larger benefit.
All in all a nice demonstration of Rcpp, and a gain of almost one-hundred to the best byte-compiled version is nothing to sneeze at---especially when it is so easy to write and load a five-line C++ function thanks to Rcpp.
Before closing, a quick reminder that I will giving two classes on Rcpp in a few weeks. These will be in New York on September 24, and San Franciso on October 8, see this blog post as well as this page at Revolution Analytics (who are a co-organiser of the classes) for details and registration information.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.