Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Why call C/C++ from R?
I really like programming in R. The fact that it is open source immediately wins my favour over Matlab. It can, however, be quite slow especially if you “speak” R with a strong C/C++ accent. This sluggishness, especially when writing unavoidable for loops, has led me to consider other programming languages. I have come to the conclusion that there is no all-round best programming language but a combination of R and C++ is very hard to beat. My resolution is to write the expensive codes in C++ and to call them as functions within R so that I still have the nice dynamic interactivity of an R session with the speed and optimization of C++. There are 3 ways you can get started implementing C code in R, namely, “.C,”, “.Call” and Rcpp. It is generally not recommended to use “.C” but to use “.Call” instead. “.Call”, however, requires quite a bit of boilerplate code and can be tiresome and obfuscating to write. “Rcpp”, while not intrinsically part of R itself, is a package written by D Eddelbuettel and R François which relies on “.Call” but allows you to write neater more efficient code by providing an interface between “.Call” and the programmer. There are many things to consider when writing C++ code for R but most likely you will want to get coding as fast as possible and worry about these other tangential matters later.
How to call C/C++ from R using Rcpp [The Hard/Conscientious Way]
Note: This is the longest and hardest way to compile C++ code for R, but it is arguably the most flexible for the conscientious user who requires complex code and desires to know all the details of whats going on. For a simpler no-nonsense approach scroll down to the “sourceCpp” command.
Lets jump right in with an example
#include <Rcpp.h> #include <cstdlib> #include <iostream> using namespace std; RcppExport SEXP comp(SEXP x, SEXP y){ int i,n; Rcpp::NumericVector vector1(x); Rcpp::NumericVector vector2(y); n=vector2.size(); Rcpp::NumericVector product(n); for(i=0;i<n;i++){ product[i]=vector1[i]*vector2[i]; } return(product); }
Note: You do not need R.h and Rdefines.h when you use Rcpp.h.
Note: This is not the only way to compile and call code for R. See bottom for a neater alternative.
The function has two R objects as arguments, which are datatype “SEXP”, and returns an R object. The “SEXP” R objects, which are technically pointers to structures with typedef SEXPREC, are quite complex and so must be coerced into a vector using Rcpp::NumericVector, which makes a vector out of the SEXP object. Similarly, Rcpp::NumeriVector can be used to create a vector of length n. The rest is standard C++ code. Before compiling some compiler flags must be set:
[michael@michael rcpp]$ export PKG_CXXFLAGS=`Rscript -e "Rcpp:::CxxFlags()"` [michael@michael rcpp]$ export PKG_LIBS=`Rscript -e "Rcpp:::LdFlags()"`
Compile with:
[michael@michael rcpp]$ R CMD SHLIB comp.cpp
The function is now ready to be used in R. Consider the following R code:
library(Rcpp) dyn.load('comp.so') x=rnorm(100,0,1) y=rnorm(100,0,1) .Call('comp',x,y)
First the Rcpp library is loaded. Then, C++ code is loaded with “dyn.load”, after which it is ready to be used in R. The “.Call” function calls the “comp” function and the two arguments are supplied after. Now lets consider a more interesting example by parallelizing the C++ with openmp.
Performance with OpenMP
#include <Rcpp.h> #include <cstdlib> #include <iostream> #include <omp.h> using namespace std; RcppExport SEXP parad(SEXP x, SEXP y){ int i,n,max; Rcpp::NumericVector vector1(x); Rcpp::NumericVector vector2(y); n=vector2.size(); Rcpp::NumericVector product(n); max=omp_get_max_threads(); omp_set_num_threads(max); #pragma omp parallel for for(i=0;i<n;i++){ product[i]=vector1[i]/vector2[i]; } return(product); }
The purpose of this code is rather trivial, just dividing each element of x by the corresponding element of y. Now the compiler must be told to link against the openmp libraries so to compile we write
[michael@michael rcpp]$ export PKG_LIBS='`Rscript -e "Rcpp:::LdFlags()"` -fopenmp -lgomp' [michael@michael rcpp]$ export PKG_CXXFLAGS='`Rscript -e "Rcpp:::CxxFlags()"` -fopenmp' [michael@michael rcpp]$ R CMD SHLIB parad.cpp
How does this fair against the standard R “x/y”? To compare the “rbenchmark” package will be used.
library(Rcpp) library(rbenchmark) dyn.load('unpar.so') dyn.load('parad.so') x=rnorm(10000000,0,1) y=rnorm(10000000,0,1) benchmark(replications=rep(1,0,1),.Call('unpar',x,y),.Call('parad',x,y),x/y) > benchmark(replications=rep(1,0,1),.Call('unpar',x,y),.Call('parad',x,y),x/y) test replications elapsed relative user.self sys.self 2 .Call("parad", x, y) 1 0.036 1.000 0.112 0.002 1 .Call("unpar", x, y) 1 0.082 2.278 0.081 0.000 3 x/y 1 0.053 1.472 0.048 0.005
The parallel version of the x-divided-by-y code is even faster than R’s native “x/y”.
Converting between C++ and R datatypes using “as” and “wrap”
If you feel more comfortable seeing familiar C/C++ variable types, one is free to convert all of the function arguments of SEXP type to the desired C++ variable types for use in the body of the code and then to convert them back to SEXP at the end before returning the function. Say, for example, that I have a vector of 100 standard normal draws as one might obtain from x=rnorm(100,0,1). Passing x as an argument to the C++ function and the variable type of x is SEXP. I can then convert it to a vector of doubles using the “as” command, perform any processing with it in the body of the function and then convert it back to SEXP at the end using the “wrap” command. Here is a short example:
#include <Rcpp.h> #include <cstdlib> #include <iostream> #include <vector> using namespace std; RcppExport SEXP conv(SEXP x, SEXP y){ int i,n; vector<double> vector1=Rcpp::as<vector<double> >(x); vector<double> vector2=Rcpp::as<vector<double> >(y); n=vector1.size(); vector<double> product(n); for(i=0;i<n;i++){ product[i]=vector1[i]/vector2[i]; } return( Rcpp::wrap(product) ); }
The x and y SEXP’s are converted to vector doubles and then converted back at the end. I like this approach because my function remains looking like C++.
Streamlined sourceCpp() and cppFunction()
It is not necessary to compile, link and load the C++ code as above. There exist some wrappers for all of these tasks such as include, sourceCpp and cppFunction. I like sourceCpp() the best. Consider the C++ code stored in parad.cpp shown below:
#include <cstdlib> #include <iostream> #include <Rcpp.h> #include <omp.h> using namespace std; // [[Rcpp::export]] Rcpp::NumericVector parad(Rcpp::NumericVector x, Rcpp::NumericVector y){ int i,n,max; n=x.size(); Rcpp::NumericVector product(n); max=omp_get_max_threads(); omp_set_num_threads(max); #pragma omp parallel for for(i=0;i<n;i++){ product[i]=x[i]/y[i]; } return(product); }
This is a neat little parallel C++ function. This can be compiled, linked and loaded in one step using the sourceCpp() function. All we need to do is set some terminal environment variables for the compiler to use openMP.
library(Rcpp) Sys.setenv("PKG_CXXFLAGS"="-fopenmp") Sys.setenv("PKG_LIBS"="-fopenmp") sourceCpp("parad.cpp") a=rnorm(1000,0,1) b=rnorm(1000,0,1) c=parad(a,b)
Lines 2 and 3 are simply setting terminal environment variables from within R so that the compiler compiles as “g++ …. -fopenmp …” which we need for the “omp.h” header. sourceCpp is a wrapper that takes care of everything. After that one can immediately start using the parad() function.
The post Calling C++ from R using Rcpp appeared first on Lindons Log.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.