
My experiences with Rcpp

[This article was first published on Clustering epigenetic data using a Dirichlet process prior » R, and kindly contributed to R-bloggers.]

For the seven days up to Tuesday I worked on converting the code of my master's thesis from R scripts to compiled C++, using the Rcpp package from Dirk Eddelbuettel. Despite the initial effort necessary to set up the system (especially under Windows), I was looking forward to a huge speed-up of my simulation.

Setting up Rcpp

Setting up Rcpp under Windows is more or less straightforward, but there are many small things to take care of, and it took me some time to figure out all of them. A good starting point is the Rcpp-FAQ, which lists the software you’ll need. Luckily, Duncan Murdoch provides the RTools package, which bundles nearly everything you need. Due to license or size limitations, some tools are still missing and have to be installed separately. They are described in Appendix D (The Windows toolset) of the R Installation and Administration manual. Be careful not to have spaces in the paths of the components. I especially want to emphasize that the German version of Windows 7 shows the “Program Files” folder as “Programme”, which looks as though it doesn’t contain a space. If you click on the address bar of the Explorer, though, you will see that “Programme” is just a link to the “Program Files” folder, which does contain a space, so installing R there will not work (or at least it didn’t work for me).

Converting R-code to C++

Converting my code from R to C++ was easier than I first thought. Using the inline method from Rcpp, you can include C++ code directly as a string in R, have it compiled into a function and call it from R. Compiler error messages are forwarded to R and displayed there, which helps a lot when debugging your code. Some errors, however, are apparently not reported at all: if you access a std::vector element with an out-of-range index, for example, R simply crashes without any warning. While converting my simulation, this was the only kind of error that R was not able to communicate to me.
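The crash on an out-of-range index is consistent with how std::vector behaves: operator[] does no bounds checking, so a bad index is undefined behaviour rather than a reportable error. The following is a minimal sketch in plain C++ (no Rcpp involved) of two common ways to make such mistakes visible. The function names are made up for illustration and are not taken from my simulation code. Whether the resulting exception shows up as an R error depends on how the compiled function is wrapped, so treat that part as an assumption to verify in your own setup.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked access: std::vector::at throws std::out_of_range for a bad index,
// whereas v[i] with i >= v.size() is undefined behaviour.
double elementOrThrow(const std::vector<double>& v, std::size_t i) {
    return v.at(i);
}

// Debug-time check: the assert is active unless NDEBUG is defined,
// so the unchecked access below is only reached with a valid index
// in debug builds.
double elementDebugChecked(const std::vector<double>& v, std::size_t i) {
    assert(i < v.size());
    return v[i];
}

// Hypothetical helper: returns false instead of crashing when the index is
// out of range, and writes the element to `out` otherwise.
bool tryGet(const std::vector<double>& v, std::size_t i, double& out) {
    try {
        out = elementOrThrow(v, i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Using at() costs a bounds check on every access, so one option is to use it while porting and debugging and to switch back to operator[] in the hot loops once the indices are known to be correct.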

Using Visual Studio with Rcpp

Even though Dirk Eddelbuettel and Romain François answer the question “Can I use Rcpp with Visual Studio” straightforwardly with “Not a chance”, I used Visual Studio quite extensively for my development. It is true that you won’t be able to compile your Rcpp code with Visual Studio; for that you still need the toolchain from RTools. But that doesn’t keep you from using Visual Studio for development. My solution looks as follows: I use a file dppClustering.cpp, which I load and compile from R with the include function. In this file all variables are converted from Rcpp variables into C++ variables. With these I then call my simulation class, which contains the logic.

To develop with VS, instead of using dppClustering.cpp I created a new project that includes the simulation classes and accesses their functionality. With this set-up I am able to use the complete power of Visual Studio for my development, but I can still compile from within R using Rcpp.
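The split described above can be sketched roughly as follows. This is not my actual thesis code: the class Simulation, the function names and the macro DPP_BUILD_WITH_RCPP are hypothetical, and the "simulation" just computes a mean so that the example stays short. The point is only that the class uses nothing but the standard library, so any compiler (including Visual Studio) can build it, while the thin Rcpp wrapper is compiled only when building from R. In a real project the wrapper and the class would live in separate files; they are shown together here for brevity.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Plain C++ simulation class (hypothetical stand-in): no Rcpp headers needed.
class Simulation {
public:
    explicit Simulation(std::vector<double> data) : data_(std::move(data)) {}

    // Placeholder for the real logic: returns the mean of the data,
    // or 0.0 for empty input.
    double run() const {
        if (data_.empty()) return 0.0;
        const double sum = std::accumulate(data_.begin(), data_.end(), 0.0);
        return sum / static_cast<double>(data_.size());
    }

private:
    std::vector<double> data_;
};

// What a small driver in the Visual Studio project might do:
// exercise the class directly, without R.
int runStandaloneCheck() {
    Simulation sim(std::vector<double>{2.0, 4.0});
    assert(sim.run() == 3.0);
    return 0;
}

#ifdef DPP_BUILD_WITH_RCPP  // hypothetical macro, defined only for the R build
#include <Rcpp.h>

// Thin wrapper: convert the R vector to a std::vector, call the class,
// and convert the result back to an R object.
RcppExport SEXP runSimulation(SEXP xs) {
    BEGIN_RCPP
    std::vector<double> data = Rcpp::as<std::vector<double> >(xs);
    Simulation sim(data);
    return Rcpp::wrap(sim.run());
    END_RCPP
}
#endif
```

Keeping the conversion code in one small file means that only this file depends on Rcpp, and everything else can be edited, debugged and stepped through in Visual Studio.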

How about the speed-up?

The runtime difference between R and C++ code is just mind-blowing. I averaged the runtime of 40 C++ runs and 2 R runs and calculated a speed-up factor of over 100, that is, the mean R runtime divided by the mean C++ runtime.
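For readers who want to take similar measurements on the C++ side, here is a minimal sketch of averaging the wall-clock time over several runs and forming the speed-up ratio. It is an illustration only: the helper names are made up, and it is not the procedure I used for the numbers above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Average wall-clock seconds per call of `work` over `runs` repetitions.
// Returns 0.0 if `runs` is zero.
template <typename F>
double meanSeconds(F work, std::size_t runs) {
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    for (std::size_t r = 0; r < runs; ++r) {
        work();
    }
    const std::chrono::duration<double> total = Clock::now() - start;
    return runs == 0 ? 0.0 : total.count() / static_cast<double>(runs);
}

// Speed-up factor: mean runtime of the slow version divided by
// the mean runtime of the fast version.
double speedUp(double meanSlowSeconds, double meanFastSeconds) {
    assert(meanFastSeconds > 0.0);
    return meanSlowSeconds / meanFastSeconds;
}
```

With only a handful of runs of the slow version, the resulting factor is a rough estimate rather than a precise benchmark.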

Combining quick implementation in R with additional runtime improvements from C++ for the computationally intensive parts of the code makes Rcpp an enormously powerful tool. The week I invested really paid off.

