Avoiding unnecessary memory allocations in R
[This article was first published on bioCS, and kindly contributed to R-bloggers].
As a rule, everything I discover in R has already been discussed by Hadley Wickham. In this case, he writes:
"The reason why the C++ function is faster is subtle, and relates to memory management. The R version needs to create an intermediate vector the same length as y (x - ys), and allocating memory is an expensive operation. The C++ function avoids this overhead because it uses an intermediate scalar."
In my case, I want to count the number of items in a vector below a certain threshold. R will allocate a new vector for the result of the comparison, and then sum over that vector. It’s possible to speed that up about ten-fold by directly counting in C++:
library(Rcpp)

# Compile the C++ counter and bind it to an infix operator.
`%count<%` <- cppFunction('
size_t count_less(NumericVector x, NumericVector y) {
  const R_xlen_t nx = x.size();
  const R_xlen_t ny = y.size();
  // Exactly one side may be a vector; the other must be a scalar.
  if (nx > 1 && ny > 1) stop("Only one parameter can be a vector!");
  size_t count = 0;
  if (nx == 1) {
    // scalar < vector
    const double c = x[0];
    for (R_xlen_t i = 0; i < ny; i++) count += c < y[i];
  } else {
    // vector < scalar
    const double c = y[0];
    for (R_xlen_t i = 0; i < nx; i++) count += x[i] < c;
  }
  return count;
}
')

set.seed(42)
N <- 100000000
v <- runif(N, 0, 10000)
system.time(sum(v < 5000))     # allocates a logical vector of length N first
system.time(v %count<% 5000)   # counts in a single pass, no temporary
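To make the difference between the two strategies explicit, here is a minimal standalone C++ sketch, independent of Rcpp. Both function names are hypothetical and chosen only for illustration. The first function imitates what sum(v < threshold) has to do in R, using a std::vector<int> as a stand-in for R's logical vector (which also stores one 4-byte integer per element); the second accumulates into a scalar, as the Rcpp function does. It is a model of the idea, not of R's actual internals.

```cpp
#include <cstddef>
#include <vector>

// Strategy 1 (hypothetical name): materialize the comparison result as a
// full-length temporary, then sum it. This costs one extra heap allocation
// proportional to v.size() plus a second pass over memory.
std::size_t count_less_with_temporary(const std::vector<double>& v,
                                      double threshold) {
    std::vector<int> below(v.size());  // stand-in for R's logical vector
    for (std::size_t i = 0; i < v.size(); ++i) {
        below[i] = v[i] < threshold;
    }
    std::size_t count = 0;
    for (int flag : below) {
        count += flag;
    }
    return count;
}

// Strategy 2 (hypothetical name): fold each comparison straight into a
// scalar counter. No temporary vector is allocated.
std::size_t count_less_direct(const std::vector<double>& v,
                              double threshold) {
    std::size_t count = 0;
    for (double value : v) {
        count += value < threshold;
    }
    return count;
}
```

Both functions return the same count; they differ only in whether the intermediate result is ever stored. How much that matters depends on the vector length and the allocator, so any speed-up should be measured rather than assumed.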
Often this won’t be the bottleneck, but it may be useful at some point.