When processing large data sets in R you often also end up creating large temporary objects. In order to keep the memory footprint small, it is always good to remove those temporary objects as soon as possible. When done, removed objects will be deallocated from memory (RAM) the next time the garbage collection runs.
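As a minimal sketch of this pattern (the name big and its size are made up for illustration, and the explicit gc() call is only there to force a collection right away, since R normally triggers one on its own):

```r
# Hypothetical temporary: one million doubles, roughly 8 MB
big <- numeric(1e6)
size_bytes <- as.numeric(object.size(big))

# Use the temporary for whatever it was created for
total <- sum(big)

# Drop it as soon as it is no longer needed
rm(big)

# Optional: ask for a garbage collection now instead of waiting for the next one
invisible(gc())
```

After rm(big), the name big no longer exists in the current environment, and the memory behind it can be reclaimed.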
Better: Use rm(list="x") instead of rm(x), if using rm()
To remove an object in R, one can use the rm() function (with alias remove()). However, it turns out that this function has quite a bit of internal overhead (look at its R code), particularly if you call it as rm(x) rather than rm(list="x"). In the benchmark below, the former takes about 3.5 times longer to complete. Example:
> t1 <- system.time(for (k in 1:1e5) { a <- 1; rm(a); })
> t2 <- system.time(for (k in 1:1e5) { a <- 1; rm(list="a"); })
> t1
   user  system elapsed
  10.45    0.00   10.50
> t2
   user  system elapsed
   2.93    0.00    2.94
> t1/t2
    user   system  elapsed
3.566553      NaN 3.571429
Note: In order to minimize the impact of the memory allocation on the benchmark, I use 'a <- 1' to represent the “large” object.
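The list argument of rm() takes a character vector, so several temporaries can be dropped in a single call rather than one rm() call per object. A small sketch, where tmp1 and tmp2 are hypothetical names standing in for large intermediate results:

```r
# Hypothetical temporaries standing in for large intermediate objects
tmp1 <- 1
tmp2 <- 2

# 'list' accepts a character vector of object names
rm(list = c("tmp1", "tmp2"))

# Check whether either name is still bound in the current environment
still_there <- c(exists("tmp1", inherits = FALSE),
                 exists("tmp2", inherits = FALSE))
print(still_there)
```

I have not benchmarked this variant here, so treat it as a convenience rather than a measured speedup.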
Best: Use x <- NULL instead of rm()
Instead of using rm(list="x"), which still has a fair amount of overhead, one can remove a large active object by assigning the corresponding variable a new value (a small object), e.g. x <- NULL. Whenever doing this, the previously assigned value (the large object) will become available for garbage collection. Example:
> t3 <- system.time(for (k in 1:1e5) { a <- 1; a <- NULL; })
> t3
   user  system elapsed
   0.05    0.00    0.05
> t1/t3
   user  system elapsed
    209     NaN     210
That's roughly a 200-fold speedup over rm(x)!
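One difference is worth keeping in mind: rm() removes the variable itself, whereas x <- NULL keeps the name x around and merely rebinds it to NULL. The following sketch (with a made-up vector as the "large" object) shows the distinction:

```r
# Stand-in for a large object
x <- numeric(1e6)

# Rebind the name; the numeric vector is now unreferenced and can be collected
x <- NULL

name_kept <- exists("x", inherits = FALSE)   # the variable still exists ...
is_null   <- is.null(x)                      # ... but holds only NULL

# rm() drops the binding altogether
rm(list = "x")
name_gone <- !exists("x", inherits = FALSE)
```

So if later code relies on exists("x") or on x being undefined, the two approaches are not interchangeable; for simply releasing a large temporary inside a function, the assignment is enough.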
Background
I “accidentally” discovered this when profiling readMat() in my R.matlab package. In particular, there was one rm(x) call inside a local function that was called thousands of times when parsing modestly large MAT files. Together with some additional optimizations, R.matlab v2.0.0 (to appear) is now 10-20 times faster. Now I'm going to review all my other packages for expensive rm() calls.