Speed trick: Assigning NULL to a large object is much faster than using rm()!

[This article was first published on jottR, and kindly contributed to R-bloggers.]

When processing large data sets in R, you often end up creating large temporary objects as well. To keep the memory footprint small, it is good practice to remove those temporary objects as soon as possible. Once removed, the objects are deallocated from memory (RAM) the next time the garbage collector runs.
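As a minimal sketch of that life cycle, the function below creates a temporary vector, removes it, and then asks for a collection. The name free_temp_demo and the vector size are assumptions chosen for illustration; the explicit gc() call only makes the collection happen at a known point, since R normally runs the collector on its own.

```r
# Hypothetical helper (not from the original post): create a temporary
# object, remove it, and let the garbage collector reclaim the memory.
free_temp_demo <- function(n = 1e6) {
  tmp <- numeric(n)        # temporary object: n doubles
  size <- object.size(tmp) # record its size before dropping it
  rm(list = "tmp")         # remove the binding; the vector is now collectable
  gc(verbose = FALSE)      # optional: force a collection right away
  size                     # return the size of what was released
}

print(free_temp_demo(), units = "MB")
```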

Better: Use rm(list="x") instead of rm(x), if using rm()

To remove an object in R, one can use the rm() function (alias remove()). However, that function has quite a bit of internal overhead (look at its R code), particularly when it is called as rm(x) rather than rm(list="x"). In the benchmark below, the former takes about 3.5 times as long to complete. Example:

> t1 <- system.time(for (k in 1:1e5) { a <- 1; rm(a); })
> t2 <- system.time(for (k in 1:1e5) { a <- 1; rm(list="a"); })
> t1
   user  system elapsed
  10.45    0.00   10.50
> t2
   user  system elapsed
   2.93    0.00    2.94
> t1/t2
    user   system  elapsed
3.566553      NaN 3.571429

Note: In order to minimize the impact of the memory allocation on the benchmark, I use 'a <- 1' to represent the “large” object.
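To rerun this comparison on your own R version, the benchmark can be wrapped in a small harness. This is only a sketch: time_cycles, by_symbol and by_name are hypothetical names, not part of the original post, and the extra function call adds the same overhead to both variants. No timings are shown because they depend on the machine and the R version.

```r
# Hypothetical helper (not from the original post): elapsed seconds for
# n repetitions of a create-then-remove cycle.
time_cycles <- function(cycle, n = 1e4) {
  system.time(for (k in seq_len(n)) cycle())[["elapsed"]]
}

# Each cycle creates a small stand-in object and removes it again.
by_symbol <- function() { a <- 1; rm(a) }          # rm(x) form
by_name   <- function() { a <- 1; rm(list = "a") } # rm(list="x") form

t_symbol <- time_cycles(by_symbol)
t_name   <- time_cycles(by_name)

cat("rm(a):         ", t_symbol, "s\n")
cat("rm(list=\"a\"):  ", t_name, "s\n")
```

Increase n if both timings come out too close to zero to compare.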

Best: Use x <- NULL instead of rm()

Instead of using rm(list="x"), which still has a fair amount of overhead, one can release a large active object by assigning the corresponding variable a new value (a small object), e.g. x <- NULL. The variable itself continues to exist, but the previously assigned value (the large object) becomes available for garbage collection, provided nothing else still references it. Example:

> t3 <- system.time(for (k in 1:1e5) { a <- 1; a <- NULL; })
> t3
   user  system elapsed
   0.05    0.00    0.05
> t1/t3
   user  system elapsed
    209     NaN     210

That's a roughly 200-fold speedup!
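The two approaches are not fully equivalent, which matters if later code tests whether the variable exists. The sketch below, using the hypothetical function name demo_null_vs_rm, shows that assigning NULL keeps the name bound while rm() removes the binding altogether.

```r
# Hypothetical demo (not from the original post): compare what is left
# behind by `x <- NULL` and by rm().
demo_null_vs_rm <- function() {
  x <- numeric(1e6)   # stand-in for a large temporary object
  x <- NULL           # the vector is released; the name x stays, bound to NULL

  y <- numeric(1e6)
  rm(list = "y")      # the binding y itself is removed

  c(x_bound = exists("x", inherits = FALSE),
    y_bound = exists("y", inherits = FALSE))
}

print(demo_null_vs_rm())
# x_bound is TRUE, y_bound is FALSE
```

In both cases the large vector is no longer referenced, so the effect on memory is the same; only the leftover name differs.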

Background

I “accidentally” discovered this when profiling readMat() in my R.matlab package. In particular, there was one rm(x) call inside a local function that was called thousands of times when parsing modestly large MAT files. Together with some additional optimizations, R.matlab v2.0.0 (to appear) is now 10-20 times faster. Next, I'm going to review all my other packages for expensive rm() calls.
