Growing Objects and Loop Memory Pre-Allocation
Preallocating Memory
This will be a short post about a simple but very important concept that can drastically increase the speed of poorly written code. It is very common to see R loops written as follows:
v = NULL
n = 1e5
for(i in 1:n) v = c(v, i)
This seems like a natural way to write such a task: at each iteration, we grow our vector v by appending one more element to it.
However, we will see that if we simply initialize a vector of the total length (filled with NAs, zeros or any other value) and then run the loop, we can drastically increase the speed of our algorithm. To illustrate this point, we will use the microbenchmark package. First, let's put both approaches in functions:
f_slow = function(n) {
  v = NULL
  for(i in 1:n) v = c(v, i)
  return(v)
}

f_fast = function(n) {
  v = rep(NA, n)
  for(i in 1:n) v[i] = i
  return(v)
}
Now to compare the speeds we can start with a small value of n, say 100.
library(microbenchmark)
library(ggplot2)

n = 100
mbm = microbenchmark("f_slow" = f_slow(n),
                     "f_fast" = f_fast(n),
                     times = 50)
print(mbm)

## Unit: microseconds
##    expr    min     lq   mean median     uq      max neval
##  f_slow 24.060 24.576 64.690 25.156 26.892 1958.947    50
##  f_fast  5.098  5.419 43.235  5.562  5.790 1882.557    50
We see that even for such a simple loop, preallocating the vector already makes a big difference in the runtime. For bigger values of n, the difference becomes much more pronounced:
n = 1e5
mbm = microbenchmark("f_slow" = f_slow(n),
                     "f_fast" = f_fast(n),
                     times = 50)
print(mbm)

## Unit: milliseconds
##    expr      min       lq     mean   median       uq      max neval
##  f_slow 8615.797 8697.359 8736.413 8728.380 8776.894 8879.216    50
##  f_fast    4.683    4.968    5.257    5.100    5.196    7.098    50
Fully understanding why this happens would require going into how dynamic memory allocation works, which is not our focus here. The basic idea is that every time the object grows inside the loop, the computer must find a new, larger block of memory and copy the existing elements into it; repeating this at every iteration is costly and can also lead to memory fragmentation. For illustration purposes, let's plot the object size (in bytes) as a function of i for our two loops. Note how, in the fast case, the object does not change size over time.
library(pryr)

n = 1e5
object_size_slow_loop = rep(NA, n)
object_size_fast_loop = rep(NA, n)

v1 = NULL
for(i in 1:n) {
  v1 = c(v1, i)
  object_size_slow_loop[i] = object_size(v1)
}

v2 = rep(NA, n)
for(i in 1:n) {
  v2[i] = i
  object_size_fast_loop[i] = object_size(v2)
}

sizes = cbind.data.frame(object_size_fast_loop, object_size_slow_loop)

print(ggplot(sizes, aes(1:n)) +
        geom_line(aes(y = object_size_fast_loop, colour = "Fast loop")) +
        geom_line(aes(y = object_size_slow_loop, colour = "Slow loop")) +
        ggtitle("Size of the object at each iteration") +
        xlab("Iteration") +
        ylab("Size in bytes"))
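To make the scaling explicit, here is a small sketch (reusing f_slow and f_fast from above; exact timings are machine-dependent) that compares elapsed times for a few values of n. Because the slow version re-copies everything written so far at every iteration, its total cost grows roughly quadratically with n, while the preallocated version grows roughly linearly:

# Timing both versions for increasing n. The slow version's cost should grow
# much faster than proportionally to n, while the fast version's cost should
# grow roughly in proportion to n.
ns = c(1e3, 1e4, 1e5)
timings = sapply(ns, function(n) {
  c(slow = system.time(f_slow(n))[["elapsed"]],
    fast = system.time(f_fast(n))[["elapsed"]])
})
colnames(timings) = paste0("n = ", format(ns, scientific = FALSE))
print(timings)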
Beyond Vectors
In the above examples, we used vectors just because they are simpler to visualize, but note that this principle holds for any other data structure, such as matrices and lists. Let’s see one quick example of each, starting with the matrix case:
n = 1e3

f_slow_matrix = function(n) {
  M = NULL
  for(i in 1:n){
    M = rbind(M, rep(1, n))
  }
  return(M)
}

f_fast_matrix = function(n) {
  M = matrix(NA, nrow = n, ncol = n)
  for(i in 1:n){
    M[i,] = rep(1, n)
  }
  return(M)
}

mbmM = microbenchmark("f_slow_matrix" = f_slow_matrix(n),
                      "f_fast_matrix" = f_fast_matrix(n),
                      times = 50)
print(mbmM)

## Unit: milliseconds
##           expr      min       lq     mean   median       uq      max neval
##  f_slow_matrix 5823.364 5858.490 5911.030 5914.014 5945.253 6040.606    50
##  f_fast_matrix    7.744    9.607   10.788   10.349   11.270   14.078    50
and, finally, an example with lists:
n = 1e4

f_slow_list = function(n) {
  L = list()
  for(i in 1:n){
    L[[paste0('index_', i)]] = i
  }
  return(L)
}

f_fast_list = function(n) {
  L = vector(mode = "list", length = n)
  for(i in 1:n){
    L[[i]] = i
  }
  names(L) = paste0("index_", 1:n)
  return(L)
}

mbmL = microbenchmark("f_slow_list" = f_slow_list(n),
                      "f_fast_list" = f_fast_list(n),
                      times = 50)
print(mbmL)

## Unit: milliseconds
##         expr     min      lq    mean  median      uq      max neval
##  f_slow_list 607.299 616.947 682.738 639.575 656.597 1298.282    50
##  f_fast_list   3.273   3.364   3.967   3.559   3.775    7.704    50
Final Note: When the Maximum Length Is Not Known from the Start
It could be the case that the maximum length is not known to the user. However, an upper bound (even a very loose one) can be used, and after running the loop we simply discard the part that wasn't needed. For example, suppose that the true length is 100,000, but all we know is the upper bound 10,000,000. We can preallocate memory using our upper bound, as follows:
n = 1e5
UB = 1e7

f_slow = function(n) {
  v = NULL
  for(i in 1:n) v = c(v, i)
  return(v)
}

f_fast_upper_bound = function(n, UB) {
  v = rep(NA, UB)
  for(i in 1:n) {
    v[i] = i
  }
  v = v[1:n]
  return(v)
}

mbmUB = microbenchmark("f_slow" = f_slow(n),
                       "f_fast_upper_bound" = f_fast_upper_bound(n, UB),
                       times = 50)
print(mbmUB)

## Unit: milliseconds
##                expr      min       lq     mean   median       uq      max neval
##              f_slow 7529.350 7903.024 8394.622 8598.460 8828.130 9318.457    50
##  f_fast_upper_bound   38.629   54.643  112.712  128.720  137.910  224.047    50
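If even a rough upper bound is unavailable, a common alternative is to grow the vector geometrically: start small, double the capacity whenever it fills up, and truncate at the end. The vector is then reallocated only a logarithmic number of times instead of at every iteration. The sketch below is our own illustration (the name f_fast_doubling is not part of the benchmarks above):

# Sketch of a geometric (doubling) growth strategy: the vector is reallocated
# only O(log n) times instead of once per iteration, so the amortized cost per
# element stays roughly constant even without knowing n in advance.
f_fast_doubling = function(n) {
  v = rep(NA, 16)   # small initial capacity
  len = 0           # number of slots actually filled
  for(i in 1:n) {
    if(len == length(v)) {
      v = c(v, rep(NA, length(v)))  # capacity exhausted: double it
    }
    len = len + 1
    v[len] = i
  }
  return(v[1:len])
}

identical(f_fast_doubling(1e5), f_fast(1e5))  # should return TRUE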