faster for() loops in R
[This article was first published on Ancient Eco, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
One of the most common ways people write for() loops is to create an empty results vector and then concatenate each new result onto the previous (and growing) results vector, like the following. (Note: wrapping an expression in system.time() evaluates the expression and returns a summary of how long it took, in seconds.)
x <- c()
system.time(
  for (i in 1:40000) {
    x <- c(x, i)  # here i is combined with the previous contents of x
  }
)
user system elapsed
2.019 0.082 2.100
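A rough way to see why this is slow is to count element writes. Each call to c(x, i) builds a new vector of length(x) + 1 and copies every existing element into it, so iteration i writes i elements. A preallocated vector needs only one write per iteration. The sketch below is a back-of-the-envelope model that ignores allocator details. The helper names copies_grow() and copies_prealloc() are hypothetical and exist only for this illustration.

```r
# Hypothetical helpers: a simplified model of how many vector elements
# get written by each loop pattern (ignores allocation overhead).
copies_grow <- function(n) {
  # iteration i copies i - 1 old elements plus the 1 new one = i writes;
  # as.numeric() avoids integer overflow for large n
  sum(as.numeric(seq_len(n)))  # equals n * (n + 1) / 2
}

copies_prealloc <- function(n) {
  # one in-place assignment x[i] <- i per iteration
  n
}

copies_grow(40000)
copies_prealloc(40000)
```

For 40,000 iterations, the growing pattern implies about 800 million element writes compared with 40,000 for the preallocated one. That quadratic growth is consistent with the large gap in the timings.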
It is MUCH faster to create the results vector at its final size up front and then modify its elements in place. Each call to c(x, i) allocates a brand-new vector and copies all of x into it, so the concatenation approach forces R to move an ever-growing object around in memory; preallocating avoids that entirely. In short, it seems that what R is slow at is allocating memory for objects.
x <- numeric(40000)  # preallocated numeric vector (40000 zeros)
system.time(
  for (i in 1:40000) {
    x[i] <- i  # change the value of a particular element of x
  }
)
user system elapsed
0.066 0.001 0.067
The second method is over 31 times faster on my machine.
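To check that the speedup does not come at the cost of a different result, both patterns can be wrapped as functions and compared. This is a minimal sketch. The function names grow_concat() and grow_prealloc() are hypothetical. grow_concat() starts from numeric(0) instead of c() (which is NULL), so both functions return a double vector and identical() compares like with like. seq_len(n) is used in place of 1:n because it also behaves correctly when n is 0.

```r
# Hypothetical wrappers around the two loop patterns from this post.
grow_concat <- function(n) {
  x <- numeric(0)              # start empty, as a double vector
  for (i in seq_len(n)) {
    x <- c(x, i)               # allocates a new, longer vector every iteration
  }
  x
}

grow_prealloc <- function(n) {
  x <- numeric(n)              # preallocate n zeros
  for (i in seq_len(n)) {
    x[i] <- i                  # overwrite one element in place
  }
  x
}

# Same result either way...
stopifnot(identical(grow_concat(1000), grow_prealloc(1000)))

# ...but very different run times (exact numbers will vary by machine)
system.time(grow_concat(40000))
system.time(grow_prealloc(40000))
```

Putting each pattern in a function also keeps the timing runs from reusing a global x left over from an earlier run.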
PS. This post was inspired by Hadley Wickham’s much more technical and in-depth coverage of memory usage in R.
To leave a comment for the author, please follow the link and comment on their blog: Ancient Eco.