The performance cost of a for-loop, and some alternatives
I’ve recently been spending a lot of time running various simulations in R. Because I often use snow to perform simulations across several computers/cores, results typically come back in the form of a list object. Summarizing the results from a list is simple enough using a for-loop, but it’s much “sexier” to use a functional style of programming that takes advantage of higher-order functions or the *apply family of functions (R is, after all, a functional language at its core). But what’s the performance cost of using, for instance, lapply with a large list, particularly when the results from lapply have to be further manipulated to get them into a useful form? I recently did some performance testing on three of the *apply functions and compared them to an equivalent for-loop approach. The results follow.
First, I set up a large list with some fake data. Here I make a list, Sims, containing 30000 \(5 \times 5\) matrices. (This is a data structure similar to one I recently found myself analyzing.)
set.seed(12345)
S <- 30000
Sims <- Map(function(x) matrix(rnorm(25), nrow=5, ncol=5), 1:S)
Now I define four functions that extract the fifth column of each matrix in Sims and create a \(30000 \times 5\) matrix of results.
for_method <- function(S, Sims) {
  R <- matrix(NA, ncol=5, nrow=S)
  for (i in 1:S) R[i,] <- Sims[[i]][,5]
  R
}

sapply_method <- function(Sims) {
  t(sapply(Sims, function(x) x[,5]))
}

lapply_method <- function(Sims) {
  do.call(rbind, lapply(Sims, function(x) x[,5]))
}

vapply_method <- function(Sims) {
  t(vapply(Sims, function(x) x[,5], rep(0, 5)))
}
Notice that for_method first creates a container matrix and inserts the results directly into it. Using something like rbind instead would force R to copy the growing matrix on each iteration of the loop, which is very inefficient. Also notice that in each of the apply methods, the result has to be manipulated before it is returned. It’s often possible to eliminate this step during the initial simulation process, but if you’re like me, you sometimes start your simulations before you realize what you intend to do with the results.
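To make that copying cost concrete, here is a minimal sketch (my own aside, not part of the benchmarks below) contrasting the pre-allocated loop above with a hypothetical grow_method that builds its result via rbind:

grow_method <- function(S, Sims) {
  # Growing R with rbind forces a full copy of the accumulated matrix
  # on every iteration, so the loop does far more work than necessary.
  R <- NULL
  for (i in 1:S) R <- rbind(R, Sims[[i]][,5])
  R
}

# for_method allocates the 30000 x 5 result once and fills it in place;
# grow_method should take noticeably longer on the same input:
# system.time(grow_method(S, Sims))
# system.time(for_method(S, Sims))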
Now for some timing results:
> system.time(replicate(20, R.1 <<- for_method(S, Sims)))
   user  system elapsed
  4.340   0.072   4.414
> system.time(replicate(20, R.2 <<- sapply_method(Sims)))
   user  system elapsed
  3.452   0.076   3.528
> system.time(replicate(20, R.3 <<- lapply_method(Sims)))
   user  system elapsed
  4.393   0.044   4.433
> system.time(replicate(20, R.4 <<- vapply_method(Sims)))
   user  system elapsed
  2.588   0.040   2.628
> all(identical(R.1, R.2), identical(R.1, R.3), identical(R.1, R.4))
[1] TRUE
These results surprised me: I never expected vapply to be so much quicker than the other three methods. This may have to do with the fact that, since vapply requires you to explicitly specify the shape of the result, R is able to allocate the memory ahead of time.
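As a small illustration of how that template works (my own aside, not from the original tests), the third argument to vapply describes each individual result; vapply checks every return value against it and errors out on a mismatch instead of silently falling back to a list the way sapply can:

m <- matrix(rnorm(25), nrow=5, ncol=5)

# Each call must return a numeric vector of length 5, so vapply can
# allocate the complete result matrix before iterating.
vapply(list(m, m), function(x) x[,5], FUN.VALUE = rep(0, 5))

# A template of the wrong length is an error rather than a silent coercion:
# vapply(list(m, m), function(x) x[,5], FUN.VALUE = rep(0, 4))  # fails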
As a simple extension, I thought I would rerun the tests using byte-compiled versions of the functions:
library(compiler)
for_method <- cmpfun(for_method)
sapply_method <- cmpfun(sapply_method)
lapply_method <- cmpfun(lapply_method)
vapply_method <- cmpfun(vapply_method)
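As a quick sanity check (my own addition), a byte-compiled closure prints with a <bytecode: ...> line, so it is easy to confirm that cmpfun actually did something:

# A compiled closure shows a "<bytecode: ...>" reference when printed.
print(vapply_method)

# Note that recent R releases JIT-compile closures by default, so explicit
# cmpfun() calls may matter less than they did when this post was written.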
And here are the results:
> system.time(replicate(10, R.1 <<- for_method(S, Sims)))
   user  system elapsed
  2.084   0.024   2.110
> system.time(replicate(10, R.2 <<- sapply_method(Sims)))
   user  system elapsed
  1.624   0.016   1.642
> system.time(replicate(10, R.3 <<- lapply_method(Sims)))
   user  system elapsed
  2.152   0.012   2.164
> system.time(replicate(10, R.4 <<- vapply_method(Sims)))
   user  system elapsed
  1.204   0.004   1.207
> all(identical(R.1, R.2), identical(R.1, R.3), identical(R.1, R.4))
[1] TRUE
These results are also surprising. I really thought byte-compiling the functions would bring the results much closer together (doesn’t each function ultimately compile down to what is effectively a for-loop?). However, in this simple example at least, that isn’t the case: vapply still has, by a significant margin, the best performance. sapply also performs noticeably better than lapply and the for-loop, which is an important result since sapply doesn’t require a pre-specified data structure. Assuming these findings hold for more complicated data structures (and I don’t see why they wouldn’t), they suggest that a functional approach can provide benefits in terms of both coding efficiency and run-time performance.
I welcome any feedback or suggestions on how these tests could be improved or extended.
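One possible refinement (a sketch only; it assumes the microbenchmark package is installed and was not part of the tests above) would be to time the methods with microbenchmark, which runs the expressions in randomized order and reports the full distribution of timings rather than a single total:

library(microbenchmark)

# Randomized, repeated timings of each method on the same Sims list.
microbenchmark(
  for_method    = for_method(S, Sims),
  sapply_method = sapply_method(Sims),
  lapply_method = lapply_method(Sims),
  vapply_method = vapply_method(Sims),
  times = 20
)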
Update
A couple of things have come up in the comments. First, Henrik provided a more elegant (and more efficient) version of vapply_method:
vapply_method2 <- function(Sims) {
  t(vapply(Sims, "[", rep(0, 5), i = 1:5, j = 5))
}
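To unpack what this is doing (my own gloss): "[" is the subsetting function itself, so each call amounts to evaluating x[1:5, 5], which extracts the fifth column without the overhead of an anonymous function:

m <- Sims[[1]]

# Calling "[" directly is equivalent to the usual bracket syntax:
identical(`[`(m, 1:5, 5), m[,5])   # TRUE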
Second, several people have reported that on their systems the for_method is just as fast as, if not faster than, the vapply_method when compiled. I have rerun my tests, adding Henrik’s vapply_method2. Here are the results for compiled code (note that I have increased the number of replications):
> library(compiler)
> set.seed(12345)
> S <- 30000
> Sims <- Map(function(x) matrix(rnorm(25), nrow=5, ncol=5), 1:S)
>
> for_method <- function(S, Sims) {
+   R <- matrix(NA, ncol=5, nrow=S)
+   for (i in 1:S) R[i,] <- Sims[[i]][,5]
+   R
+ }
>
> sapply_method <- function(Sims) {
+   t(sapply(Sims, function(x) x[,5]))
+ }
>
> lapply_method <- function(Sims) {
+   do.call(rbind, lapply(Sims, function(x) x[,5]))
+ }
>
> vapply_method <- function(Sims) {
+   t(vapply(Sims, function(x) x[,5], rep(0, 5)))
+ }
>
> vapply_method2 <- function(Sims) {
+   t(vapply(Sims, "[", rep(0, 5), i = 1:5, j = 5))
+ }
> for_method <- cmpfun(for_method)
> sapply_method <- cmpfun(sapply_method)
> lapply_method <- cmpfun(lapply_method)
> vapply_method <- cmpfun(vapply_method)
> vapply_method2 <- cmpfun(vapply_method2)
> system.time(replicate(50, R.1 <<- for_method(S, Sims)))
   user  system elapsed
  6.052   0.136   6.191
> system.time(replicate(50, R.2 <<- sapply_method(Sims)))
   user  system elapsed
  8.361   0.116   8.474
> system.time(replicate(50, R.3 <<- lapply_method(Sims)))
   user  system elapsed
 10.548   0.144  10.692
> system.time(replicate(50, R.4 <<- vapply_method(Sims)))
   user  system elapsed
  5.813   0.124   5.936
> system.time(replicate(50, R.5 <<- vapply_method2(Sims)))
   user  system elapsed
  4.268   0.084   4.352
For whatever reason, the vapply methods remain the fastest on my system (particularly Henrik’s version). I also tested on another Linux machine and got the same results (I have not yet tested on Windows). Of course, this is a very simple example, so the performance differences may not hold up as the input function becomes more complex. I’d be interested to know if anyone has an idea as to why vapply would perform better on Linux than on Windows/Mac. A library thing?