Progress bars in R (part II) – a wrapper for apply functions
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In a previous post I gave some examples of how to make a progress bar in R. In the examples the bars were created within loops. Very often though I have situations where I would like have a progress bar when using apply(). The plyr package provides several apply-like functions also including progress bars, so one could have a look here and use a plyr function instead of apply if possible. Anyway, here comes a wrapper for apply, lapply and sapply that has a progressbar. It seems to work although one known issue is the use of vectors (like c(1,2)with the MARGIN argument in apply_pb(). Also you can see in the performance comparison below that the wrapper causes overhead to a considerable extent, which is the main drawback of this approach.
############################################################### # STATUS: WORKING, but only tested once or twice, # tested with most ?apply examples # ISSUES/TODO: MARGIN argument cannot take a # vector like 1:2 that is more than one numeric apply_pb <- function(X, MARGIN, FUN, ...) { env <- environment() pb_Total <- sum(dim(X)[MARGIN]) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) wrapper <- function(...) { curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir= env) setTxtProgressBar(get("pb", envir= env), curVal +1) FUN(...) } res <- apply(X, MARGIN, wrapper, ...) close(pb) res } ## NOT RUN: # apply_pb(anscombe, 2, sd, na.rm=TRUE) ## large dataset # df <- data.frame(rnorm(30000), rnorm(30000)) # apply_pb(df, 1, sd) ############################################################### lapply_pb <- function(X, FUN, ...) { env <- environment() pb_Total <- length(X) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) # wrapper around FUN wrapper <- function(...){ curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir=env) setTxtProgressBar(get("pb", envir=env), curVal +1) FUN(...) } res <- lapply(X, wrapper, ...) close(pb) res } ## NOT RUN: # l <- sapply(1:20000, function(x) list(rnorm(1000))) # lapply_pb(l, mean) ############################################################### sapply_pb <- function(X, FUN, ...) { env <- environment() pb_Total <- length(X) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) wrapper <- function(...){ curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir=env) setTxtProgressBar(get("pb", envir=env), curVal +1) FUN(...) } res <- sapply(X, wrapper, ...) close(pb) res } ## NOT RUN: # l <- sapply(1:20000, function(x) list(rnorm(1000)) # sapply_pb(l, mean) ###############################################################
Nice up to now, but now let’s see what the difference in performance due to the wrapper overhead looks like.
############################################################### > l <- sapply(1:20000, function(x) list(rnorm(1000))) > system.time(sapply(l, mean)) User System verstrichen 0.474 0.003 0.475 > system.time(sapply_pb(l, mean)) |======================================================| 100% User System verstrichen 1.863 0.025 1.885 > df <- data.frame(rnorm(90000), rnorm(90000)) > system.time(apply(df, 1, sd)) User System verstrichen 7.152 0.062 7.260 > system.time(apply_pb(df, 1, sd)) |======================================================| 100% User System verstrichen 13.112 0.099 13.192 ###############################################################
So, what we see is that performance radically goes down. This is extremely problematic in our context as one will tend to use progress bars in situations where processing times are already quite long. So if someone has an improvement for that I would be glad to hear about it.
Latest version with more comments on github.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.