Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
############################################################### # STATUS: WORKING, but only tested once or twice, # tested with most ?apply examples # ISSUES/TODO: MARGIN argument cannot take a # vector like 1:2 that is more than one numeric apply_pb <- function(X, MARGIN, FUN, ...) { env <- environment() pb_Total <- sum(dim(X)[MARGIN]) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) wrapper <- function(...) { curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir= env) setTxtProgressBar(get("pb", envir= env), curVal +1) FUN(...) } res <- apply(X, MARGIN, wrapper, ...) close(pb) res } ## NOT RUN: # apply_pb(anscombe, 2, sd, na.rm=TRUE) ## large dataset # df <- data.frame(rnorm(30000), rnorm(30000)) # apply_pb(df, 1, sd) ############################################################### lapply_pb <- function(X, FUN, ...) { env <- environment() pb_Total <- length(X) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) # wrapper around FUN wrapper <- function(...){ curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir=env) setTxtProgressBar(get("pb", envir=env), curVal +1) FUN(...) } res <- lapply(X, wrapper, ...) close(pb) res } ## NOT RUN: # l <- sapply(1:20000, function(x) list(rnorm(1000))) # lapply_pb(l, mean) ############################################################### sapply_pb <- function(X, FUN, ...) { env <- environment() pb_Total <- length(X) counter <- 0 pb <- txtProgressBar(min = 0, max = pb_Total, style = 3) wrapper <- function(...){ curVal <- get("counter", envir = env) assign("counter", curVal +1 ,envir=env) setTxtProgressBar(get("pb", envir=env), curVal +1) FUN(...) } res <- sapply(X, wrapper, ...) close(pb) res } ## NOT RUN: # l <- sapply(1:20000, function(x) list(rnorm(1000)) # sapply_pb(l, mean) ###############################################################
Nice up to now, but now let’s see what the difference in performance due to the wrapper overhead looks like.
############################################################### > l <- sapply(1:20000, function(x) list(rnorm(1000))) > system.time(sapply(l, mean)) User System verstrichen 0.474 0.003 0.475 > system.time(sapply_pb(l, mean)) |======================================================| 100% User System verstrichen 1.863 0.025 1.885 > df <- data.frame(rnorm(90000), rnorm(90000)) > system.time(apply(df, 1, sd)) User System verstrichen 7.152 0.062 7.260 > system.time(apply_pb(df, 1, sd)) |======================================================| 100% User System verstrichen 13.112 0.099 13.192 ###############################################################
So, what we see is that performance radically goes down. This is extremely problematic in our context as one will tend to use progress bars in situations where processing times are already quite long. So if someone has an improvement for that I would be glad to hear about it.
Latest version with more comments on github.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.