Improved R Profiling Summaries
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In my last post I mentioned that I had improved on R’s summaryRprof()
function with a custom function called proftable()
. I’ve updated proftable()
to take advantage of R 3.0.0’s ability to record line numbers while profiling. I’ve put it on github – you can get it there or below.
proftable
reads in a file generated by Rprof()
and creates an easy-to read table of the most time-consuming calls in your code, ordered from most time-consuming to least. Unlike summaryRprof()
, it prints the entire call stack, so you can trace the origin of the time hogs. To make this easier to read, I lop off the “parent stack” common to all of the function calls, and display it just once, below the table. Here’s some example output:
> Rprof("profile1.out", line.profiling=TRUE) > source("http://pastebin.com/download.php?i=KjdkSVZq") > Rprof(NULL) > proftable("profile1.out", lines=10) PctTime Call 20.47 1#17 > [ > 1#17 > [.data.frame 9.73 1#17 > [ > 1#17 > [.data.frame > [ > [.factor 8.72 1#17 > [ > 1#17 > [.data.frame > [ > [.factor > NextMethod 8.39 == > Ops.factor 5.37 == 5.03 == > Ops.factor > noNA.levels > levels 4.70 == > Ops.factor > NextMethod 4.03 1#17 > [ > 1#17 > [.data.frame > [ > [.factor > levels 4.03 1#17 > [ > 1#17 > [.data.frame > dim 3.36 1#17 > [ > 1#17 > [.data.frame > length #File 1: http://pastebin.com/download.php?i=KjdkSVZq Parent Call: source > withVisible > eval > eval > Total Time: 5.96 seconds Percent of run time represented: 73.8 %
Note that the “Parent Call” at the bottom shows some functions which RStudio wrapped my code in. Also, I chose only to display the top 10 time-consuming calls, but proftable
told me that those 10 calls represent 73.8% of the run time. I find this display a lot more intuitive than summaryRprof()
Here’s the whole function. If you have improvements, fork it on github:
proftable <- function(file, lines=10) { # require(plyr) interval <- as.numeric(strsplit(readLines(file, 1), "=")[[1L]][2L])/1e+06 profdata <- read.table(file, header=FALSE, sep=" ", comment.char = "", colClasses="character", skip=1, fill=TRUE, na.strings="") filelines <- grep("#File", profdata[,1]) files <- aaply(as.matrix(profdata[filelines,]), 1, function(x) { paste(na.omit(x), collapse = " ") }) profdata <- profdata[-filelines,] total.time <- interval*nrow(profdata) profdata <- as.matrix(profdata[,ncol(profdata):1]) profdata <- aaply(profdata, 1, function(x) { c(x[(sum(is.na(x))+1):length(x)], x[seq(from=1,by=1,length=sum(is.na(x)))]) }) stringtable <- table(apply(profdata, 1, paste, collapse=" ")) uniquerows <- strsplit(names(stringtable), " ") uniquerows <- llply(uniquerows, function(x) replace(x, which(x=="NA"), NA)) dimnames(stringtable) <- NULL stacktable <- ldply(uniquerows, function(x) x) stringtable <- stringtable/sum(stringtable)*100 stacktable <- data.frame(PctTime=stringtable[], stacktable) stacktable <- stacktable[order(stringtable, decreasing=TRUE),] rownames(stacktable) <- NULL stacktable <- head(stacktable, lines) na.cols <- which(sapply(stacktable, function(x) all(is.na(x)))) stacktable <- stacktable[-na.cols] parent.cols <- which(sapply(stacktable, function(x) length(unique(x)))==1) parent.call <- paste0(paste(stacktable[1,parent.cols], collapse = " > ")," >") stacktable <- stacktable[,-parent.cols] calls <- aaply(as.matrix(stacktable[2:ncol(stacktable)]), 1, function(x) { paste(na.omit(x), collapse= " > ") }) stacktable <- data.frame(PctTime=stacktable$PctTime, Call=calls) frac <- sum(stacktable$PctTime) attr(stacktable, "total.time") <- total.time attr(stacktable, "parent.call") <- parent.call attr(stacktable, "files") <- files attr(stacktable, "total.pct.time") <- frac cat("\n") print(stacktable, row.names=FALSE, right=FALSE, digits=3) cat("\n") cat(paste(files, collapse="\n")) cat("\n") cat(paste("\nParent Call:", parent.call)) cat(paste("\n\nTotal Time:", total.time, "seconds\n")) cat(paste0("Percent of run time represented: ", format(frac, digits=3)), "%") invisible(stacktable) }
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.