R profiling
Profiling in R
R has a built-in performance and memory profiling facility: Rprof. Type
?Rprof into your console to learn more.
The way the profiler works is as follows:
- you start the profiler by calling Rprof, providing a filename where the profiling data should be stored
- you call the R functions that you want to analyse
- you call Rprof(NULL) to stop the profiler
- you analyse the file created by Rprof, typically using summaryRprof
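As a minimal sketch, the four steps could be wrapped in a small helper (profile_expr is a hypothetical name, not part of base R):

```r
# Hypothetical helper illustrating the four steps; not part of base R
profile_expr <- function(expr) {
  tmp <- tempfile()
  Rprof(tmp)                 # 1. start the profiler
  expr                       # 2. run the code to analyse (evaluated lazily here)
  Rprof(NULL)                # 3. stop the profiler
  res <- summaryRprof(tmp)   # 4. analyse the file created by Rprof
  unlink(tmp)
  res
}

profile_expr(example(glm))
```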
For example:
Rprof(tmp <- tempfile())
example(glm)
Rprof()
summaryRprof(tmp)
unlink(tmp)
The output looks like this:
$by.self
                   self.time self.pct total.time total.pct
"str.default"           0.02    14.29       0.10     71.43
"deparse"               0.02    14.29       0.04     28.57
"as.name"               0.02    14.29       0.02     14.29
"formals"               0.02    14.29       0.02     14.29
"make.names"            0.02    14.29       0.02     14.29
"parent.frame"          0.02    14.29       0.02     14.29
"pmatch"                0.02    14.29       0.02     14.29

$by.total
                   total.time total.pct self.time self.pct
"eval"                   0.14    100.00      0.00     0.00
"withVisible"            0.14    100.00      0.00     0.00
"str.default"            0.10     71.43      0.02    14.29
"<Anonymous>"            0.10     71.43      0.00     0.00
"capture.output"         0.10     71.43      0.00     0.00
"doTryCatch"             0.10     71.43      0.00     0.00
"evalVis"                0.10     71.43      0.00     0.00
".rs.valueContents"      0.10     71.43      0.00     0.00
".rs.valueFromStr"       0.10     71.43      0.00     0.00
"str"                    0.10     71.43      0.00     0.00
"try"                    0.10     71.43      0.00     0.00
"tryCatch"               0.10     71.43      0.00     0.00
"tryCatchList"           0.10     71.43      0.00     0.00
"tryCatchOne"            0.10     71.43      0.00     0.00
"do.call"                0.08     57.14      0.00     0.00
"strSub"                 0.08     57.14      0.00     0.00
"deparse"                0.04     28.57      0.02    14.29
"example"                0.04     28.57      0.00     0.00
"FUN"                    0.04     28.57      0.00     0.00
"lapply"                 0.04     28.57      0.00     0.00
"match"                  0.04     28.57      0.00     0.00
"source"                 0.04     28.57      0.00     0.00
"as.name"                0.02     14.29      0.02    14.29
"formals"                0.02     14.29      0.02    14.29
"make.names"             0.02     14.29      0.02    14.29
"parent.frame"           0.02     14.29      0.02    14.29
"pmatch"                 0.02     14.29      0.02    14.29
"anova"                  0.02     14.29      0.00     0.00
"anova.glm"              0.02     14.29      0.00     0.00
"data.frame"             0.02     14.29      0.00     0.00
"deParse"                0.02     14.29      0.00     0.00
".deparseOpts"           0.02     14.29      0.00     0.00
".getXlevels"            0.02     14.29      0.00     0.00
"glm"                    0.02     14.29      0.00     0.00
"%in%"                   0.02     14.29      0.00     0.00
"match.call"             0.02     14.29      0.00     0.00
"mode"                   0.02     14.29      0.00     0.00
"NextMethod"             0.02     14.29      0.00     0.00
"paste"                  0.02     14.29      0.00     0.00
"sapply"                 0.02     14.29      0.00     0.00
"str.data.frame"         0.02     14.29      0.00     0.00
"summary"                0.02     14.29      0.00     0.00
"%w/o%"                  0.02     14.29      0.00     0.00

$sample.interval
[1] 0.02

$sampling.time
[1] 0.14
A lot of information!
As a side note, the sample.interval of 0.02 indicates the frequency with which Rprof samples the call stack and takes its measurements. So, Rprof works entirely through polling. As a result, summaryRprof may look different each time you profile the same code: not only will the numbers differ slightly, but entire elements may be missing, for instance because in one run a sample was by chance taken while, say, mode was executing, whereas in another run mode slipped between two samples.
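The polling frequency can be tuned via Rprof's interval argument (in seconds); a shorter interval gives more stable numbers at the cost of more sampling overhead and larger profiling files:

```r
# Sample every 5 ms instead of the 20 ms default
Rprof(tmp <- tempfile(), interval = 0.005)
example(glm)
Rprof(NULL)
summaryRprof(tmp)$sample.interval
unlink(tmp)
```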
Stylized profiling example
Let’s look at a stylized example. Assume you have the following functions:
Do_1 <- function() {
  combn(1:20, 5)
  for (i in 1:15) Do_2()
  for (i in 1:25) Do_4()
}

Do_2 <- function() {
  combn(1:15, 5)
  for (i in 1:5) Do_3()
}

Do_3 <- function() {
  combn(1:14, 5)
  for (i in 1:20) Do_4()
}

Do_4 <- function() {
  paste(1:1000)
  combn(1:11, 5)
}
Ugly and pointless, true, but for the sake of this example they serve their purpose: they take time to execute.
So, again, let’s profile Do_1:
Rprof(tmp <- tempfile())
Do_1()
Rprof(NULL)
summaryRprof(tmp)
Which looks like this:
$by.self
         self.time self.pct total.time total.pct
"combn"       1.24    71.26       1.28     73.56
"paste"       0.46    26.44       0.46     26.44
"matrix"      0.04     2.30       0.04      2.30

$by.total
         total.time total.pct self.time self.pct
"Do_1"         1.74    100.00      0.00     0.00
"Do_2"         1.72     98.85      0.00     0.00
"Do_3"         1.68     96.55      0.00     0.00
"Do_4"         1.48     85.06      0.00     0.00
"combn"        1.28     73.56      1.24    71.26
"paste"        0.46     26.44      0.46    26.44
"matrix"       0.04      2.30      0.04     2.30
Nice. We see that combn uses about three quarters of the computing time, while paste uses only one quarter.
But hang on: matrix? Where does that come from? Must be either combn or paste calling that internally. No big deal here, as matrix only uses 2.3% of total time. But still, would be interesting to understand this, right?
Analysis of profiling data with prof.tree
Luckily, the prof.tree package by Artem Klevtsov, which is available from CRAN or from github, provides an alternative way to analyze that data. It displays profiling information as a tree:
library(prof.tree)
pr <- prof.tree(tmp)
print(pr, limit = NULL)
This will print like so:
                      levelName real  percent         env
1  calls                        1.74  100.0 %            
2   °--Do_1                     1.74  100.0 % R_GlobalEnv
3       ¦--Do_2                 1.72   98.9 % R_GlobalEnv
4       ¦   ¦--Do_3             1.68   96.6 % R_GlobalEnv
5       ¦   ¦   ¦--combn        0.22   12.6 %       utils
6       ¦   ¦   ¦   °--matrix   0.02    1.1 %        base
7       ¦   ¦   °--Do_4         1.46   83.9 % R_GlobalEnv
8       ¦   ¦       ¦--combn    1.02   58.6 %       utils
9       ¦   ¦       ¦   °--matrix 0.02  1.1 %        base
10      ¦   ¦       °--paste    0.44   25.3 %        base
11      ¦   °--combn            0.04    2.3 %       utils
12      °--Do_4                 0.02    1.1 % R_GlobalEnv
13          °--paste            0.02    1.1 %        base
Surprise! matrix was called from combn, and not from paste!
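This is easy to double-check by searching combn's body for the call (a quick sanity check; the exact source lines may differ across R versions):

```r
# Look for a matrix() call in the source of utils::combn
grep("matrix(", deparse(body(combn)), fixed = TRUE, value = TRUE)
```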
Note that pr is a data.tree structure, so all data.tree operations are available. For example, we can sum up specific functions by name:
library(data.tree)
SumByFunction <- function(name) {
  sum(pr$Get("real", filterFun = function(node) node$name == name)) / pr$real
}
SumByFunction("combn")
And, just as above, this gives us 73.56%.
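Other data.tree operations apply as well. For instance, the tree can be flattened into a data.frame for further analysis, a sketch using data.tree's ToDataFrameTree:

```r
library(data.tree)
# Flatten the profiling tree into a data.frame, one row per node,
# carrying along the "real" and "percent" attributes
df <- ToDataFrameTree(pr, "real", "percent")
head(df[order(-df$real), ])
```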
Also, we can limit the nodes that are printed, by pruning out all subtrees whose percent is smaller than, say, 5%:
print(pr, limit = NULL, pruneFun = function(x) x$percent > 0.05)
Et voilà:
                 levelName real  percent         env
1  calls                   1.74  100.0 %            
2   °--Do_1                1.74  100.0 % R_GlobalEnv
3       °--Do_2            1.72   98.9 % R_GlobalEnv
4           °--Do_3        1.68   96.6 % R_GlobalEnv
5               ¦--combn   0.22   12.6 %       utils
6               °--Do_4    1.46   83.9 % R_GlobalEnv
7                   ¦--combn 1.02 58.6 %       utils
8                   °--paste 0.44 25.3 %        base
Or we can use the data.tree plot facility to visualize this:
cols <- colorRampPalette(c("green", "red"))(101)
SetNodeStyle(pr,
             style = "filled,rounded",
             shape = "box",
             fontname = "helvetica",
             fillcolor = function(node) cols[round(node$percent * 100) + 1],
             tooltip = function(node) node$real)
plot(pr)
This will plot like so:
If you like what you see, make sure you show some love by starring Artem’s package on github.
The post R profiling appeared first on ipub.