
Speeding up R code: A case study

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

On his Psychology & Statistics blog, Jeromy Anglim describes analyzing data from a skill acquisition experiment. Needing to run a custom R function across 1.3 million data points, Jeromy estimated the computation would take several hours to complete. So he set out to optimise the code.

First, he used the Rprof function, which samples your R functions as they run and records the time spent in each sub-function. This is a useful tool for identifying the parts of your code that are ripe for optimisation, and in this case (with some help from the system.time function to time a specific section of the code) he learned that most of the time wasn't spent performing actual calculations: most of it was spent selecting the subset of the data to analyze!
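The basic profiling workflow can be sketched as follows. This is a minimal illustration, not Jeromy's actual code: the function name, data frame, and column names are all hypothetical stand-ins for the pattern that proved slow (repeatedly subsetting a large data frame inside a loop).

```r
# Hypothetical analysis function: computes a per-subject mean by
# repeatedly subsetting the full data frame -- the slow pattern.
slow_summary <- function(df) {
  sapply(unique(df$subject), function(s) mean(df$value[df$subject == s]))
}

# Illustrative data, much smaller than the 1.3 million points in the post
df <- data.frame(subject = rep(1:100, each = 100), value = rnorm(10000))

Rprof("profile.out")    # start the sampling profiler, writing to a file
res <- slow_summary(df)
Rprof(NULL)             # stop profiling
summaryRprof("profile.out")$by.self  # time spent in each function

# system.time gives the wall-clock cost of one specific block
timing <- system.time(slow_summary(df))
```

In the profiler output, subsetting helpers such as `[.data.frame` and `==` typically dominate `by.self` time for code like this, which is the kind of clue that pointed to data selection rather than computation as the bottleneck.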

And thus a solution was born: rather than repeatedly selecting from the large data frame in an iterative loop, he instead split the data frame into its constituent parts once, and then looped over the parts. This reduced the analysis time from hours down to just a couple of minutes. At the end of his case study, Jeromy shares some valuable lessons learned about optimising R functions:

Jeromy Anglim’s Blog: Psychology & Statistics: A Case Study in Optimising Code in R

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.