Walking the line between reproducibility and efficiency in R Markdown: Three methods

[This article was first published on R on Pablo Bernabeu, and kindly contributed to R-bloggers.]

As technology and research methods advance, data sets tend to grow larger and analyses more exhaustive, so they take longer to run. This poses a challenge when the results are to be presented using R Markdown: one has to walk the line between reproducibility and efficiency. On the one hand, it is desirable to keep the R Markdown document as self-contained as possible, so that those who later examine the document can test and edit the code easily. On the other hand, a document that is very slow to render or very long is inefficient. Different methods can be combined within a single document to accommodate different types of code. Below, three methods are presented, ordered by increasing running time and code length.

  1. Code present in the Rmd file, and run as the document is rendered: used for fast, concise code. Example:

     nrow(myData)
  2. Code sourced from separate scripts, and run as the document is rendered: used for slower or longer code. Example:

     source('results/model_2.R')
  3. Code run separately, with only the result being presented in the document: used for very slow or long code. Example:

     model_1 = readRDS('results/model_1.rds')
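The three methods can be tried side by side in a minimal sketch such as the one below. It is only an illustration under stated assumptions: the toy `myData`, the temporary `results` folder and the `lm()` calls standing in for slow models are all hypothetical, and the script that method 2 sources is written on the fly so that the example is self-contained.

```r
# Hypothetical set-up: a toy data set and a temporary 'results' folder.
results_dir <- file.path(tempdir(), "results")
dir.create(results_dir, showWarnings = FALSE)
myData <- data.frame(x = 1:20, y = 2 * (1:20) + 1)

# Method 1: fast, concise code, run inline as the document is rendered.
n_rows <- nrow(myData)

# Method 2: longer code kept in a separate script and sourced at render time.
# The script is created here only to keep the sketch self-contained.
script_path <- file.path(results_dir, "model_2.R")
writeLines("model_2 <- lm(y ~ x, data = myData)", script_path)
source(script_path, local = TRUE)  # evaluate in the calling environment

# Method 3: very slow code, run once outside the document. Only the saved
# result is read when rendering.
rds_path <- file.path(results_dir, "model_1.rds")
if (!file.exists(rds_path)) {
  model_1 <- lm(y ~ x, data = myData)  # stand-in for a slow model
  saveRDS(model_1, rds_path)
}
model_1 <- readRDS(rds_path)
```

In a real project, the fitting and `saveRDS()` step of method 3 would probably live in its own script, kept alongside the Rmd file so that readers can still rerun it if they wish.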
