[This article was first published on R is my friend » R, and kindly contributed to R-bloggers].
A few weeks ago I gave a presentation on using Sweave and Knitr under the guise of promoting reproducible research. I humbly offer this presentation to the blog with full knowledge that there are already loads of tutorials available online. This presentation is
Cheers,
\documentclass[xcolor=svgnames]{beamer}
%\documentclass[xcolor=svgnames,handout]{beamer}
\usetheme{Boadilla}
\usecolortheme[named=Sienna]{structure}
\usepackage{graphicx}
\usepackage[final]{animate}
%\usepackage[colorlinks=true,urlcolor=blue,citecolor=blue,linkcolor=blue]{hyperref}
\usepackage{breqn}
\usepackage{xcolor}
\usepackage{booktabs}
\usepackage{verbatim}
\usepackage{tikz}
\usetikzlibrary{shadows,arrows,positioning}
\usepackage[noae]{Sweave}
\definecolor{links}{HTML}{2A1B81}
\hypersetup{colorlinks,linkcolor=links,urlcolor=links}
\usepackage{pgfpages}
%\pgfpagesuselayout{4 on 1}[letterpaper, border shrink = 5mm, landscape]
\tikzstyle{block} = [rectangle, draw, text width=7em, text centered, rounded corners, minimum height=3em, minimum width=7em, top color = white, bottom color=brown!30, drop shadow]
\newcommand{\ShowSexpr}[1]{\texttt{{\char`\\}Sexpr\{#1\}}}

\begin{document}
\SweaveOpts{concordance=TRUE}

\title[Nuts and bolts of Sweave/Knitr]{The nuts and bolts of Sweave/Knitr for reproducible research with \LaTeX}
\author[M. Beck]{Marcus W. Beck}
\institute[USEPA NHEERL]{ORISE Post-doc Fellow\\ USEPA NHEERL Gulf Ecology Division, Gulf Breeze, FL\\ Email: \href{mailto:beck.marcus@epa.gov}{beck.marcus@epa.gov}, Phone: 850 934 2480}
\date{January 15, 2014}

%%%%%%
\begin{frame}
\vspace{-0.3in}
\titlepage
\end{frame}

%%%%%%
\begin{frame}{Reproducible research}
\onslide<+->
In its most general sense... the ability to reproduce results from an experiment or analysis conducted by another.\\~\\
\onslide<+->
From Wikipedia... `The ultimate product is the \alert{paper along with the full computational environment} used to produce the results in the paper such as the code, data, etc.
that can be \alert{used to reproduce the results and create new work} based on the research.'\\~\\
\onslide<+->
Concept is strongly based on the idea of \alert{literate programming} such that the logic of the analysis is clearly represented in the final product by combining computer code/programs with ordinary human language [Knuth, 1992].
\end{frame}

%%%%%%
\begin{frame}{Non-reproducible research}
\begin{center}
\begin{tikzpicture}[node distance=2.5cm, auto, >=stealth]
\onslide<2->{
\node[block] (a) {1. Gather data};}
\onslide<3->{
\node[block] (b) [right of=a, node distance=4.2cm] {2. Analyze data};
\draw[->] (a) -- (b);}
\onslide<4->{
\node[block] (c) [right of=b, node distance=4.2cm] {3. Report results};
\draw[->] (b) -- (c);}
% \onslide<5->{
% \node [right of=a, node distance=2.1cm] {\textcolor[rgb]{1,0,0}{X}};
% \node [right of=b, node distance=2.1cm] {\textcolor[rgb]{1,0,0}{X}};}
\end{tikzpicture}
\end{center}
\vspace{-0.5cm}
\begin{columns}[t]
\onslide<2->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Begins with general question or research objectives
\item Data collected in raw format (hard copy) converted to digital (Excel spreadsheet)
\end{itemize}
\end{column}}
\onslide<3->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Import data into stats program or analyze directly in Excel
\item Create figures/tables directly in stats program
\item Save relevant output
\end{itemize}
\end{column}}
\onslide<4->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Create research report using Word or other software
\item Manually insert results into report
\item Change final report by hand if methods/analysis altered
\end{itemize}
\end{column}}
\end{columns}
\end{frame}

%%%%%%
\begin{frame}{Reproducible research}
\begin{center}
\begin{tikzpicture}[node distance=2.5cm, auto, >=stealth]
\onslide<1->{
\node[block] (a) {1. Gather data};}
\onslide<1->{
\node[block] (b) [right of=a, node distance=4.2cm] {2.
Analyze data};
\draw[<->] (a) -- (b);}
\onslide<1->{
\node[block] (c) [right of=b, node distance=4.2cm] {3. Report results};
\draw[<->] (b) -- (c);}
\end{tikzpicture}
\end{center}
\vspace{-0.5cm}
\begin{columns}[t]
\onslide<1->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Begins with general question or research objectives
\item Data collected in raw format (hard copy) converted to digital (\alert{text file})
\end{itemize}
\end{column}}
\onslide<1->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Create \alert{integrated script} for importing data (data path is known)
\item Create figures/tables directly in stats program
\item \alert{No need to export} (reproduced on the fly)
\end{itemize}
\end{column}}
\onslide<1->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Create research report using RR software
\item \alert{Automatically include results} in report
\item \alert{Change final report automatically} if methods/analysis altered
\end{itemize}
\end{column}}
\end{columns}
\end{frame}

%%%%%%
\begin{frame}{Reproducible research in R}
Easily adopted using RStudio [\href{http://www.rstudio.com/}{http://www.rstudio.com/}]\\~\\
Also possible w/ Tinn-R or via command prompt but not as intuitive\\~\\
Requires a \LaTeX\ distribution - use MiKTeX for Windows [\href{http://miktex.org/}{http://miktex.org/}]\\~\\
\onslide<2->{
Essentially a \LaTeX\ document that incorporates R code... \\~\\
Uses Sweave (or Knitr) to convert .Rnw file to .tex file, then \LaTeX\ to create pdf\\~\\
Sweave comes with \texttt{utils} package, may have to tell R where it is \\~\\
}
\end{frame}

%%%%%%
\begin{frame}{Reproducible research in R}
Use same procedure for compiling a \LaTeX\ document with one additional step
\begin{center}
\begin{tikzpicture}[node distance=2.5cm, auto, >=stealth]
\onslide<2->{
\node[block] (a) {1. myfile.Rnw};}
\onslide<3->{
\node[block] (b) [right of=a, node distance=4.2cm] {2.
myfile.tex};
\draw[->] (a) -- (b);
\node [right of=a, above=0.5cm, node distance=2.1cm] {Sweave};}
\onslide<4->{
\node[block] (c) [right of=b, node distance=4.2cm] {3. myfile.pdf};
\draw[->] (b) -- (c);
\node [right of=b, above=0.5cm, node distance=2.1cm] {pdfLatex};}
\end{tikzpicture}
\end{center}
\vspace{-0.5cm}
\begin{columns}[t]
\onslide<2->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item A .tex file but with .Rnw extension
\item Includes R code as `chunks' or inline expressions
\end{itemize}
\end{column}}
\onslide<3->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item .Rnw file is converted to a .tex file using Sweave
\item .tex file contains output from R, no raw R code
\end{itemize}
\end{column}}
\onslide<4->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item .tex file converted to pdf (or other output) for final format
\item Include biblio with bibtex
\end{itemize}
\end{column}}
\end{columns}
\end{frame}

%%%%%%
\begin{frame}[containsverbatim]{Reproducible research in R}
\label{sweaveref}
\begin{block}{.Rnw file}
\begin{verbatim}
\documentclass{article}
\usepackage{Sweave}
\begin{document}
Here's some R code:
\Sexpr{'<<eval=true,echo=true>>='}
options(width=60)
set.seed(2)
rnorm(10)
\Sexpr{'@'}
\end{document}
\end{verbatim}
\end{block}
\end{frame}

%%%%%%
\begin{frame}[containsverbatim,shrink]{Reproducible research in R}
\begin{block}{.tex file}
\begin{verbatim}
\documentclass{article}
\usepackage{Sweave}
\begin{document}
Here's some R code:
\begin{Schunk}
\begin{Sinput}
> options(width=60)
> set.seed(2)
> rnorm(10)
\end{Sinput}
\begin{Soutput}
[1] -0.89691455  0.18484918  1.58784533 -1.13037567
[5] -0.08025176  0.13242028  0.70795473 -0.23969802
[9]  1.98447394 -0.13878701
\end{Soutput}
\end{Schunk}
\end{document}
\end{verbatim}
\end{block}
\end{frame}

%%%%%%
\begin{frame}{Reproducible research in R}
The final product:\\~\\
\centerline{\includegraphics{ex1_input.pdf}}
\end{frame}

%%%%%%
\begin{frame}[fragile]{Sweave - code chunks}
\onslide<+->
R code
is entered in the \LaTeX\ document using `code chunks'
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<>>='}
\Sexpr{'@'}
\end{verbatim}
\end{block}
Any text within the code chunk is interpreted as R code\\~\\
Arguments for the code chunk are entered within \verb|\Sexpr{'<<here>>'}|\\~\\
\onslide<+->
\begin{itemize}
\item{\texttt{eval}: evaluate code, default \texttt{T}}
\item{\texttt{echo}: return source code, default \texttt{T}}
\item{\texttt{results}: format of output (chr string), default is `verbatim' (also `tex' for tables or `hide' to suppress)}
\item{\texttt{fig}: for creating figures, default \texttt{F}}
\end{itemize}
\end{frame}

%%%%%%
\begin{frame}[fragile]{Sweave - code chunks}
Changing the default arguments for the code chunk:
\begin{columns}[t]
\begin{column}{0.45\textwidth}
\onslide<+->
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<>>='}
2+2
\Sexpr{'@'}
\end{verbatim}
\end{block}
<<>>=
2+2
@
\onslide<+->
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<eval=F>>='}
2+2
\Sexpr{'@'}
\end{verbatim}
\end{block}
Returns nothing...
\end{column}
\begin{column}{0.45\textwidth}
\onslide<+->
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<results=hide>>='}
2+2
\Sexpr{'@'}
\end{verbatim}
\end{block}
<<results=hide>>=
2+2
@
\onslide<+->
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<echo=F>>='}
2+2
\Sexpr{'@'}
\end{verbatim}
\end{block}
<<echo=F>>=
2+2
@
\end{column}
\end{columns}
\end{frame}

%%%%%%
\begin{frame}[t,fragile]{Sweave - figures}
\onslide<1->
Sweave makes it easy to include figures in your document
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<myfig,fig=T,echo=F,include=T,height=3>>='}
set.seed(2)
hist(rnorm(100))
\Sexpr{'@'}
\end{verbatim}
\end{block}
\onslide<2->
<<myfig,fig=T,echo=F,include=T,height=3>>=
set.seed(2)
hist(rnorm(100))
@
\end{frame}

%%%%%%
\begin{frame}[t,fragile]{Sweave - figures}
Sweave makes it easy to include figures in your document
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<myfig,fig=T,echo=F,include=T,height=3>>='}
set.seed(2)
hist(rnorm(100))
\Sexpr{'@'}
\end{verbatim}
\end{block}
\vspace{\baselineskip}
Relevant code options for figures:
\begin{itemize}
\item{The chunk name is used to name the figure, myfile-myfig.pdf}
\item{\texttt{fig}: Lets R know the output is a figure}
\item{\texttt{echo}: Use \texttt{F} to suppress figure code}
\item{\texttt{include}: Should the figure be automatically included in output}
\item{\texttt{height}: (and \texttt{width}) Set dimensions of figure in inches}
\end{itemize}
\end{frame}

%%%%%%
\begin{frame}[t,fragile]{Sweave - figures}
An alternative approach for creating a figure
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<myfig,fig=T,echo=F,include=F,height=3>>='}
set.seed(2)
hist(rnorm(100))
\Sexpr{'@'}
\includegraphics{rnw_name-myfig.pdf}
\end{verbatim}
\end{block}
\includegraphics{Sweave_intro-myfig.pdf}
\end{frame}

%%%%%%
\begin{frame}[t,fragile]{Sweave - tables}
\onslide<1->
Really easy to create tables
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<results=tex,echo=F>>='}
library(stargazer)
data(iris)
stargazer(iris,title='Summary statistics for Iris data')
\Sexpr{'@'}
\end{verbatim}
\end{block}
\onslide<2->
<<results=tex,echo=F>>=
data(iris)
library(stargazer)
stargazer(iris,title='Summary statistics for Iris data')
@
\end{frame}

%%%%%%
\begin{frame}[t,fragile]{Sweave - tables}
Really easy to create tables
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<results=tex,echo=F>>='}
library(stargazer)
data(iris)
stargazer(iris,title='Summary statistics for Iris data')
\Sexpr{'@'}
\end{verbatim}
\end{block}
\vspace{\baselineskip}
\texttt{results} option should be set to `tex' (and \texttt{echo=F})\\~\\
Several packages are available to convert R output to \LaTeX\ table format
\begin{itemize}
\item{xtable: most general package}
\item{Hmisc: similar to xtable but can handle specific R model objects}
\item{stargazer: fairly effortless conversion of R model objects to tables}
\end{itemize}
\end{frame}

%%%%%%
\begin{frame}[fragile]{Sweave - expressions}
\onslide<1->
All objects within a code chunk are saved in the workspace each time a document is compiled (unless \texttt{eval=F})\\~\\
This allows the information saved in the workspace to be reproduced in the final document as inline text, via \alert{expressions}\\~\\
\onslide<2->
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<echo=F>>='}
data(iris)
dat<-iris
\Sexpr{'@'}
\end{verbatim}
Mean sepal length was \ShowSexpr{mean(dat\$Sepal.Length)}.
\end{block}
\onslide<3->
<<echo=F>>=
data(iris)
dat<-iris
@
\vspace{\baselineskip}
Mean sepal length was \Sexpr{mean(dat$Sepal.Length)}.
\end{frame}

%%%%%%
\begin{frame}[fragile]{Sweave - expressions}
Change the global R options to change the default output\\~\\
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<echo=F>>='}
data(iris)
dat<-iris
options(digits=2)
\Sexpr{'@'}
\end{verbatim}
Mean sepal length was \ShowSexpr{format(mean(dat\$Sepal.Length))}.
\end{block}
<<echo=F>>=
data(iris)
dat<-iris
options(digits=2)
@
\vspace{\baselineskip}
Mean sepal length was \Sexpr{format(mean(dat$Sepal.Length))}.\\~\\
\end{frame}

%%%%%%
\begin{frame}{Sweave vs Knitr}
\onslide<1->
Sweave does not automatically cache R data on compilation\\~\\
\alert{Knitr} is a useful alternative - similar to Sweave but with minor differences in args for code chunks, more flexible output\\~\\
\onslide<2->
\begin{columns}
\begin{column}{0.3\textwidth}
Must change default options in RStudio\\~\\
Knitr included with RStudio, otherwise download as package
\end{column}
\begin{column}{0.6\textwidth}
\centerline{\includegraphics[width=0.8\textwidth]{options_ex.png}}
\end{column}
\end{columns}
\end{frame}

%%%%%%
\begin{frame}[fragile]{Knitr}
\onslide<1->
Knitr can be used to cache code chunks\\~\\
Data are saved when a chunk is first evaluated, and the chunk is skipped on future compilations unless changed\\~\\
This allows quicker compilation of documents that import lots of data\\~\\
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<mychunk, cache=TRUE, eval=FALSE>>='}
load(file='mydata.RData')
\Sexpr{'@'}
\end{verbatim}
\end{block}
\end{frame}

%%%%%%
\begin{frame}[containsverbatim,shrink]{Knitr}
\label{knitref}
\begin{block}{.Rnw file}
\begin{verbatim}
\documentclass{article}
\Sexpr{'<<setup, include=FALSE, cache=FALSE>>='}
library(knitr)
#set global chunk options
opts_chunk$set(fig.path='H:/docs/figs/',
  fig.align='center',
  dev='pdf',
  dev.args=list(family='serif'),
  fig.pos='!ht')
options(width=60)
\Sexpr{'@'}
\begin{document}
Here's some R code:
\Sexpr{'<<eval=T, echo=T>>='}
set.seed(2)
rnorm(10)
\Sexpr{'@'}
\end{document}
\end{verbatim}
\end{block}
\end{frame}

%%%%%%
\begin{frame}{Knitr}
The final product:\\~\\
\centerline{\includegraphics[width=\textwidth]{knit_ex.pdf}}
\end{frame}

%%%%%%
\begin{frame}[containsverbatim,shrink]{Knitr}
Figures, tables, and expressions are largely the same as in Sweave\\~\\
\begin{block}{Figures}
\begin{verbatim}
\Sexpr{'<<myfig,echo=F>>='}
set.seed(2)
hist(rnorm(100))
\Sexpr{'@'}
\end{verbatim}
\end{block}
\vspace{\baselineskip}
\begin{block}{Tables}
\begin{verbatim}
\Sexpr{"<<mytable,results='asis',echo=F,message=F>>="}
library(stargazer)
data(iris)
stargazer(iris,title='Summary statistics for Iris data')
\Sexpr{'@'}
\end{verbatim}
\end{block}
\end{frame}

%%%%%%
\begin{frame}{A minimal working example}
\onslide<1->
Step by step guide to creating your first RR document\\~\\
\begin{enumerate}
\onslide<2->
\item Download and install \href{http://www.rstudio.com/}{RStudio}
\onslide<3->
\item Download and install \href{http://miktex.org/}{MiKTeX} if using Windows
\onslide<4->
\item Create a unique folder for the document - This will be the working directory
\onslide<5->
\item Open a new Sweave file in RStudio
\onslide<6->
\item Copy and paste the file found on slide \ref{sweaveref} for Sweave or slide \ref{knitref} for Knitr into the new file (and select correct compile option)
\onslide<7->
\item Compile the pdf (runs Sweave/Knitr, then pdfLatex)\\~\\
\end{enumerate}
\onslide<7->
\centerline{\includegraphics[width=0.6\textwidth]{compile_ex.png}}
\end{frame}

%%%%%%
\begin{frame}{If things go wrong...}
\LaTeX\ errors can be difficult to narrow down - check the log file\\~\\
Sweave/Knitr errors will be displayed on the console\\~\\
Other resources
\begin{itemize}
\item{`Reproducible Research with R and RStudio' by C.
Gandrud, CRC Press}
\item{\LaTeX\ forum (like StackOverflow) \href{http://www.latex-community.org/forum/}{http://www.latex-community.org/forum/}}
\item Comprehensive Knitr guide \href{http://yihui.name/knitr/options}{http://yihui.name/knitr/options}
\item Sweave user manual \href{http://stat.ethz.ch/R-manual/R-devel/library/utils/doc/Sweave.pdf}{http://stat.ethz.ch/R-manual/R-devel/library/utils/doc/Sweave.pdf}
\item Intro to Sweave \href{http://www.math.ualberta.ca/~mlewis/links/the_joy_of_sweave_v1.pdf}{http://www.math.ualberta.ca/~mlewis/links/the_joy_of_sweave_v1.pdf}
\end{itemize}
\vspace{\baselineskip}
\end{frame}

\end{document}
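The slides compile everything through RStudio, but the same two steps (Sweave, then pdfLatex) can also be run from the R console. What follows is only a minimal sketch under a few assumptions that are not in the presentation: the example file is written to a temporary folder (the `rr_example` name is made up for illustration), and the pdf step is left commented out because it needs a working LaTeX installation. The Knitr line likewise assumes the knitr package is installed.

```r
# Minimal sketch (not from the slides): weave a tiny .Rnw file from the R console.
# "rr_example" is a hypothetical folder name; "myfile.Rnw" mirrors the slides.
workdir <- file.path(tempdir(), "rr_example")
dir.create(workdir, showWarnings = FALSE)

# Same content as the Sweave example slide, written out as a real .Rnw file
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Here's some R code:",
  "<<eval=TRUE,echo=TRUE>>=",
  "options(width=60)",
  "set.seed(2)",
  "rnorm(10)",
  "@",
  "\\end{document}"
)
writeLines(rnw_lines, file.path(workdir, "myfile.Rnw"))

# Step 1: .Rnw -> .tex (Sweave writes myfile.tex into the working directory)
old_wd <- setwd(workdir)
utils::Sweave("myfile.Rnw")
tex_lines <- readLines("myfile.tex")

# Step 2: .tex -> .pdf (needs a LaTeX distribution, so left commented out here)
# tools::texi2pdf("myfile.tex")

# Knitr alternative, assuming knitr is installed: both steps in one call
# knitr::knit2pdf("myfile.Rnw")

setwd(old_wd)
```

After step 1, `tex_lines` should contain the `Schunk`, `Sinput` and `Soutput` environments shown on the .tex slide, which is a quick way to check that the chunk was actually evaluated before moving on to the pdf step.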