
Getting Started with R Markdown, knitr, and RStudio 0.96

[This article was first published on Jeromy Anglim's Blog: Psychology and Statistics, and kindly contributed to R-bloggers.]

This post examines the features of R Markdown using knitr in RStudio 0.96. This combination of tools provides an exciting improvement in usability for reproducible analysis. Specifically, this post (1) discusses getting started with R Markdown and knitr in RStudio 0.96; (2) provides a basic example of producing console output and plots using R Markdown; (3) highlights several code chunk options, such as caching and controlling how input and output are displayed; (4) demonstrates the use of standard Markdown notation as well as the extended features of formulas and tables; and (5) discusses the implications of R Markdown. This post was produced with R Markdown. The source code is available here as a gist. The post may be most useful if the source code and displayed post are viewed side by side. In some instances, I include a copy of the R Markdown in the displayed HTML, but most of the time I assume you are reading the source and post side by side.

Getting started

To work with R Markdown, if necessary:

To run the basic working example that produced this blog post:

opts_knit$set(upload.fun = imgur_upload)  # upload all images to imgur.com

Prepare for analyses

set.seed(1234)
library(ggplot2)
library(lattice)

Basic console output

To insert an R code chunk, you can type it manually or just press Chunks - Insert chunks or use the shortcut key. This will produce the following code chunk:

```{r}

```

Pressing tab when inside the braces will bring up code chunk options.
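
Chunk options can also be given document-wide defaults instead of being repeated in every chunk header. The following is a minimal sketch, assuming the knitr package is installed and using its `opts_chunk` object; the option values are illustrative only and are not the settings used for this post.

```r
library(knitr)

# Set document-wide defaults. An individual chunk can still override
# them in its own header, e.g. {r myplot, fig.width=6}.
opts_chunk$set(echo = TRUE, comment = "##", fig.width = 5, fig.height = 5)

# Inspect one of the current defaults
opts_chunk$get("fig.width")
```

In an R Markdown file, code like this would normally sit in a chunk near the top of the document so that it applies to every chunk that follows.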

The R code chunk labelled basicconsole is as follows:

```{r basicconsole}
x <- 1:10
y <- round(rnorm(10, x, 1), 2)
df <- data.frame(x, y)
df
```

The code chunk input and output are then displayed as follows:

x <- 1:10
y <- round(rnorm(10, x, 1), 2)
df <- data.frame(x, y)
df

##     x    y
## 1   1 1.31
## 2   2 2.31
## 3   3 3.36
## 4   4 3.27
## 5   5 5.04
## 6   6 6.11
## 7   7 8.43
## 8   8 8.98
## 9   9 8.38
## 10 10 9.27

Plots

Images generated by knitr are saved in a figures folder. However, they also appear to be represented in the HTML output using a data URI scheme. This means that you can paste the HTML into a blog post or discussion forum and you don't have to worry about finding a place to store the images; they're embedded in the HTML.

Simple plot

Here is a basic plot using base graphics:

```{r simpleplot}
plot(x)
```

plot(x)

Note that unlike traditional Sweave, there is no need to write fig=TRUE.

Multiple plots

Also, unlike traditional Sweave, you can include multiple plots in one code chunk:

```{r multipleplots}
boxplot(1:10~rep(1:2,5))
plot(x, y)
```

boxplot(1:10 ~ rep(1:2, 5))

plot(x, y)

ggplot2 plot

ggplot2 plots work well:

qplot(x, y, data = df)

lattice plot

As do lattice plots:

xyplot(y ~ x)

Note that unlike traditional Sweave, there is no need to print lattice plots directly.
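
The automatic display relies on R's ordinary top-level auto-printing. As a sketch of general R behaviour (not something taken from this post's source), a lattice object created inside a loop or function is not auto-printed, so an explicit `print()` is still needed there. The data below are made up for illustration.

```r
library(lattice)

x <- 1:10
y <- x + rnorm(10)

# At top level the returned trellis object is auto-printed, so the plot appears.
xyplot(y ~ x)

# Inside a loop nothing is auto-printed; call print() explicitly.
for (v in c("p", "l")) {
  p <- xyplot(y ~ x, type = v)
  print(p)
}
```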

R Code chunk features

Create Markdown code from R

The following code hides the command input (i.e., echo=FALSE) and writes the output directly into the document as Markdown (i.e., results='asis', which is similar to results=tex in Sweave).

```{r dotpointprint, results='asis', echo=FALSE}
cat("Here are some dot points\n\n")
cat(paste("* The value of y[", 1:3, "] is ", y[1:3], sep="", collapse="\n"))
```

Here are some dot points

Create Markdown table code from R

```{r createtable, results='asis', echo=FALSE}
cat("x | y", "--- | ---", sep="\n")
cat(apply(df, 1, function(X) paste(X, collapse=" | ")), sep = "\n")
```

x | y
--- | ---
1 | 1.31
2 | 2.31
3 | 3.36
4 | 3.27
5 | 5.04
6 | 6.11
7 | 8.43
8 | 8.98
9 | 8.38
10 | 9.27
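
The same idea can be wrapped in a small reusable function. This is a minimal sketch in base R; `md_table` is a hypothetical helper name introduced here for illustration, and the small data frame is made up rather than taken from the post.

```r
# Hypothetical helper: build the lines of a pipe-delimited Markdown table
# from a data frame.
md_table <- function(d) {
  header <- paste(names(d), collapse = " | ")
  rule   <- paste(rep("---", ncol(d)), collapse = " | ")
  # Convert each column to character, then paste the columns together row by row.
  cells  <- unname(lapply(d, as.character))
  rows   <- do.call(paste, c(cells, sep = " | "))
  c(header, rule, rows)
}

d <- data.frame(x = 1:3, y = c(1.5, 2.25, 3))
cat(md_table(d), sep = "\n")
```

Called inside a chunk with results='asis' and echo=FALSE, the printed lines would be interpreted as a Markdown table in the same way as the example above.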

Control output display

The following code suppresses display of R input commands (i.e., echo=FALSE) and removes any preceding text from console output (comment=""; the default is comment="##").

```{r echo=FALSE, comment=""}
head(df)
```

  x    y
1 1 1.31
2 2 2.31
3 3 3.36
4 4 3.27
5 5 5.04
6 6 6.11

Control figure size

The following is an example of a smaller figure using fig.width and fig.height options.

```{r smallplot, fig.width=3, fig.height=3}
plot(x)
```

plot(x)

Cache analysis

Caching analyses is straightforward. Here's example code. On the first run on my computer, this took about 10 seconds. On subsequent runs, this code was not run.

If you want to rerun cached code chunks, just delete the contents of the cache folder.

```{r longanalysis, cache=TRUE}
for (i in 1:5000) {
    lm((i+1)~i)
}
```
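
Clearing the cache can also be done from R rather than the file manager. This is a minimal sketch, assuming the cache lives in a folder named `cache` in the working directory; that folder name is an assumption and should be changed if the cache is stored elsewhere.

```r
# Assumption: the cache is stored in a folder called "cache" in the
# working directory. Adjust the path if it is stored elsewhere.
cache_dir <- "cache"

# Remove the folder and everything in it. If the folder does not exist,
# unlink() does nothing.
unlink(cache_dir, recursive = TRUE)
```

The next time the document is knitted, every cached chunk is then evaluated again.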

Basic markdown functionality

For those not familiar with standard Markdown, the following may be useful. See the source code for how to produce such points. However, RStudio does include a Markdown quick reference button that adequately covers this material.

Dot Points

Simple dot points:

and numeric dot points:

  1. Number 1
  2. Number 2
  3. Number 3

and nested dot points:

Equations

Equations are included by using LaTeX notation and including them either between single dollar signs (inline equations) or double dollar signs (displayed equations). If you hang around the Q&A site CrossValidated you'll be familiar with this idea.

There are inline equations such as $y_i = \alpha + \beta x_i + e_i$.

And displayed formulas:

$$\frac{1}{1+\exp(-x)}$$
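
For readers who want to evaluate the displayed formula rather than just typeset it, here is a small sketch in R. The function name `logistic` is introduced here for illustration; `plogis()` is base R's built-in equivalent.

```r
# The displayed formula written as an R function
logistic <- function(x) 1 / (1 + exp(-x))

logistic(0)  # 0.5

# Base R's plogis() computes the same function
all.equal(logistic(c(-2, 0, 2)), plogis(c(-2, 0, 2)))  # TRUE
```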

knitr provides self-contained HTML code that calls a MathJax script to display formulas. However, in order to include the script in my blog posts, I took the script and incorporated it into my Blogger template. If you are viewing this post through syndication or an RSS reader, this may not work. You may need to view this post on my website.

Tables

Tables can be included using the following notation:

A | B | C
--- | --- | ---
1 | Male | Blue
2 | Female | Pink

Hyperlinks

Images

Here's an example image:

Code

Here is a Markdown R code chunk displayed as code:

```{r}
x <- 1:10
x
```

And then there's inline code such as x <- 1:10.

Quote

Let's quote some stuff:

To be, or not to be, that is the question: Whether 'tis nobler in the mind to suffer The slings and arrows of outrageous fortune,

Conclusion

Questions

The following are a few questions I encountered along the way that might interest others.

Annoying <br/>'s

Question: I asked on the RStudio discussion site: Why does Markdown to HTML insert <br/> on new lines?

Answer: I just do a find and delete on this text for now. Specifically, I have a sed command that extracts just the content between the body tags and removes br tags. I can then readily incorporate the result into my blog posts.

sed -i -e '1,/<body>/d' -e '/^<\/body>/,$d' -e 's/<br\/>$//' filename.html
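
For anyone without sed, roughly the same clean-up can be sketched in base R. `extract_body` is a hypothetical helper name, the sample HTML is made up, and the function is an approximation of the one-liner above rather than an exact port of every sed edge case.

```r
# Hypothetical helper that mirrors the sed command: keep only the lines
# strictly between <body> and </body>, then strip a trailing <br/>.
extract_body <- function(lines) {
  start <- grep("<body>", lines, fixed = TRUE)[1]  # first line containing <body>
  end   <- grep("^</body>", lines)[1]              # first line starting with </body>
  stopifnot(!is.na(start), !is.na(end), end > start)
  body  <- lines[seq_len(end - start - 1) + start] # lines between the two tags
  sub("<br/>$", "", body)                          # drop <br/> at the end of a line
}

html <- c("<html>", "<body>", "<p>First line<br/>", "second line</p>",
          "</body>", "</html>")
extract_body(html)
```

To apply it to a file, the result of `extract_body(readLines("filename.html"))` could be passed to `writeLines()`.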

Temporarily disable caching

Question: I asked on Stack Overflow about How to set cache=FALSE for a knitr markdown document and override code chunk settings?

Answer: Delete the cache folder. But there are other possible workflows.

Equivalent of Sexpr

Question: I asked on Stack Overflow about whether there is an R Markdown equivalent to Sexpr in Sweave.

Answer: Include the code between brackets of “backtick r space” and “backtick”. E.g., in the source code I have calculated 2 + 2 = 4.
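
The inline syntax can be tried out without creating a file. This is a minimal sketch, assuming the knitr package is installed: it passes a one-line Markdown string to `knit()` through its `text` argument. The variable names `src` and `out` are illustrative.

```r
library(knitr)

# Markdown source containing one inline R expression
src <- "I have calculated 2 + 2 = `r 2 + 2`."

# knit() evaluates the inline expression and returns the processed text
out <- knit(text = src, quiet = TRUE)
cat(out, "\n")
```

The printed sentence should contain the evaluated result, 4, in place of the inline expression.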

Image format

Question: When using the data URI scheme, images don't appear to display in RSS feeds of my blog. What's a good strategy?

Answer: One strategy is to upload to imgur. The following provides an example of exporting to imgur.

Add the following lines of code near the top of the file:

``` {r optsknit}
opts_knit$set(upload.fun = imgur_upload) # upload all images to imgur.com
```

I found that the function failed when I was at work behind a firewall, but worked at home.
