Since it seems to be the fashion, here’s a post about how I make my academic papers.
Actually, who am I trying to kid? This is also about how I make slides, letters, memos and “Back in 10 minutes” signs to pin on the door. Nevertheless it’s for making academic papers that I’m going to recommend this particular set of tools.
I use the word make deliberately because I’m thinking of ‘academic paper’ broadly, as the sum of its words, analyses, tables, figures, and data. In this sense, papers can contain their own replication materials and when they do it should be possible in a single movement to rerun a set of analyses and reconstruct the paper that reports them.
To get anywhere near that goal, I use a mix of latex, its newer bibliography system biblatex, and the R package knitr. Also, I use a Mac, though that won’t make very much difference to the exposition.
Here’s how this currently works…
Since I write in latex I have texlive installed, in the gargantuan but friendly form of Mactex. To actually write things I use the TeXShop editor that comes with it, but only after changing the default font to something non-proportional. (What were they thinking?)
My basic paper template starts like this
\documentclass[11pt,a4paper]{report}
\usepackage[utf8]{inputenc}

%% fonts
\usepackage[charter]{mathdesign}
\usepackage[scaled=.95]{inconsolata}

%% page margins, inter-paragraph space and no chapters
\usepackage[margin=1.1in]{geometry}
\setlength{\parskip}{0.5em}
\renewcommand{\thesection}{\arabic{section}}

%% bibliography
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage[style=apa,natbib=true,backend=biber]{biblatex}
\DeclareLanguageMapping{american}{american-apa}
\addbibresource{biblio.bib}

%% for memisc
\usepackage{booktabs}
\usepackage{dcolumn}

%% define a dark blue
\usepackage{color}
\definecolor{darkblue}{rgb}{0,0,.5}

%% hyperlinks to references
\usepackage{hyperref}
\hypersetup{colorlinks=true, linkcolor=darkblue, citecolor=darkblue,
  filecolor=darkblue, urlcolor=darkblue}

\author{Will Lowe\\Universität Mannheim \and Coauthor}
\title{Because We Can: Studying Twitter in Political Science\thanks{Paper
  presented at some conference or other.}}
\date{March 2013}

\begin{document}
\maketitle

\begin{abstract}
What the paper is all about
\end{abstract}
Pretty vanilla stuff for a latex person, but still there are a few things to note:
Input encoding: UTF-8. Always, for everything. Paper, bibliography, data, and code. This file is in UTF-8 because I don’t want to live in the late 20th century any more, and I don’t want to have to get all {\"u} about the perfectly respectable (and in my part of the world ubiquitous) u-with-an-umlaut ü or its non-ASCII brethren.
Motto: If it’s common enough to get its own key on the keyboard then it’s not a candidate for an escape sequence.
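If you want to check that a paper’s files really are UTF-8 before compiling, one option is to ask iconv to read each file as UTF-8 and see whether it complains. This is only a minimal sketch, assuming an iconv command is on your path; the function name is_utf8 and the file name paper.Rnw in the usage comment are made up for illustration:

```shell
# is_utf8 (hypothetical helper): succeeds if the file decodes cleanly as UTF-8.
# Assumes an iconv command is installed; it exits non-zero on invalid input.
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage, with a placeholder paper name next to the biblio.bib from the template:
#   for f in paper.Rnw biblio.bib; do
#     is_utf8 "$f" || echo "not UTF-8: $f"
#   done
```

A file saved in Latin-1 with a bare ü in it should fail this check, which is usually the first hint that an editor has been saving in the wrong encoding.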
Bibliography: I use biblatex, not bibtex, for very much the same reasons as I insist on UTF-8. Try it. You’ll like it. It’s better put together, behaves well with Unicode, and doesn’t require any changes in your .bib files.
If you happen to use Bibdesk (also bundled with Mactex) to edit your bibliography, you may want to add the extra biblatex fields like DOI, as described here.
Here I’ve loaded the excellent APA style. (All the lines in that block are required.) I’ve also switched on natbib emulation so I can use the good old citep, citet, etc. citation commands I grew up with, under the new biblatex regime.
Preparing for R: The booktabs and dcolumn packages serve to style and digit-align latex tables respectively. They’re here so that the R package memisc that auto-generates all my tables can use them. Because nobody still writes data tables by hand. Do they?
Now it’s time to set up the R parts. I use knitr to embed R in documents and so should you. Think of it as a non-cryptic Sweave that isn’t just a massive Perl script and always knows where its style files are. Here I set an important default in an uncached chunk:
<<set-options, echo=FALSE, cache=FALSE>>=
opts_knit$set(stop_on_error=2L)
@
Why? Because when your R code fails – and if you’re writing paper and code together it will fail at some point – then without further guidance knitr will just keep on trucking. This is not necessarily a good thing: Either the nasty error that replaces your desired output happens to compile as latex, in which case there is nothing to tell you that Figure 3, your pride, your joy, and the product of many hours getting your head into the ggplot2 zone, is simply missing from the final pdf. Alternatively, it happens not to compile as latex, which will give you the mistaken impression there is something wrong with your document rather than with your code. Now, these are occupational hazards of mixing document and code, but since I’m doing just that I can at least ensure that the code stops when it’s broken. And that’s what this knitr option does.
Notice that, despite eschewing Sweave, the chunks of R are still wrapped in the aesthetically-challenged noweb syntax, bristling with angle brackets and at signs. Other syntax is possible with knitr, but it’s probably safest to stick with noweb. Also, it doesn’t confuse the old timers.
Speaking of whom… a couple of observations for those who already have lots of documents set up for Sweave. First, my sympathies – it must have been horrible. Second, be careful because knitr’s option syntax is not quite the same as Sweave’s, now that it’s all going through R. Some of the changes are listed here where there’s also a function to turn the one into the other. Happily, you’ll find that knitr makes more sense.
Next I load some R packages. Here it’s the very handy memisc, which I use mostly for its wide-ranging toLatex function, and the apsrtable package for typesetting regression tables.
<<loadpackages,include=FALSE>>=
library(memisc)
library(apsrtable)
@
Here include=FALSE means that nothing that happens in this chunk will make it into the paper, including any start up messages or exciting news about which functions overwrite which other ones. (Thanks here to Matheiu from the comments).
If you find yourself wanting to suppress the output of some R functions but not others then wrap your noisier functions in suppressMessages.
It’s about time for a table. I like to use the document itself to control the formatting of the table, perhaps because I can never remember how to get ctable to do what I want, so my typical tables tend to look as follows, with the R code wedged into the middle:
\begin{table}[htdp]
\caption{A fascinating table}
\begin{center}
<<tab-fascinating,results='asis',echo=FALSE>>=
tab <- HairEyeColor # the data: a three way table
toLatex(ftable(tab))
@
\end{center}
\label{tab:mytable}
\end{table}
Here, by the way, is one more reason to use memisc’s toLatex rather than xtable: memisc can typeset a flat table. It also restricts itself to returning a tabular environment and leaves the whole surrounding table business to me.
The results of this chunk are set to be ‘asis’ so that nothing untoward happens to the generated latex table code on the way into the document.
Similarly, my typical figure looks like this:
\begin{figure}[htbp]
\begin{center}
<<plot-fascinating,echo=FALSE>>=
mosaicplot(HairEyeColor)
@
\caption{A fascinating plot}
\label{plot:fascinating}
\end{center}
\end{figure}
Unlike with Sweave, it’s not necessary to say that the code chunk is going to be a figure. Just make the plot and it will get inserted. By default it will take up the width of the text.
For my sins I find myself writing about regression models. Sometimes I cannot avoid having to show their coefficients in a big table. R packages for turning regression output from several models into nicely formatted latex tables include apsrtable, memisc, and stargazer. You can see an example in another post. Here’s an example using apsrtable and some random attitude data that comes with R:
\begin{table}[htbp]
\caption{A fascinating regression table}
\label{lm:fascinating}
\begin{center}
<<lm-fascinating,results='asis',echo=FALSE>>=
m1 <- lm(rating ~ complaints + privileges + learning + raises + critical,
         data=attitude)
m2 <- lm(rating ~ complaints + privileges + learning, data=attitude)
apsrtable(m1, m2, Sweave=TRUE)
@
\end{center}
\end{table}
In this package ‘Sweave=TRUE’ ensures the regression tabular environment doesn’t get wrapped in its own table.
The last part of the document just pushes out the reference list and shuts up shop:
\printbibliography
\end{document}
Save this document with suffix ‘.Rnw’ and it’s ready to go.
I mentioned that I write in TeXShop, which has the notion of compilation engines. For example, there’s one for ordinary latex that calls pdflatex, and one for XeLaTeX which uses that instead. Once defined, these engines all live on a button in the main interface. Compilation is then a matter of pressing it or remembering that Apple-T does the same thing.
There isn’t a built-in engine for knitr, but it’s easy to make one. The engine itself is just a shell script. Here’s my belt-and-braces version that believes you are on a unix machine but doubts that your paths are set up properly:
#!/bin/bash
export PATH=$PATH:/usr/texbin:/usr/local/bin
if (Rscript -e "library(knitr); knit('$1')")
then
  latexmk -pdf "${1%.*}"
fi
In brief, this tries to run the R code in double quotes on the first argument ($1) which is the name of the .Rnw file. If this succeeds then the transformation from latex+R to pure latex must have been successful, so we can call latexmk on the resulting file. latexmk runs latex, then biber, then latex, then latex again, then… until all the citations are cited, the contents are tabled, and all the cross references are happy again.
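The two bits of shell doing the work in that script are the exit status test and the ${1%.*} expansion. Here is a small sketch of both in isolation, with the R and latexmk calls swapped for stand-ins so it runs anywhere; strip_ext, run_then and say_latexmk are hypothetical names used only for illustration:

```shell
# strip_ext (hypothetical): drop the final suffix, as "${1%.*}" does in the engine.
strip_ext() {
  printf '%s\n' "${1%.*}"
}

# run_then (hypothetical): run the second command only if the first exits with
# status 0, which is how the engine skips latexmk when the knit step fails.
run_then() {
  if "$1"; then
    "$2"
  fi
}

# Stand-in for the real latexmk call.
say_latexmk() { echo "would run latexmk"; }

strip_ext "myfile.Rnw"      # prints: myfile
run_then true say_latexmk   # prints: would run latexmk
run_then false say_latexmk  # prints nothing, the second step is skipped
```

Only the last suffix is removed, so a file called my.paper.Rnw would be handed on to latexmk as my.paper.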
To get TeXShop to treat this file as an engine, save it as
~/Library/TeXShop/Engines/Knitr.engine
and don’t forget to make it executable.
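From a terminal, installing the engine might look something like the sketch below. The destination is the path given above; install_engine is a hypothetical helper name, and it assumes the script has already been saved somewhere as Knitr.engine:

```shell
# install_engine (hypothetical): copy an engine script into a directory
# and mark it executable so that TeXShop can run it.
install_engine() {
  src=$1
  dest_dir=$2
  mkdir -p "$dest_dir" &&
    cp "$src" "$dest_dir/" &&
    chmod +x "$dest_dir/$(basename "$src")"
}

# Usage:
#   install_engine Knitr.engine "$HOME/Library/TeXShop/Engines"
```

The chmod +x step is the one that is easy to forget; without it the file is just a text file sitting in the Engines folder.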
My paper writing process then consists of writing words and code, and compiling intermittently to see where I am. When I’m happy with the result I can open up an R session and type
library(knitr)
purl("myfile.Rnw")
and get the R code extracted from the surrounding paper in a file called ‘myfile.R’. That, along with any files or data that are called in the course of the document, constitutes the replication materials.