Generating an academic CV with R and YAML

James Keirstead

7 years ago

[This article was first published on James Keirstead » Rstats, and kindly contributed to R-bloggers].


For the past couple of years, I’ve been using Kieran Healy’s lovely template for my academic CV. Kieran’s code is a customised *.tex file which, of course, has the virtue of simplicity. All a person needs to do is update it with glorious achievements from time to time and re-compile; this is exactly what I’ve been doing since adopting the template in 2011.

But since then, I’ve had a little niggle in the back of my mind: a concern that, despite the elegance of the typography, the underlying software design left something to be desired. If you are manually updating a TeX file with your vita information, how do you handle these use cases?

Generating a short one- or two-page CV for a research proposal.
Generating a stand-alone publications list in a completely different format from the one on your CV.
Synchronising your publications database between your reference manager and CV.
Recording data about some CV items in greater detail than you would want to display on your CV. For example, you might give a talk and want to remember who invited you, but that’s not really relevant to your vita.

All of this suggests that there should be a separation of content and style, just as on a web page. A full PDF is only one way to display CV data, and I wanted something that would be easy to maintain yet would let me generate new output formats quickly when necessary.

So I have developed a system for storing CV data, including academic publications, and generating custom outputs based on a configuration file. Although there are several moving parts, the basic design is simple.

Publications data is stored in a standard *.bib file. I get mine by exporting the “My Publications” folder from Mendeley, and I expect other reference managers offer something similar.
Other CV data, such as teaching duties, talks and service to the profession, is stored in YAML files. YAML has the virtue of being plain text, so it’s easy to edit and version-control, but it also doesn’t force a strict data structure on you. This makes it easy to add new sections with arbitrary properties. As an example, here’s an excerpt from my teaching file:

[Figure: screenshot of the YAML file describing my teaching roles]
A master YAML file that specifies how to build the overall CV. Editing this file is how you omit parts of your full CV database; for example, only publications listed by their BibTeX keys in this master file are included in the output.
A LaTeX package that contains all of the formatting information. This takes advantage of Biblatex to do the publication formatting; many thanks to Rob Hyndman for showing the way.
A set of R scripts that generates the CV. A group of functions converts the section YAML files into the appropriate plain-text output; in this case that means LaTeX, but it could of course be HTML, Markdown, or whatever you like. The main function, generate_cv(content, style, outdir), builds the CV source file from a content file (the master YAML file described above) and a style file (here a LaTeX package, though it could be CSS for HTML output), and writes the result to an output directory. The current script also generates version control information, compiles the whole thing, and then deletes everything except the resulting PDF.
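To make the YAML-to-LaTeX step a little more concrete, here is a minimal sketch in R. It is not the code from the repository: the field names (years, course, role) and the functions escape_latex() and format_section() are hypothetical. It assumes the teaching file has already been parsed (for example with yaml::yaml.load_file()) into a list of named lists; the data is built inline so the example runs without the yaml package.

```r
# Hypothetical teaching entries, shaped like the list of named lists that
# yaml::yaml.load_file() returns for a YAML sequence of mappings.
# The field names (years, course, role) are illustrative assumptions.
teaching <- list(
  list(years = "2012-2014", course = "Energy Systems", role = "Lecturer"),
  list(years = "2011", course = "Urban Modelling & Design", role = "Teaching assistant")
)

# Escape LaTeX special characters that commonly appear in CV text.
# (Backslash, ~ and ^ are not handled in this sketch.)
escape_latex <- function(x) {
  gsub("([&%$#_{}])", "\\\\\\1", x)
}

# Convert one section (a list of entries) into LaTeX source lines.
format_section <- function(entries, title) {
  items <- vapply(entries, function(e) {
    sprintf("  \\item %s, %s (%s)",
            escape_latex(e$course), escape_latex(e$role), e$years)
  }, character(1))
  c(sprintf("\\section*{%s}", escape_latex(title)),
    "\\begin{itemize}", items, "\\end{itemize}")
}

# Print the generated LaTeX for the teaching section.
writeLines(format_section(teaching, "Teaching"))
```

Because format_section() only returns a character vector, producing HTML or Markdown instead would mean swapping in a different formatter while leaving the data untouched, which is the point of separating content from style.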

If you want to try managing your CV with the same approach, all of the code is available on my GitHub page. I’m aware that this isn’t really a standalone R package because it includes my own data, but I think that’s useful at this stage of the software’s development, because you can see how everything fits together. I could of course have made up some dummy people, and if I were to submit the package to CRAN, I probably would. But at the moment that’s a poor use of time, and the relevant personal data is in the public domain anyway.

A final point: why R? The short answer is that it’s what I know best, so I could whip up this prototype relatively quickly. But when you look through the code, you’ll see that it takes advantage of R’s list manipulation functions. This makes it dead easy to parse YAML (with the yaml package) and then process the resulting lists to give the desired output. The paste(..., collapse = ...) idiom was very handy too, for example in formatting the list of journals I’ve reviewed for. I’m sure others could do something similar in Python or another language, but in this case, R just worked.
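As a small illustration of that list-plus-paste style, the sketch below collects journal names from review records and joins them into a single string. The reviews data and format_journals() are made up for this example and are not taken from the repository.

```r
# Hypothetical review records, as they might be parsed from a YAML file.
reviews <- list(
  list(journal = "Journal A", year = 2013),
  list(journal = "Journal B", year = 2012),
  list(journal = "Journal A", year = 2012)
)

# Pull out the journal names, then drop duplicates and sort them.
journals <- sort(unique(vapply(reviews, function(r) r$journal, character(1))))

# Join names as "A, B and C"; paste(..., collapse = ", ") does the main work.
format_journals <- function(x) {
  if (length(x) <= 1) return(paste(x, collapse = ""))
  paste(paste(x[-length(x)], collapse = ", "), "and", x[length(x)])
}

format_journals(journals)
#> [1] "Journal A and Journal B"
```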


