
Single-source publishing for R users

[This article was first published on Maëlle's R blog on Maëlle Salmon's personal website, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A big part of my work includes putting content about R online, in blog posts and online books. I’m therefore very interested in the technical infrastructure that allows us R users to produce beautiful products out of R Markdown or Markdown source files. In this post I shall summarize my recent experiments around making HTML and PDF versions of books. Thanks to Julie Blanc’s inspiring post in French, I have learnt this is called single-source publishing. So let’s dive into the incredible machines allowing single-source publishing for R users.

What I want for writing books

I want technical tools that allow me to procrastinate whilst learning a ton. Just kidding, or not. My official criteria are:

Now, let’s dive into three workflows I’ve used or experimented with.

The standard way: bookdown gitbook and pdf_book

We R users are very lucky to be able to use bookdown for making books. You write content in Rmd files in a single folder, list their filenames in the order you want in a config file, and choose an output format when knitting. There are many books out there built using bookdown, and often they’re knit twice so that there’s both an HTML and a PDF version. See for instance the rOpenSci dev guide, which is deployed with GitHub Actions.
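Knitting twice is easy to script, for instance in a build.R file run locally or on CI. Below is a minimal sketch, assuming a standard bookdown project with index.Rmd in the working directory. `render_with_bookdown()` and `render_all_formats()` are hypothetical helper names; `bookdown::render_book()` and the two format names are bookdown’s own.

```r
# Default renderer: one bookdown::render_book() call per output format.
# Assumes a standard bookdown project (index.Rmd + _bookdown.yml) in the
# working directory.
render_with_bookdown <- function(fmt) {
  bookdown::render_book("index.Rmd", output_format = fmt)
}

# Hypothetical helper: knit the same sources once per output format.
render_all_formats <- function(formats = c("bookdown::gitbook", "bookdown::pdf_book"),
                               render = render_with_bookdown) {
  for (fmt in formats) {
    message("Rendering with ", fmt)
    render(fmt)
  }
  invisible(formats)
}

# In a real book project:
# render_all_formats()
```

Passing the renderer as an argument is only there so the loop can be dry-run without a real book. In practice you would just call `render_all_formats()`.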

Advantages of bookdown include its widespread usage and stability, as well as its documentation, written both by its authors and by its users. For instance:

A downside of bookdown for the procrastinators out there is, in my opinion, that if you want PDF and HTML versions of a book, you have to knit it twice. Furthermore, if you introduce some fancy things such as a custom block, you will have to define it twice: with CSS for the HTML version and as a LaTeX environment for the PDF version. Now, I have written a whole book with LaTeX, so I used to partially know how to use it, but these days I’d rather learn about CSS only, as there’s already a lot to digest there.

So let’s dive into two DIY ways of publishing a book in HTML and PDF.

Hugo and pagedjs-cli

In this proof-of-concept, content is stored in Markdown files, which could have been generated with hugodown. The actors are Hugo, a powerful static website generator that you get in the form of a binary, and pagedjs-cli, a Node package that prints HTML to PDF.

You can find a repo corresponding to this experiment on GitHub, and the resulting book on Netlify.

This proof-of-concept is based on the great hugo-book theme. My “work” was just to add a bit of setup magic (Netlify config, config/ dir).

The idea is to run two Hugo builds with different configurations. In the words of Hadley Wickham, “hugo provides a bewildering array of possible config locations”. In this proof-of-concept/incredible machine, this array of possibilities is key to success. Each build picks its configuration with Hugo’s --environment flag, which selects the matching subdirectory of config/. The two configurations use different layout directories, that is, different sets of templates telling Hugo where to put your words (e.g. every blog post on its own page, plus a page showing summaries of all of them).

Netlify can run all these commands, configured in netlify.toml:

[build]
publish = "/public"
command = "hugo -d public --environment 'website' && hugo -d public2 --environment 'pdf' && pagedjs-cli public2/all/index.html -o public/book.pdf"
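To try the same pipeline locally rather than on Netlify, the three commands of the build command above can also be chained from R. Below is a minimal sketch, assuming both `hugo` and `pagedjs-cli` are on the PATH (with a local package.json install, pagedjs-cli lives in node_modules/.bin instead). `build_steps` and `run_build()` are hypothetical names; `system2()` is base R.

```r
# Hypothetical local equivalent of the Netlify build command.
# Assumes `hugo` and `pagedjs-cli` are both on the PATH.
build_steps <- list(
  list(cmd = "hugo", args = c("-d", "public", "--environment", "website")),
  list(cmd = "hugo", args = c("-d", "public2", "--environment", "pdf")),
  list(cmd = "pagedjs-cli", args = c("public2/all/index.html", "-o", "public/book.pdf"))
)

run_build <- function(steps = build_steps, run = system2) {
  for (step in steps) {
    # system2() returns the exit status of the command (0 means success)
    status <- run(step$cmd, step$args)
    # Like `&&` in the Netlify command: stop at the first failing step
    if (!identical(as.integer(status), 0L)) {
      stop("Step failed: ", step$cmd, " ", paste(step$args, collapse = " "))
    }
  }
  invisible(TRUE)
}

# run_build()
```

The `run` argument defaults to `system2()`. It is only exposed so the sequencing can be checked without Hugo installed.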

The dependency on pagedjs-cli is declared in package.json (the Node equivalent of an R package’s DESCRIPTION file).

Now, if the content lived in R Markdown files, we would need a service like GitHub Actions to knit them to Markdown before the Hugo builds.
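Below is a sketch of what that knitting step could look like in such a CI job. It assumes knitr is installed and that Hugo’s content lives in the default content/ directory. `md_target()` and `knit_content()` are hypothetical helpers, and figure paths are not handled (hugodown, mentioned above, is the more complete tool for this).

```r
# Hypothetical pre-build step: knit every R Markdown file under content/
# to a Markdown file next to it, so that Hugo only ever sees Markdown.
# knitr::knit() leaves the YAML front matter as-is; figure paths are NOT handled.
md_target <- function(rmd) sub("\\.[Rr]md$", ".md", rmd)

knit_content <- function(content_dir = "content", knit = knitr::knit) {
  rmd_files <- list.files(content_dir, pattern = "\\.[Rr]md$",
                          recursive = TRUE, full.names = TRUE)
  for (f in rmd_files) {
    knit(f, output = md_target(f))
  }
  invisible(md_target(rmd_files))
}

# knit_content()
```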

Advantages

Further work

If I decided to take this further, that is! All the work would be aimed at making the PDF version better.

Related work

The work presented by Julie Blanc is a similar pipeline, where content lives in Markdown files rendered to two HTML versions by Jekyll, another static website generator. One HTML version is a website; the other one is a printable version thanks to Paged.js (the PDF is not pre-generated, but you can print it from any modern browser). The big differences with my experiment are:

bookdown::bs4_book(), xml2 and pagedjs-cli

In another experiment, I used bookdown’s brand-new HTML book format, bs4_book(), by Hadley Wickham (only available in bookdown’s dev version at the time of writing). I chose it because I find it looks better than gitbook, and because it uses Bootswatch themes, so you get to use divs such as alerts. See bookdown’s docs about custom blocks.

The source is open: see the HTML version and the PDF version. This time I took the time to learn a bit about print CSS.

Here the steps are: knit using the bs4_book() template for HTML, and then, for getting a PDF:

Locally, I sourced build.R to check that it worked, and then I also wrote a GitHub Actions workflow that installs dependencies, runs the script, and deploys the docs/ folder to a gh-pages branch.
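The section title names xml2 and pagedjs-cli for the PDF half of this pipeline. One plausible way to combine them is to copy the content of each rendered chapter into a single HTML page, then print that page with pagedjs-cli. This is sketched here as an assumption, not as the code of the actual repository. `merge_chapters()` is a hypothetical helper, and it assumes each chapter’s content sits in a `<main>` element, which you should check against your template. The example file names are also hypothetical.

```r
# Hypothetical helper: copy the <main> content of every chapter page, in the
# order given, into the <main> element of the first page, and save the result.
merge_chapters <- function(chapter_files, out_file) {
  combined <- xml2::read_html(chapter_files[[1]])
  main <- xml2::xml_find_first(combined, "//main")
  if (inherits(main, "xml_missing")) stop("No <main> element in ", chapter_files[[1]])
  for (f in chapter_files[-1]) {
    chapter_main <- xml2::xml_find_first(xml2::read_html(f), "//main")
    if (inherits(chapter_main, "xml_missing")) stop("No <main> element in ", f)
    children <- xml2::xml_children(chapter_main)
    for (i in seq_along(children)) {
      # xml_add_child() copies nodes coming from another document by default
      xml2::xml_add_child(main, children[[i]])
    }
  }
  xml2::write_html(combined, out_file)
  invisible(out_file)
}

# Example usage (paths are hypothetical; list chapters in reading order):
# bookdown::render_book("index.Rmd", output_format = "bookdown::bs4_book")
# merge_chapters(c("docs/index.html", "docs/intro.html"), "docs/all.html")
# system2("pagedjs-cli", c("docs/all.html", "-o", "docs/book.pdf"))
```

Keeping the first page as the base means its `<head>`, and thus its stylesheets, including any print CSS, are kept in the merged page.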

Note that Bootstrap provides classes you can add to elements to change how they are displayed in print. So, to hide the burger menu in print, you can either do as I did and write a CSS rule,

@media print {
  .btn.btn-outline-primary.d-lg-none.ml-2.mt-1 {
    display: none;
  }
}

or add the .d-print-none class to that element in your HTML template. If you can control said HTML template, that is.
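If you cannot edit the template, one hypothetical middle ground is to post-process the rendered HTML with xml2, which this experiment uses anyway, and add the class there. Below is a minimal sketch. `add_print_none()` and the XPath in the usage comment are assumptions, not part of bookdown or Bootstrap.

```r
# Hypothetical helper: append Bootstrap's d-print-none class to every node
# matched by `xpath`, so that Bootstrap's own print utility hides it.
add_print_none <- function(doc, xpath) {
  nodes <- xml2::xml_find_all(doc, xpath)
  for (i in seq_along(nodes)) {
    classes <- xml2::xml_attr(nodes[[i]], "class")
    # Skip nodes that already carry the class, so the helper is idempotent
    if (!is.na(classes) && "d-print-none" %in% strsplit(classes, "\\s+")[[1]]) next
    new_classes <- if (is.na(classes)) "d-print-none" else paste(classes, "d-print-none")
    xml2::xml_set_attr(nodes[[i]], "class", new_classes)
  }
  # xml2 documents are modified in place; the document is returned for piping
  invisible(doc)
}

# Example usage (the file path is hypothetical):
# page <- xml2::read_html("docs/index.html")
# add_print_none(page, "//button[contains(@class, 'd-lg-none')]")
# xml2::write_html(page, "docs/index.html")
```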

Further work needed in this experiment is listed in GitHub issues. For instance, footnotes aren’t handled yet.

Conclusion

So, as an R user writing content in R Markdown, you have several options for respecting the principles of single-source publishing. I would still recommend using bookdown or other official rmarkdown formats, knitting twice² if you need two versions, as this is the only stable and widely used solution, but I found it fun to explore other workflows. Both DIY ways I presented use pagedjs-cli to generate a PDF out of HTML. I strongly recommend following the work done by the lovely Paged.js folks in the world of paged media, and learning more about CSS for print. If you want to produce only paged content out of your R Markdown files, check out the pagedown package and its fantastic output formats.

If you too want to experiment with homemade pipelines, have fun customizing things. Now, of course, the possibility of customization might actually be a curse when you are trying to write something. But this was not a blog post about productivity.

At this point, let me thank Romain Lesur (one of the pagedown authors) for his work promoting Paged.js in the R community.

  1. Yes, I should probably re-style this very website. ↩︎

  2. Potentially, the knitr cache might make knitting things a second time very fast? ↩︎

To leave a comment for the author, please follow the link and comment on their blog: Maëlle's R blog on Maëlle Salmon's personal website.
