Reproducible data science with Nix, part 5 — Reproducible literate programming with Nix and Quarto

[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers.]

This blog post is a copy-paste from this vignette

Introduction

This vignette will walk you through setting up a development environment with {rix} that can be used to compile Quarto documents into PDFs. We are going to use the Quarto template for the Journal of Statistical Software (JSS) to illustrate the process. The first section shows a simple way of achieving this, which is also ideal for interactive development (while writing the document). The second section discusses a way to build the document in a completely reproducible manner once it’s done.

Starting with the basics (simple but not entirely reproducible)

This approach is not optimal, but it is the simplest. We start by building a development environment with all our dependencies, which we can then use to compile our document interactively. However, this approach is not fully reproducible and requires manual actions. In the next section, we will show you how to build a 100% reproducible document with a single command.

Since we need both the {quarto} R package and the Quarto engine, we add them to the r_pkgs and system_pkgs arguments of rix(), respectively. Because we want to compile a PDF, we also need texlive installed, as well as some LaTeX packages. For these, we use the tex_pkgs argument:

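library(rix)

# Write default.nix into the current project folder
path_default_nix <- "."

rix(r_ver = "4.3.1",               # R version to pin (rix resolves it to a nixpkgs revision)
    r_pkgs = c("quarto"),          # R packages
    system_pkgs = "quarto",        # the Quarto command-line engine
    tex_pkgs = c("amsmath"),       # LaTeX packages added on top of scheme-small
    ide = "other",
    shell_hook = "",
    project_path = path_default_nix,
    overwrite = TRUE,              # replace any existing default.nix
    print = TRUE)                  # also print the generated file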
## # This file was generated by the {rix} R package v0.3.1 on 2023-09-15
## # with following call:
## # >rix(r_ver = "976fa3369d722e76f37c77493d99829540d43845",
## #  > r_pkgs = c("quarto"),
## #  > system_pkgs = "quarto",
## #  > tex_pkgs = c("amsmath"),
## #  > ide = "other",
## #  > project_path = path_default_nix,
## #  > overwrite = TRUE,
## #  > print = TRUE,
## #  > shell_hook = "")
## # It uses nixpkgs' revision 976fa3369d722e76f37c77493d99829540d43845 for reproducibility purposes
## # which will install R version 4.3.1
## # Report any issues to https://github.com/b-rodrigues/rix
## let
##  pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/976fa3369d722e76f37c77493d99829540d43845.tar.gz") {};
##  rpkgs = builtins.attrValues {
##   inherit (pkgs.rPackages) quarto;
## };
##   tex = (pkgs.texlive.combine {
##   inherit (pkgs.texlive) scheme-small amsmath;
## });
##  system_packages = builtins.attrValues {
##   inherit (pkgs) R quarto;
## };
##   in
##   pkgs.mkShell {
##     buildInputs = [  rpkgs tex system_packages  ];
##       shellHook = ''
## 
## '';
##   }

(Save these lines into a script called, for instance, build_env.R, and run this script from a new folder made for this project.)

By default, {rix} installs the “small” scheme of the texlive distribution available on Nix (scheme-small in the generated file). To see which texlive packages are included in this small scheme, you can click here. We start by adding the amsmath package, then build the environment using:

nix_build()

Then, drop into the Nix shell with nix-shell, and run quarto add quarto-journals/jss. This installs the JSS template linked above. Next, go to the folder that contains build_env.R, the generated default.nix and result. Download the following files from here:

  • article-visualization.pdf
  • bibliography.bib
  • template.qmd

and try to compile template.qmd by running:

quarto render template.qmd --to jss-pdf

You should get the following error message:

Quitting from lines 99-101 [unnamed-chunk-1] (template.qmd)
Error in `find.package()`:
! there is no package called 'MASS'
Backtrace:
 1. utils::data("quine", package = "MASS")
 2. base::find.package(package, lib.loc, verbose = verbose)
Execution halted

So there’s an R chunk in template.qmd that uses the {MASS} package. Change build_env.R to generate a new default.nix file that will now add {MASS} to the environment when built:

rix(r_ver = "4.3.1",
    r_pkgs = c("quarto", "MASS"),  # MASS is used by an R chunk in template.qmd
    system_pkgs = "quarto",
    tex_pkgs = c("amsmath"),
    ide = "other",
    shell_hook = "",
    project_path = path_default_nix,
    overwrite = TRUE,
    print = TRUE)
## # This file was generated by the {rix} R package v0.3.1 on 2023-09-15
## # with following call:
## # >rix(r_ver = "976fa3369d722e76f37c77493d99829540d43845",
## #  > r_pkgs = c("quarto",
## #  > "MASS"),
## #  > system_pkgs = "quarto",
## #  > tex_pkgs = c("amsmath"),
## #  > ide = "other",
## #  > project_path = path_default_nix,
## #  > overwrite = TRUE,
## #  > print = TRUE,
## #  > shell_hook = "")
## # It uses nixpkgs' revision 976fa3369d722e76f37c77493d99829540d43845 for reproducibility purposes
## # which will install R version 4.3.1
## # Report any issues to https://github.com/b-rodrigues/rix
## let
##  pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/976fa3369d722e76f37c77493d99829540d43845.tar.gz") {};
##  rpkgs = builtins.attrValues {
##   inherit (pkgs.rPackages) quarto MASS;
## };
##   tex = (pkgs.texlive.combine {
##   inherit (pkgs.texlive) scheme-small amsmath;
## });
##  system_packages = builtins.attrValues {
##   inherit (pkgs) R quarto;
## };
##   in
##   pkgs.mkShell {
##     buildInputs = [  rpkgs tex system_packages  ];
##       shellHook = ''
## 
## '';
##   }
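You do not have to discover missing R packages one failed render at a time. You could first scan the .qmd source for the packages it mentions, and then call rix(). The sketch below is a hypothetical base-R helper, qmd_r_packages(), which is not part of {rix}. It relies on simple regular expressions, so it can miss packages that are loaded in unusual ways. Rendering the document remains the real test.

```r
# Hypothetical helper (not part of {rix}): guess which R packages a .qmd file
# uses by scanning its source for library()/require() calls, pkg:: prefixes
# and package = "..." arguments. This is only a heuristic: it reports at most
# one match per pattern per line, so rendering remains the final check.
qmd_r_packages <- function(lines) {
  patterns <- c(
    "(library|require)\\(['\"]?([[:alpha:]][[:alnum:].]*)",
    "([[:alpha:]][[:alnum:].]*):::?",
    "package *= *['\"]([[:alpha:]][[:alnum:].]*)"
  )
  found <- unlist(lapply(patterns, function(p) {
    hits <- regmatches(lines, regexec(p, lines))
    # The package name is always the last capture group of each pattern
    vapply(hits, function(x) if (length(x) > 1) x[length(x)] else NA_character_,
           character(1))
  }))
  sort(unique(found[!is.na(found)]))
}

# Example: qmd_r_packages(readLines("template.qmd"))
```

A line such as data("quine", package = "MASS"), the call shown in the backtrace above, is the kind of line the third pattern is meant to catch. You would then add whatever the helper reports to r_pkgs.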

After rebuilding the environment, compiling the document now fails with another error message:

compilation failed- no matching packages
LaTeX Error: File `orcidlink.sty' not found

This means that the LaTeX orcidlink package is missing, and we can solve the problem by adding "orcidlink" to tex_pkgs. Rebuild the environment and compile the template again. This time, we get a new error:

compilation failed- no matching packages
LaTeX Error: File `tcolorbox.sty' not found.

Just as before, add the tcolorbox package to tex_pkgs. You will need to repeat this for several other packages. Unfortunately, there is no easier way to list the dependencies of a LaTeX document.
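Every one of these failures has the same shape (File `name.sty' not found), so the bookkeeping can at least be scripted. Below is a minimal sketch with two hypothetical helpers, missing_sty() and add_tex_pkgs(); neither is part of {rix}.
- missing_sty() extracts the style-file names from the error text.
- add_tex_pkgs() appends those names to your tex_pkgs vector.

The sketch assumes that the .sty file name matches the TeX Live package name. This is often true but not always, so check CTAN when a name does not resolve.

```r
# Hypothetical helper (not part of {rix}): pull the missing style-file names
# out of LaTeX error messages such as
#   LaTeX Error: File `orcidlink.sty' not found.
missing_sty <- function(log_lines) {
  m <- regmatches(log_lines,
                  regexec("File `([^']+)\\.sty' not found", log_lines))
  hits <- vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_,
                 character(1))
  unique(hits[!is.na(hits)])
}

# Hypothetical helper: append newly reported names to the current tex_pkgs,
# assuming each .sty file name is also the TeX Live package name.
add_tex_pkgs <- function(tex_pkgs, log_lines) {
  unique(c(tex_pkgs, missing_sty(log_lines)))
}
```

To use it, paste the error output into log_lines and pass the updated vector to tex_pkgs in build_env.R. Then rebuild the environment.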

This is what the final script to build the environment looks like:

rix(r_ver = "4.3.1",
    r_pkgs = c("quarto", "MASS"),
    system_pkgs = "quarto",
    tex_pkgs = c(
      "amsmath",
      "environ",
      "fontawesome5",
      "orcidlink",
      "pdfcol",
      "tcolorbox",
      "tikzfill"
    ),
    ide = "other",
    shell_hook = "",
    project_path = path_default_nix,
    overwrite = TRUE,
    print = TRUE)
## # This file was generated by the {rix} R package v0.3.1 on 2023-09-15
## # with following call:
## # >rix(r_ver = "976fa3369d722e76f37c77493d99829540d43845",
## #  > r_pkgs = c("quarto",
## #  > "MASS"),
## #  > system_pkgs = "quarto",
## #  > tex_pkgs = c("amsmath",
## #  > "environ",
## #  > "fontawesome5",
## #  > "orcidlink",
## #  > "pdfcol",
## #  > "tcolorbox",
## #  > "tikzfill"),
## #  > ide = "other",
## #  > project_path = path_default_nix,
## #  > overwrite = TRUE,
## #  > print = TRUE,
## #  > shell_hook = "")
## # It uses nixpkgs' revision 976fa3369d722e76f37c77493d99829540d43845 for reproducibility purposes
## # which will install R version 4.3.1
## # Report any issues to https://github.com/b-rodrigues/rix
## let
##  pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/976fa3369d722e76f37c77493d99829540d43845.tar.gz") {};
##  rpkgs = builtins.attrValues {
##   inherit (pkgs.rPackages) quarto MASS;
## };
##   tex = (pkgs.texlive.combine {
##   inherit (pkgs.texlive) scheme-small amsmath environ fontawesome5 orcidlink pdfcol tcolorbox tikzfill;
## });
##  system_packages = builtins.attrValues {
##   inherit (pkgs) R quarto;
## };
##   in
##   pkgs.mkShell {
##     buildInputs = [  rpkgs tex system_packages  ];
##       shellHook = ''
## 
## '';
##   }

The template will now compile with this environment. To look for a LaTeX package, you can use the search engine on CTAN.

As stated at the beginning of this section, this approach is not optimal, but it has its merits, especially while you’re still working on the document. Once the environment is set up, you can simply work on the document and compile it as needed using quarto render. In the next section, we will explain how to build a 100% reproducible document.

100% reproducible literate programming

Let’s not forget that Nix is not just a package manager, but also a programming language. The default.nix files that {rix} generates are written in this language, which was made entirely for the purpose of building software. If you are not a developer, you may not realise it, but compiling a Quarto or LaTeX document is very similar to building any piece of software. So we can use Nix to compile a document in a completely reproducible environment.

First, let’s fork this repo, which contains the Quarto template we need, including a template.qmd file that we can change. That is why we fork it: in practice, we would replace this template.qmd with our own, finished source .qmd file. Now we need to change our default.nix:

let
 pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/976fa3369d722e76f37c77493d99829540d43845.tar.gz") {};
 rpkgs = builtins.attrValues {
   inherit (pkgs.rPackages) quarto MASS;
 };
 tex = (pkgs.texlive.combine {
   inherit (pkgs.texlive) scheme-small amsmath environ fontawesome5 orcidlink pdfcol tcolorbox tikzfill;
 });
 system_packages = builtins.attrValues {
   inherit (pkgs) R quarto;
 };
 in
 pkgs.mkShell {
   buildInputs = [  rpkgs tex system_packages  ];
 }

to the following:

let
 pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/976fa3369d722e76f37c77493d99829540d43845.tar.gz") {};
 rpkgs = builtins.attrValues {
  inherit (pkgs.rPackages) quarto MASS;
 };
 tex = (pkgs.texlive.combine {
  inherit (pkgs.texlive) scheme-small amsmath environ fontawesome5 orcidlink pdfcol tcolorbox tikzfill;
 });
 system_packages = builtins.attrValues {
  inherit (pkgs) R quarto;
 };
 in
 pkgs.stdenv.mkDerivation {
   name = "my-paper";
   src = pkgs.fetchgit {
       url = "https://github.com/b-rodrigues/my_paper/";
       branchName = "main";
       rev = "715e9f007d104c23763cebaf03782b8e80cb5445";
       sha256 = "sha256-e8Xg7nJookKoIfiJVTGoJkvCuFNTT83YZ6SK3GT2T8g=";
     };
   buildInputs = [  rpkgs tex system_packages  ];
   buildPhase =
     ''
     # Deno needs to add stuff to $HOME/.cache
     # so we give it a home to do this
     mkdir home
     export HOME=$PWD/home
     quarto add --no-prompt $src
     quarto render $PWD/template.qmd --to jss-pdf
     '';
   installPhase =
     ''
     mkdir -p $out
     cp template.pdf $out/
     '';
 }

So we changed the second part of the file: instead of building a shell with mkShell, we now build a derivation with pkgs.stdenv.mkDerivation. Derivation is Nix jargon for a package, or more precisely for the recipe that builds one. So what does our derivation do? First, it clones the repo we forked just before (I forked the repository and called it my_paper):

pkgs.stdenv.mkDerivation {
  name = "my-paper";
  src = pkgs.fetchgit {
      url = "https://github.com/b-rodrigues/my_paper/";
      branchName = "main";
      rev = "715e9f007d104c23763cebaf03782b8e80cb5445";
      sha256 = "sha256-e8Xg7nJookKoIfiJVTGoJkvCuFNTT83YZ6SK3GT2T8g=";
    };

This repo contains our Quarto template. Because we pin a specific commit (rev), and the sha256 hash checks that the fetched content is exactly what we expect, we will always use this exact version of the template for our document. This is in contrast to before, when we used quarto add quarto-journals/jss to install the template. Doing this interactively makes our project not reproducible: if we compile our Quarto doc today, we use the template as it is today, but if we compile it in 6 months, we use the template as it will be in 6 months. (I should say that it is possible to install specific releases of Quarto templates using the notation quarto add quarto-journals/jss@<tag>, where <tag> is a release tag, so this problem can be mitigated.)

The next part of the file contains the following lines:

buildInputs = [  rpkgs tex system_packages  ];
buildPhase =
  ''
  # Deno needs to add stuff to $HOME/.cache
  # so we give it a home to do this
  mkdir home
  export HOME=$PWD/home
  quarto add --no-prompt $src
  quarto render $PWD/template.qmd --to jss-pdf
  '';

The buildInputs are the same as before. What’s new is the buildPhase, which is where the document actually gets compiled. The first step is to create a home directory. Quarto runs on Deno, which needs to write files under $HOME/.cache (for instance $HOME/.cache/deno). When you use Quarto interactively, that’s not an issue, since your own home directory is used. Inside a Nix build, however, HOME points to a directory that does not exist (/homeless-shelter). So we create an empty directory and declare it as the home. This is what these two lines do:

mkdir home
export HOME=$PWD/home

($PWD, for Print Working Directory, is a shell variable that holds the current working directory.)

Now we need to install the template that we cloned from GitHub. For this we use quarto add, just as before. This time, instead of installing the template directly from GitHub, we install it from the repository we cloned; $src refers to the path of this downloaded repository. We also add the --no-prompt flag so that the template gets installed without asking us for confirmation. This is similar to building a Docker image: we don’t want any interactive prompt to show up, or the build would get stuck. Finally, we can compile the document:

quarto render $PWD/template.qmd --to jss-pdf

This compiles template.qmd (our finished paper). Last comes the installPhase:

installPhase =
  ''
  mkdir -p $out
  cp template.pdf $out/
  '';

$out is a shell variable defined inside the build environment. It holds the output path of the derivation in the Nix store, so we can use it to create a directory that will contain our output (the compiled PDF file). mkdir -p creates this directory, including any missing parent directories, and cp then copies the compiled document into it.

We can now build our document by running nix_build(). You may be surprised not to see the PDF in your working directory. But remember that everything Nix builds is stored in the Nix store, and the PDF is exactly what we built here. What you get in your working directory is result, a symbolic link pointing to that output. To find the actual location, run:

readlink result

This shows the path of the output directory in the Nix store, which contains the PDF. You can use this to open the PDF in your PDF viewer (on Linux, at least):

xdg-open $(readlink result)/template.pdf
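If you prefer to stay in R, you can do the same lookup with base R’s Sys.readlink(), which returns the target of a symbolic link. The helper result_pdf() below is hypothetical and not part of {rix}. It assumes the PDF is called template.pdf, as in the installPhase above.

```r
# Hypothetical helper (not part of {rix}): return the path of the compiled PDF
# by following the `result` symlink that the build leaves in the project folder.
result_pdf <- function(link = "result", file = "template.pdf") {
  # Sys.readlink() gives the link target, "" for a non-link and NA on error
  target <- Sys.readlink(link)
  if (is.na(target) || !nzchar(target)) {
    stop("'", link, "' is not a symbolic link; has the document been built?")
  }
  file.path(target, file)
}
```

On Linux, you could then open the file with system2("xdg-open", shQuote(result_pdf())).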

Conclusion

This vignette showed two approaches, each with its own merits. The first, more interactive approach is useful while writing the document: you get access to a shell, and you can work on the document and compile it quickly. The second approach is more useful once the document is ready and you want a way to rebuild it quickly for reproducibility purposes. This approach should also be quite useful in a CI/CD environment.

Hope you enjoyed! If you found this blog post useful, you might want to follow me on Mastodon or Twitter for blog post updates and buy me an espresso or paypal.me, or buy my ebooks. You can also watch my videos on YouTube. So much content for you to consoom!

