
R scripts as command line tools

[This article was first published on r-bloggers on mpjdem.xyz, and kindly contributed to R-bloggers.]

Most R users rely heavily on the interactive console for writing and executing code, but sometimes you will want to expose your work to a world outside of that cosy cocoon. One solution is to wrap a web API around your code, for instance using the excellent {plumber}. The other main option is to wrap it in a command-line interface (CLI), so it can be used from the shell like any regular program.

To run R code directly from a Linux shell, we must use the Rscript executable instead of regular R (or alternatively, {littler}). For instance, suppose we want to expose the functionality of the {emo} package – retrieve an emoji by name.

# To install {emo}, do: remotes::install_github("hadley/emo")
# commandArgs() returns the arguments provided on the command line as a character vector
args <- commandArgs(trailingOnly = TRUE)

# Use cat() for output
cat(emo::ji(args[[1]]), "\n")

Save this code to getemo.R, and then this is what Rscript enables us to do:

Rscript getemo.R unicorn

(the character encoding and font used by your terminal must support rendering Unicode emojis)

So far so good, and in many cases this will even be good enough! But our little script does not yet behave like a regular command-line tool, where we would hope to just be able to do from any directory:

getemo unicorn

So how?

First let me give a hat tip to Colin Fay for describing a method where you let the Node package manager, npm, do all the work. This is really great when you are already a Node user, but if you’re not, it is probably too much overhead simply to expose an R script.

Three simple steps are all we really need.

1. Add a shebang line at the top

Linux systems (and other Unix-likes) have a standard way of specifying in a comment at the top of plain-text scripts which program should be used to run them – the shebang. You could use:

#!/usr/bin/Rscript --vanilla

The --vanilla argument makes sure that user-specific R settings are ignored, and that there is no saving or restoring of workspaces. This makes the script more portable to other systems.

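You will often see /usr/bin/env Rscript used instead, also for reasons of portability, since Rscript is not installed in /usr/bin on every system. But Linux passes everything after the interpreter path as one single argument, so #!/usr/bin/env Rscript --vanilla makes env look for a program literally named "Rscript --vanilla". Recent versions of GNU env can split the arguments again with the -S option, but older ones cannot, so if you’re unsure just stick to the direct version for now.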

2. Make the script executable

Before you can run a script directly from the command line, you must tell the file system that this plain text file is indeed a script which can be run, rather than any old text file, and that this user is allowed to run it. So do:

chmod +x getemo.R

You can check with a simple ls -l getemo.R command that it has received x’s in its permissions. It can now be run with:

./getemo.R unicorn

We’re getting there, but we still needed to specify the path to the script explicitly to be able to run it (in this case, the local directory .).
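If you would rather check the effect of chmod +x from a script than by reading ls output, a minimal sketch could look like this. It assumes a scratch directory and a stand-in script body, so it does not touch your real getemo.R and does not need R to be installed.

```shell
# Sketch: confirm that chmod +x sets the execute bit.
# ASSUMPTION: a scratch directory and a stand-in script body are used,
# so nothing here touches your real getemo.R.
workdir=$(mktemp -d)
script="$workdir/getemo.R"

# Write a two-line stand-in script with the shebang from step 1
printf '%s\n' '#!/usr/bin/Rscript --vanilla' 'cat("hello\n")' > "$script"

# A freshly created file is normally not executable
if [ -x "$script" ]; then before=yes; else before=no; fi

chmod +x "$script"

# After chmod +x the execute bit is set
if [ -x "$script" ]; then after=yes; else after=no; fi

echo "executable before chmod: $before"
echo "executable after chmod:  $after"

# Remove the scratch directory again
rm -rf "$workdir"
```

The [ -x file ] test only looks at permissions; it does not check that the interpreter named in the shebang actually exists.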

3. Make it available in your $PATH

Now for the ‘from any directory’ part of the requirements. Running a program from any directory is typically achieved by adding the directory of that program to the $PATH environment variable. But we don’t want to be littering this with custom script directories so let’s see what is already in $PATH, using echo $PATH. Either ~/bin or ~/.local/bin, or both, are often present in desktop Linux installations; these are standard directories for executables belonging to your home directory. If you ls -l them you will notice they mainly contain symbolic links to files in other directories.
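Reading a long $PATH by eye gets tedious, so here is a small sketch that reports whether the two usual candidates are listed. The helper name in_path is my own invention for this example, not a standard command.

```shell
# Sketch: report whether a directory is listed in $PATH.
# ASSUMPTION: in_path is a hypothetical helper name, not a standard command.
in_path() {
  # Wrap both sides in colons so only whole entries match
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

for dir in "$HOME/bin" "$HOME/.local/bin"; do
  if in_path "$dir"; then
    echo "$dir is in PATH"
  else
    echo "$dir is not in PATH"
  fi
done
```

The comparison is purely textual, so an entry written with a trailing slash (such as /home/you/bin/) will not be recognised as the same directory.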

A symbolic link is exactly what we are going to create for our R script as well. In addition, let’s drop the .R extension when making that link with ln -s. So, assuming that ~/bin/ is in our $PATH:

ln -s -r getemo.R ~/bin/getemo

The -r option makes the link between the paths relative, since I don’t know in which directory the original script would be placed on your computer. You can omit it but then you should specify both paths in full.
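For completeness, here is a sketch of the variant with both paths in full, which avoids -r altogether (it is an option of GNU ln and may be missing on other systems). The scratch directories stand in for your project folder and for ~/bin, and the stand-in script uses /bin/sh instead of Rscript so the example runs without R; these are assumptions for illustration only.

```shell
# Sketch of step 3 with absolute paths instead of the -r flag.
# ASSUMPTIONS: scratch directories stand in for your project folder and
# for ~/bin, and the stand-in script uses /bin/sh so it runs without R.
srcdir=$(mktemp -d)
bindir=$(mktemp -d)

printf '%s\n' '#!/bin/sh' 'echo "you asked for: $1"' > "$srcdir/getemo.R"
chmod +x "$srcdir/getemo.R"

# Both paths are given in full, so the link works from any directory
ln -s "$srcdir/getemo.R" "$bindir/getemo"

# Show where the link points
ls -l "$bindir/getemo"

# Call the tool by name from an unrelated directory, with the scratch
# bin directory placed first in PATH inside this subshell only
result=$(
  cd /
  PATH="$bindir:$PATH"
  getemo unicorn
)
echo "$result"

# Remove the scratch directories again
rm -rf "$srcdir" "$bindir"
```

Because the PATH change happens inside the command substitution, your own shell session keeps its original $PATH.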

Now we can get the desired result by executing, from any directory:

getemo unicorn

The full script, including the shebang, can be found in this gist.

Do more

We didn’t make the CLI tool available to the entire system here, only to your user profile. Unless you have good reasons to the contrary, I would advise keeping it that way. Many Linux desktops have in practice only one user, and inside Docker containers you can use the root user. If you do want to expose a script system-wide, /usr/local/bin is usually the appropriate directory.
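If you do go system-wide, one possible approach is to copy the script rather than link it, since a link into your home directory may not be readable by other users. The sketch below assumes a variable I have called install_prefix, which points at a scratch directory here so that it runs without root; for a real install you would set it to /usr/local and run the copy with the necessary privileges.

```shell
# Sketch: copy the script into a bin directory under a chosen prefix.
# ASSUMPTIONS: install_prefix is a hypothetical variable name, set to a
# scratch directory so the sketch runs without root. For a real
# system-wide install it would be /usr/local.
install_prefix=$(mktemp -d)
srcdir=$(mktemp -d)

# Stand-in for the real getemo.R
printf '%s\n' '#!/usr/bin/Rscript --vanilla' 'cat("hello\n")' > "$srcdir/getemo.R"

mkdir -p "$install_prefix/bin"

# Copy rather than link, so the tool does not depend on a file
# inside one user's home directory
cp "$srcdir/getemo.R" "$install_prefix/bin/getemo"

# rwx for the owner, r-x for everyone else
chmod 755 "$install_prefix/bin/getemo"

ls -l "$install_prefix/bin/getemo"
```

The trade-off is that a copy does not follow later edits to the original script, so you have to repeat the copy after every change.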

To learn how to write better CLI tools in R, have a look at Mark Sellors’ blog on the topic. For parsing and documenting CLI arguments, I personally prefer {argparse} if the Python dependency is not an issue; and {docopt} when it is. For stylising the visual outputs, {cli} is great.

Cool, but why?

Well here’s a question people should ask more often! Some of the reasons I can see are:

  • Make your work available to non-R users in an interface they can understand
  • Make your work available to production frameworks that run tasks as commands, like Airflow
  • For your own convenience, expose R-specific functionality to the command line. emo::ji() was arguably not the world’s greatest example of a productivity improvement, but what about wrapping something like {skimr}, to beautifully preview CSVs?
  • R might not always be the objectively best tool for the job but it could well be your best tool. If R is indeed the language you are most fluent in, it will probably be most productive for you to script even non-data-related tasks in R.
  • If written properly, an R command line tool can be used together with other, non-R command line tools. For instance to provide data to it, or to further process the output. But that’s a topic for another time!

Cool.

I know!
