How to store and use webservice keys and authentication details with R

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

by Andrie de Vries (@RevoAndrie)

I am frequently asked how to safely store login details and passwords for use by R without exposing them in your script. Yesterday Jennifer Bryan asked this question on Twitter, and a small storm of views and tweets erupted.

Do we have any sort of consensus whether user’s API keys or app id/secrets should be handled via .Rprofile or .Renviron? #rstats

— Jenny Bryan (@JennyBryan) November 23, 2015

A few minutes later she tweeted that there clearly is no consensus:

Answer: NO, apparently we have no consensus. #rstats pkgs that wrap APIs are unique snowflakes ❄️. https://t.co/IjV6FarApq

— Jenny Bryan (@JennyBryan) November 24, 2015

Different options

Reading the Twitter conversation, it seems to me there are several approaches. You can store your keys:

  1. Directly inside your script.
  2. In a file in your project folder, that you don’t share.
  3. In a .Rprofile file
  4. In a .Renviron file
  5. In a json or yaml file
  6. In a secure key store that you access from R

Let’s look at the key idea and benefits (or disadvantages) of each approach:

1. Directly inside your script

The first approach is to simply store your keys directly in your script.

id <- "my login name"
pw <- "my password"
 
call_service(id, pw, ...)

Although simple, nobody seriously proposes this, for the obvious downside that it becomes impossible to share your code without also sharing your keys.

2. In a file in your project folder, that you don’t share.

The second option is almost as easy. You put your keys into an R script file in the same project folder, e.g. “keys.R”, and then read the keys using, for example, source().

The idea is that you then exclude the “keys.R” file from any source control system.  With git, for example, you can add “keys.R” to your .gitignore settings.

The downside is that it is easy to share this file by mistake if you’re not careful.

# keys.R
id <- "my login name"
pw <- "my password"
 
# script.R
source("keys.R")
call_service(id, pw, ...)
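
Because accidentally committing the file is the main risk here, one option is to check the ignore list before reading the keys. The sketch below is only an illustration, not an established pattern: source_keys() is a hypothetical helper, and it assumes the key file appears in .gitignore as an exact line (it does not understand git's wildcard patterns). It also sources the file into its own environment rather than the global one.

```r
# Hypothetical helper (not from any package): source a key file only if it
# is listed verbatim in .gitignore, and keep the keys out of the global env.
source_keys <- function(key_file = "keys.R", ignore_file = ".gitignore") {
  ignored <- if (file.exists(ignore_file)) {
    trimws(readLines(ignore_file, warn = FALSE))
  } else {
    character(0)
  }
  # Assumption: exact-match lines only; git wildcard patterns are not handled.
  if (!(basename(key_file) %in% ignored)) {
    stop("'", basename(key_file), "' is not listed in ", ignore_file)
  }
  keys <- new.env()
  source(key_file, local = keys)  # objects defined in key_file land in `keys`
  keys
}

# Demonstration in a temporary folder, so no real project files are touched
demo_dir <- tempfile("keys_demo_")
dir.create(demo_dir)
writeLines(c('id <- "my login name"', 'pw <- "my password"'),
           file.path(demo_dir, "keys.R"))
writeLines("keys.R", file.path(demo_dir, ".gitignore"))

keys <- source_keys(file.path(demo_dir, "keys.R"),
                    file.path(demo_dir, ".gitignore"))
keys$id
```

In a real project the call would simply be keys <- source_keys(), followed by call_service(keys$id, keys$pw, ...).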

3. In a .Rprofile file

The third option is to store the keys in one of your .Rprofile files (I wrote about this in a previous blog post, “Best practices for handling packages in R projects”).

This option was very popular in the Twitter discussion, because:

  • You can store the keys in your home folder, i.e. outside the project folder. This makes it less likely that you accidentally share your keys.
  • You can write standard R code in the .Rprofile

# ~/.Rprofile
id <- "my login name"
pw <- "my password"
 
# script.R
 
# id and pw are defined in the script by virtue of .Rprofile
call_service(id, pw, ...)

One downside of defining the objects “id” and “pw” directly inside your .Rprofile is that these objects then live in your global environment, where your script can easily change or remove them. For example, clearing your global environment with rm(list = ls()) will make these objects disappear.

A slightly more robust variation on the theme is to still use .Rprofile, but to declare your keys as environment variables.  You can use Sys.setenv() to set environment variables, and Sys.getenv() to read these:

# ~/.Rprofile
Sys.setenv(id = "my login name")
Sys.setenv(pw = "my password")
 
# script.R
 
# id and pw are set as environment variables by .Rprofile
call_service(id = Sys.getenv("id"), pw = Sys.getenv("pw"), ...)
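
One practical wrinkle with this variation: by default Sys.getenv() returns an empty string, not an error, when a variable is not set, so a missing key may only surface later as a failed login. A minimal sketch of a guard follows; get_key() is a hypothetical helper name and not part of any package.

```r
# Hypothetical helper: read an environment variable, failing loudly if it
# is missing or empty instead of silently passing "" to the web service.
get_key <- function(name) {
  value <- Sys.getenv(name, unset = NA)  # NA when the variable is not set
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set; check your .Rprofile")
  }
  value
}

Sys.setenv(id = "my login name")  # normally done in ~/.Rprofile
get_key("id")
```

The service call then becomes call_service(id = get_key("id"), pw = get_key("pw"), ...).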

4. In a .Renviron file

Actually, R also has a mechanism to define environment variables in an external file called .Renviron. The loading of .Renviron is analogous to .Rprofile. The big difference is that in .Renviron you can define the variables directly, without using Sys.setenv().

As Hadley Wickham points out, environment variables are language agnostic:

@JennyBryan I like env vars because it works across programming languages

— Hadley Wickham (@hadleywickham) November 23, 2015

You can find very detailed instructions and recommendations in one of the vignettes of the httr package. View the vignette Best practices for writing an API package and navigate to “Appendix: API key best practices”. 

# ~/.Renviron
id = "my login name"
pw = "my password"
 
# script.R
 
# id and pw are set as environment variables by .Renviron
call_service(id = Sys.getenv("id"), pw = Sys.getenv("pw"), ...)
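
R reads .Renviron at startup, so edits to the file normally take effect only in a new session. Base R's readRenviron() re-reads a file on demand, which is also a convenient way to try the format out. The sketch below writes a throwaway file to a temporary location; the variable names DEMO_SERVICE_ID and DEMO_SERVICE_PW are made up for illustration.

```r
# Write a throwaway environment file in name=value form (hypothetical names)
env_file <- tempfile(fileext = ".Renviron")
writeLines(c("DEMO_SERVICE_ID=my-login-name",
             "DEMO_SERVICE_PW=my-password"),
           env_file)

# Load the file into the current session; for your real file you would
# call readRenviron("~/.Renviron") after editing it
readRenviron(env_file)

Sys.getenv("DEMO_SERVICE_ID")
Sys.getenv("DEMO_SERVICE_PW")
```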

5. In a json or yaml file

JSON is increasingly the format of choice for communicating with webservices. As a result, most modern languages can easily parse JSON files. The same goes for YAML files.

So, if you want to store your keys in a file format that can easily be consumed by other languages, e.g. Python, then JSON might be a good idea.

# keys.json
{
  "id":["my login name"],
  "pw":["my password"]
} 
 
# script.R
library(jsonlite)
keys <- fromJSON("keys.json")  # parse the key file once
call_service(id = keys$id, pw = keys$pw, ...)
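
A key file that is edited by hand can easily end up with a missing or misspelled field. As a minimal sketch, the hypothetical helper read_keys() below parses the file once with jsonlite and stops if an expected field is absent; the field names id and pw follow the keys.json example above.

```r
library(jsonlite)

# Hypothetical helper: parse a JSON key file once and check that the
# expected fields are present before anything is sent to the web service.
read_keys <- function(path, required = c("id", "pw")) {
  keys <- fromJSON(path)
  missing_fields <- setdiff(required, names(keys))
  if (length(missing_fields) > 0) {
    stop("Missing field(s) in ", path, ": ",
         paste(missing_fields, collapse = ", "))
  }
  keys
}

# Demonstration with a temporary file rather than a real keys.json
json_file <- tempfile(fileext = ".json")
writeLines('{"id": ["my login name"], "pw": ["my password"]}', json_file)

keys <- read_keys(json_file)
keys$id
```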

6. In a secure key store that you access from R

One big downside of all the previous approaches is that, in every case, you are storing your keys in an unencrypted format somewhere on your file system.

You probably already use a password storage tool, e.g. Keychain or LastPass.

Unfortunately, I am not aware of R interfaces to any of these key stores.  If you know of a good solution, please leave a comment!
