Site icon R-bloggers

How to deal with OAuth2.0 in R packages?

[This article was first published on Posts on R-hub blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

If you see the web API you want to wrap uses OAuth2.0 rather than a simple API key for authentication, are you happy or do you get nervous? If you get nervous, this post is for you! We shall demystify OAuth2.0 a bit so that you might know how to smoothly and securely use OAuth2.0 in your package, and we shall provide links to useful resources.

What is OAuth 2.0?

OAuth 2.0 is the most recent standard of OAuth, coming after OAuth 1.0. But you were probably not looking for this kind of answer. ????

OAuth 2.0 is a way for a resource server (say, Twitter API) to grant access to (part of) your account to an app (say, a Twitter alternative app, or rtweet) on your behalf. It is safer than previous industry standards, see the video below.

YouTube Video

In practice, with OAuth 2.0 there is a dance i.e. a series of requests and redirects between the server and the third-party app, launched by you when starting to install an app, and validating by you as one of the steps is your clicking something like “Yes, I grant access to this and that right on my Twitter account, to the app Twitter Lite”. Aaron Parecki (featured in the video above) has a great write-up on simplified OAuth 2.0.

To use things on the server one needs an access token (a string, e.g. e46b7faf296e3f0624e6240a6efafe3dfb17b92ae0089c7e51952934b60749f2). The whole point of the OAuth 2.0 dance is to get such an access token. Often you will also get a refresh token (similarly a string, e.g. 3bdeb3bd19a3093674f4e7ba6e1be1706ab1f16af9ebf3b79a67a807c622b650). The refresh token allows to get issued a new access token without any interactivity i.e. no need for the user to click. A common validity period for an access token would be 24 hours, for a refresh token one year, and often the refresh token is re-usable.

Now while this all makes perfect sense when you are developing an actual third-party app, for interactive usage… when all you want is an R package that can be used in scripts running on a server, it’s much less clear.

How do you use OAuth 2.0 in R?

When writing an R package wrapping an API using OAuth 2.0 you’ll need the user to grant access to an “app”, which will allow to create an access token and a refresh token. The access token will then often be passed to the API in a header when making requests, whilst the refresh token would be posted in a query string when the access token needs to be renewed.

Your problem is: how do I imitate a third-party app? Thankfully for you, in most cases the complexity can be handled by the httr package. For other cases, or if you want to e.g. only use curl, you will have to get creative. ????

How does the OAuth 2.0 dance and usage happens with httr?

OAuth endpoint

How do you know what URLs to feed httr::oauth_endpoint()? By reading the API docs and looking at examples. ???? E.g. compare ORCID API docs and rorcid source code.

OAuth app

In an R package, if possible (depending on the API rules), it is nice to have a built-in OAuth app for quick experiments by the users; and the way for users to specify their own app’s information (Bring Your Own App).

In the gargle docs there is a nice discussion of the security risk of a built-in OAuth app, by Jenny Bryan:

"Package maintainers might want to build this app in as a fallback, possibly taking some measures to obfuscate the client ID and secret and limit its use to your package.

  • Note that three-legged OAuth always requires the involvement of a user, so the word “secret” here can be somewhat confusing. It is not a secret in the same sense as a password or token. But you probably still want to store it in an opaque way, so that someone else cannot easily “borrow” it and present an OAuth consent screen that impersonates your package."

OAuth token creation and caching

Interestingly there is still no app involved but httr starts a web server using httpuv. This way when the server sends the access token to http://localhost:1410/, httr is listening and gets it.

Interactively, if no parameter / option is set for the cache argument of httr::oauth2.0_token(), the user will be asked by httr whether they want to cache the token to a file. As a package developer, ideally you’d surface options in your function setting up authentication with these concerns in mind:

In rtweet the token is by default saved to the home directory and the path to it is saved as an environment variable. More easily, you could save the token to an app directory.

OAuth usage

What are your OAuth 2.0 secret credentials?

Your secrets are

Secret OAuth 2.0 httr token file

Often as you are not storing HTTP requests you will only need to worry about the file.

If you need to use your OAuth 2.0 R package / script on a server, you will need a cached httr token file. If your server content (GitHub repo?) is public, you will have to encrypt the file and to store a text-version of the encryption key/passwords as e.g. GitHub repo secret if you use GitHub Actions. Read the documentation of the continuous integration / cloud service you are using to find out how secrets are protected and how you can use them in your builds.

See gargle vignette about securely managing tokens. gargle is directly useful if you are wrapping a Google API, but reading this vignette is useful no matter what API you are wrapping, for the clear explanations of challenges and solutions.

The approach is:

< !-- -->
GIVEN_SERVICE_OAUTH_PWD="e46b7faf296e3f0624e6240a6efafe3dfb17b92ae0089c7e51952934b60749f2"

Example of a script creating and encrypting an OAuth token (for the Meetup API).

token_path <- testthat::test_path(".meetup_token.rds") # or just token_path <- ".meetup_token.rds"
use_build_ignore(token_path)
use_git_ignore(token_path)

# the function below calls httr::oauth2.0_token()
meetupr::meetup_auth(
  token = NULL,
  cache = TRUE,
  set_renv = FALSE,
  token_path = token_path
)

# sodium_key <- sodium::keygen()
# create the environment variable "GIVEN_SERVICE_OAUTH_PWD" and set it to sodium::bin2hex(sodium_key))

# get key from environment variable
key <- cyphr::key_sodium(sodium::hex2bin(Sys.getenv("GIVEN_SERVICE_OAUTH_PWD")))
cyphr::encrypt_file(
  token_path,
  key = key,
  dest = testthat::test_path("secret.rds")
)
key <- cyphr::key_sodium(sodium::hex2bin(Sys.getenv("GIVEN_SERVICE_OAUTH_PWD")))
temptoken <- tempfile(fileext = ".rds")
cyphr::decrypt_file(
  testthat::test_path("secret.rds"),
  key = key,
  dest = temptoken
)

All of this only works if your code automatically renews the access token, and if the refresh token is re-usable and valid for a long enough time. After one year or so, depending on the API, you will have to re-create your httr token file because the refresh token will have expired.

Secret access token, secret refresh token

Now if you are storing your HTTP interactions to disk e.g. if using the vcr package for HTTP testing, you will need to know how the secrets are used in HTTP requests. Often,

Make sure you never publish a file with unedited tokens! E.g. for vcr read the security docs.

How do you test OAuth 2.0 with R?

If you want to test how your package handles different OAuth 2.0 scenarios, you might be interested in the OAuth 2.0 apps provided by the webfakes package.

Conclusion

In this post we offered a round-up around OAuth 2.0. Useful tools we mentioned are httr’s OAuth2.0 functions, and the cyphr package for human-friendly encryption workflows. We hope it helped clarify this tool that’s not very natural for R developers. Please share any further tips or examples you like in the comments below!

Many thanks to Frie Preu for providing useful feedback on this post!

To leave a comment for the author, please follow the link and comment on their blog: Posts on R-hub blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.