
Retries in API packages and reinventing the wheel

[This article was first published on Posts on R-hub blog, and kindly contributed to R-bloggers.]

Web APIs can sometimes fail for no particular reason; therefore, packages accessing them often add some robustness to their code by retrying the API call a few times if there was an error. The two high-level R HTTP clients, httr and crul, offer ready-made sub-routines for such cases, but some developers, like me, have rolled their own out of ignorance. In this post I shall present the retry sub-routines of httr and crul, and more generally reflect on (not) reinventing the wheel in your R package.

The few figures of this post come from the funny HTTP Cats website and are hyperlinked.


Retry in httr and crul

Relying on internet resources might make a package fragile, since the connection or interfaced web API can fail. Therefore, in packages wrapping APIs, one can find some variation of the following pseudo-code that retries a few times:

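maxtry <- 5
try <- 1
resp <- do_an_internet_thing()
# retry only after a failure, and never make more than maxtry requests
while (try < maxtry && resp$status >= 400) {
  # wait before retrying, with a delay that grows with each failed try
  Sys.sleep(some_waiting_time_increasing_with_try(try))
  resp <- do_an_internet_thing()
  try <- try + 1
}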

For example, a search on R-hub’s CRAN source code mirror surfaces such a function in a package.

As underlined in httr’s excellent “Best practices for API packages” vignette, “it’s extremely important to make sure to do this with some form of exponential backoff: if something’s wrong on the server-side, hammering the server with retries may make things worse, and may lead to you exhausting quota (or hitting other sorts of rate limits).”
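To make the backoff idea concrete, here is a minimal base R sketch of such a retry loop, with a waiting time that doubles at each try, is capped, and can be randomized (jitter). The names backoff_wait() and retry_request(), as well as the default values, are made up for illustration; this is not the code of httr or crul. The request and sleep functions are passed as arguments so that the sketch runs offline.

```r
# Waiting time (in seconds) before retry number `attempt` (1, 2, 3, ...).
# The upper bound doubles with each attempt and is capped at `cap`.
# With jitter = TRUE the wait is drawn uniformly between 0 and that bound,
# so that many clients do not all retry at the same moment.
backoff_wait <- function(attempt, base = 1, cap = 60, jitter = TRUE) {
  upper <- min(cap, base * 2^attempt)
  if (jitter) stats::runif(1, min = 0, max = upper) else upper
}

# Generic retry loop (hypothetical helper).
# `request` is any function returning a list with a `status` element.
# `sleep` is injectable so the loop can be tried without really waiting.
# Extra arguments in `...` are passed on to backoff_wait().
retry_request <- function(request, max_tries = 5, sleep = Sys.sleep, ...) {
  resp <- request()
  tries <- 1
  while (tries < max_tries && resp$status >= 400) {
    sleep(backoff_wait(tries, ...))
    resp <- request()
    tries <- tries + 1
  }
  resp
}

# Offline usage: a stub "API" that fails twice, then succeeds
n_calls <- 0
flaky <- function() {
  n_calls <<- n_calls + 1
  list(status = if (n_calls < 3) 500 else 200)
}
resp <- retry_request(flaky, sleep = function(seconds) invisible(NULL))
resp$status
```

With this stub, the loop stops at the third call, the first one that returns a status below 400.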

Now, if you need such a pattern in your API package, you could use a shortcut rather than patiently ingesting examples and best practice… by using ready-made features of either httr or crul.

Retry in httr

The httr package contains a handy RETRY() function that, well, safely retries a request until it succeeds or until the maximal number of tries is reached. It uses best practice written up by AWS to define the increasing waiting time.

If there’s no error, it simply behaves like the corresponding verb would.

httr::RETRY("GET", "http://httpbin.org/status/200")

## Response [http://httpbin.org/status/200]
##   Date: 2020-03-28 12:29
##   Status: 200
##   Content-Type: text/html; charset=utf-8
## <EMPTY BODY>

httr::GET("http://httpbin.org/status/200")

## Response [http://httpbin.org/status/200]
##   Date: 2020-03-28 12:29
##   Status: 200
##   Content-Type: text/html; charset=utf-8
## <EMPTY BODY>

Now, what happens if the API keeps failing, which the example URL below ensures?

httr::RETRY(
  "GET", 
  "http://httpbin.org/status/500",
  times = 5, # the function has other params to tweak its behavior
  pause_min = 5,
  pause_base = 2)

## Request failed [500]. Retrying in 5 seconds...
## Request failed [500]. Retrying in 5 seconds...
## Request failed [500]. Retrying in 5 seconds...

## Request failed [500]. Retrying in 29.2 seconds...

## Response [http://httpbin.org/status/500]
##   Date: 2020-03-28 12:37
##   Status: 500
##   Content-Type: text/html; charset=utf-8
## <EMPTY BODY>

The function also makes use of the Retry-After HTTP header so, in short, if the API says “hey please wait 33 seconds” that’s what the waiting time will be.[1]
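The idea of honoring Retry-After can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions, not httr's implementation: wait_before_retry() is a hypothetical helper, the headers are assumed to be available as a named list, and only the "number of seconds" form of the header is handled (the header may also carry an HTTP date, which this sketch ignores).

```r
# Choose how long to wait before the next try (hypothetical helper).
# `headers`: named list of response headers; names are matched
#   case-insensitively, since HTTP header names are case-insensitive.
# `fallback`: the wait (in seconds) computed by one's own backoff rule.
wait_before_retry <- function(headers, fallback) {
  names(headers) <- tolower(names(headers))
  retry_after <- headers[["retry-after"]]
  if (is.null(retry_after)) {
    return(fallback)
  }
  seconds <- suppressWarnings(as.numeric(retry_after))
  # Anything that is not a non-negative number of seconds (e.g. an HTTP date)
  # falls back to the backoff rule in this simplified sketch.
  if (is.na(seconds) || seconds < 0) {
    return(fallback)
  }
  seconds
}

# The API says "please wait 33 seconds"
wait_before_retry(list("Retry-After" = "33"), fallback = 5)
# No header: use the fallback
wait_before_retry(list(), fallback = 5)
```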

To learn more about httr::RETRY(), head over to its docs and source code.

A wild-caught example of a CRAN package using httr::RETRY() is the antanym package, whose RETRY() use can be traced back to a peer-review of the package by Lorenzo Busetto for rOpenSci.

Retry in crul

What is crul? crul is an HTTP client for R organized around R6 classes.

The retry method of crul’s HttpClient class was modeled after httr’s RETRY(). I replaced my homegrown retrying code with it in a pull request.
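For comparison with the httr calls shown earlier, a call to crul's retry method could look like the sketch below. It assumes the crul package is installed and httpbin.org is reachable; the wrapper get_with_retry() and the parameter values are only illustrative, and the exact meaning of each argument is best checked in crul's docs.

```r
# Hypothetical wrapper around crul's retry method.
# Nothing is requested until the function is called.
get_with_retry <- function(path, times = 3) {
  client <- crul::HttpClient$new(url = "https://httpbin.org")
  client$retry(
    "GET",
    path = path,
    times = times,   # retry limit, see crul's docs for the exact semantics
    pause_base = 1,  # pause parameters named as in httr::RETRY()
    pause_cap = 60,
    pause_min = 1
  )
}

# Needs an internet connection, hence commented out:
# res <- get_with_retry("status/500")
# res$status_code
```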

crul’s retrying has two interesting differences with httr’s retrying:

To learn more about crul’s retry method, head over to its docs and source code.

On not reinventing the wheel

Once I heard about httr::RETRY() and the crul retry method, I was a bit disappointed at having reinvented the wheel. Could one avoid doing that too often?


How to not reinvent the wheel in your code

As an R package developer, how do you know about functions and methods already existing in packages your package depends on, or could depend on, or could draw inspiration from? Sometimes you might guess your problem is something others encountered but you might not even know the right words to present it (mocking for instance!).

In a blog post, Jeff Atwood states “If anything, ‘Don’t Reinvent The Wheel’ should be used as a call to arms for deeply educating yourself about all the existing solutions”. General strategies for learning more and more about the R ecosystem include

Of course, “deeply educating yourself” takes time that one doesn’t necessarily have, and no one should feel guilty about that. Sometimes you’ll re-implement something that already exists elsewhere, and it’s fine!

Lastly, you might even want to create your own (better) version, which is obviously neat.

How to help users of your package not reinvent the wheel

As the developer of a package, you might help users find useful features by… working on its docs. A good time investment could be to create a pkgdown website with a well-organized reference index.

Furthermore, some features could be added to your package if they’re often implemented downstream.

Conclusion

In this post we’ve presented useful functions implementing retries for API packages in httr and crul.


We’ve also discussed ways to not miss such useful shortcuts for one’s code, mostly by learning more about existing R packages, whilst acknowledging such exploration takes time. What’s your favorite lesser-known package gem or R “joygret” moment[2]?


  1. If your only worry is rate limiting and there are no requests happening at the same time, you might find the ratelimitr package handy to avoid getting 429 status codes.
  2. joygret was defined by Hilary Parker in a blog post about writing R packages as “that familiar feeling of the joy of optimization combined with the regret of past inefficiencies”.
