Lessons Learned From Running R in Production

[This article was first published on Matt Kaye, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Introduction

A couple weeks ago, I wrote a high-level post on REST APIs. One thing that I noted was that I couldn’t, in good faith, recommend running R (or Plumber, a common library used to create APIs in R) in any type of high-load production system.

Admittedly, this was a pretty inflammatory thing to say. I know there’s a whole community of R developers working on R in production, as well as lots of believers in R for production services. I know this, in part, because I’ve been one of them in the past. But more on that in a bit.

In that aside in my last post, I commented that the reasons why I won’t be running R in production anymore were out of scope. This post is intended to explain those reasons in much more technical detail, to the extent that I’m capable.

R Evangelism

First things first: I love R. I’ve been a bit of an R evangelist for the past five years or so, and I think that R provides fantastic tooling that helps me do my day-to-day work dramatically (think maybe 3-4x) faster than equivalent tooling in Python, SQL, etc. I think I could argue strongly that the Tidyverse suite of tools has had a larger impact on how I write analytical code and how I think about data wrangling problems – in addition to just how I program in general – than any other single technical thing I’ve ever come across. In particular, purrr introduced me to functional programming and using functional patterns in the analytical code I write, and I haven’t looked back since.

I say this because I don’t want the rest of this post to seem as if it’s coming from someone parroting the same Python lines about how “it’s a general purpose programming language” and how “R is made for statisticians so it’s not meant for production” or any of the other usual arguments against R. My view is that most of these arguments are just people being dogmatic, and that most of those common criticisms of R are being leveled by people who have never actually worked in R.

I’ve argued with my fair share of people on the internet about R in production, and am aware of the usual pro-R arguments. I know about the Put R In Prod talks, and have used Plumber and RestRserve. I’m familiar with vetiver and the suite of MLOps tools that the Tidymodels team has been working on building out. In the past, I’ve referenced things like Put R in Prod as evidence that you can, in fact, run R in production. But I always felt a bit queasy about it: How was it, I’d ask myself, that I could really only find one reference to a company genuinely running R in production, when virtually every place that does machine learning that I’m aware of has experience with Python, Rust, Scala, or similar? This post is a long-form answer to that question.

Production Services

Before I get into the guts of this post, I want to re-emphasize part of the tagline. When I say “production” in this post, I mean high-load production systems. I’m not talking about Shiny apps. I’m not talking about APIs getting one request every few seconds. I’m not talking about “offline” services where response times don’t particularly matter. I’ve had lots of success using R in all of those settings, and I think R is a great tool for solving problems in those spaces.

This post is about high-load, online systems. You might think of this, roughly, as a system that’s getting, say, more than one request per second on average, at least five requests per second at peak times, and there’s a requirement that the service responds in something like 500 milliseconds (p95) with a p50 of maybe 100. For the rest of this post, that is the kind of system I’m describing when I say “production.”
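As a rough illustration of that definition, the sketch below checks a batch of response times against the two latency targets. The function names and the nearest-rank percentile method are choices made for this example, not something from a particular monitoring tool, and the sample latencies are made up.

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceiling of p/100 * n
    return ordered[rank - 1]


def meets_slo(latencies_ms, p50_max=100, p95_max=500):
    """True when both the median and the p95 latency are within the targets."""
    return (
        percentile(latencies_ms, 50) <= p50_max
        and percentile(latencies_ms, 95) <= p95_max
    )


# Hypothetical sample: 90 fast responses and 10 slow ones.
latencies = [80] * 90 + [400] * 10
print(percentile(latencies, 50), percentile(latencies, 95), meets_slo(latencies))
```

With this sample the median is 80ms and the p95 is 400ms, so the service would count as meeting the targets described above.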

Problems

We’ve run into a number of problems with R in production. In broad strokes, the issues we’ve had have come from both Plumber, the API library we were using, and R itself. The next few sections cover some of the issues that caused the most headaches for us, and ultimately led us to switch over to FastAPI.

Gunicorn, Web Servers, and Concurrency

First and foremost: R is single-threaded. This is one of the most common criticisms I hear about R running in production settings, especially in the Python vs. R for production discussions I’ve been in. Of course, those discussions tend to ignore that standard Python (CPython), because of its global interpreter lock, also effectively runs one thread at a time for CPU-bound work, but I digress.

This post will be a bit more technical than some of my others. Since it’s already going to be long, I won’t spend much time explaining terms like “single-threaded.”

R running single-threaded and not managing concurrency particularly well isn’t a problem in and of itself. Other languages (Python, Ruby, etc.) that are very often used in production systems of all sizes have the same issue. The problem with R in particular is that unlike Python, which has Gunicorn, Uvicorn, and other web server implementations, and Ruby, which has Puma and similar, R has no widely-used web server to help it run concurrently. In practice, this means that if you, for instance, were to run a FastAPI service in production, you’d generally have a “leader” that delegates “work” (processing requests) to workers. Gunicorn or Uvicorn would handle this for you. This would mean that your service can handle as many concurrent requests as you have workers without being blocked.

As I mentioned, R has no equivalent web server implementation, which, in combination with running single-threaded, means that a Plumber service really can only handle one request at a time before getting blocked. In my view, this makes running high-load production services in R a non-starter, as concurrency and throughput are the ultimate source of lots of scalability problems in APIs. Yes, Plumber does indeed integrate with future and promises to allow for some async behavior, but my view is that it’s hard to make an argument that async Plumber is a viable substitute for a genuinely concurrent web server.
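To get a feel for why the number of workers matters, here is a minimal sketch that sends a burst of requests to a pool with one worker and then to a pool with four. It is only an analogy: threads and `time.sleep` stand in for Gunicorn or Uvicorn worker processes and real request handling, and the 50ms handler time is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(_):
    """Stand-in for a request handler that takes 50 ms."""
    time.sleep(0.05)
    return "ok"


def serve(n_requests, workers):
    """Send n_requests at once to a pool of `workers`; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(handle_request, range(n_requests)))
    assert all(r == "ok" for r in results)
    return time.perf_counter() - start


one_worker = serve(8, workers=1)    # requests are processed one after another
four_workers = serve(8, workers=4)  # up to four requests are in flight at a time
print(f"1 worker: {one_worker:.2f}s, 4 workers: {four_workers:.2f}s")
```

With a single worker, every request waits behind the ones ahead of it, which is roughly the situation a lone Plumber process is in.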

But let’s put aside the “non-starter” bit for a second, and let’s imagine that you, like me, want to try everything in your power to get R working in production. The following sections will cover other issues we’ve run into, and a number of workarounds we attempted, to varying degrees of success.

Types and Type Conversion

In my opinion, one of the biggest issues with R is the type system. R is dynamically typed, and primitive types are generally represented as length-one vectors. That’s why these two variables are of the same type:

class(1)
[1] "numeric"
class(c(1, 2))
[1] "numeric"

This is a big problem. What happens when we try to serialize the number 1 to JSON?

jsonlite::toJSON(1)
[1] 

It returns [1] – as in: A length-one list, where the one element is the number one. Of course, you can set auto_unbox = TRUE, but that has other issues:

jsonlite::toJSON(1, auto_unbox = TRUE)
1 

This is fine, but the problem with auto_unbox = TRUE is that if you have a return type that is genuinely a list, it could sometimes return a list, and sometimes return a single number, depending on the length of the thing being returned:

get_my_fake_endpoint <- function(x) {
  jsonlite::toJSON(x + 1, auto_unbox = TRUE)
}

get_my_fake_endpoint(1)
2 
get_my_fake_endpoint(c(1, 2))
[2,3] 

In these two examples, I’ve gotten two different response types depending on the length of the input: One was a number, the other was a list. This means that, without explicit handling of this edge case, your client has no guarantee of the type of the response it’s going to get from the server, which will inevitably be a source of errors on the client side.
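On the client side, the explicit handling mentioned above might look something like the following sketch. The helper name `as_list` is hypothetical; it simply wraps a bare scalar so downstream code always sees a list.

```python
import json


def as_list(payload):
    """Parse a JSON body and always return a list, wrapping a bare scalar."""
    value = json.loads(payload)
    return value if isinstance(value, list) else [value]


print(as_list("2"))      # body returned for a length-one input
print(as_list("[2,3]"))  # body returned for a length-two input
```

Every client of the API would need a guard like this, which is exactly the kind of burden a stable response type avoids.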

In every other programming language that I’m aware of being used in production environments, this is not the case. For instance:

import json
import sys

x = 1
y = [1, 2]

print(type(x))
<class 'int'>
print(type(y))
<class 'list'>
json.dump(x, sys.stdout)
1
json.dump(y, sys.stdout)
[1, 2]

In Python, the number 1 is an integer type. The list [1, 2] is a list type. And the JSON library reflects that. No need for unboxing.

But there’s more! R (and Plumber) also do not enforce types of parameters to your API, as opposed to FastAPI, for instance, which does via the use of pydantic. That means that if you have a Plumber route that takes an integer parameter n and someone calls your route with ?n=foobar, you won’t know about that until the rest of your code runs, at which point you might get an error about n being non-numeric.

Here’s an example:

library(plumber)

pr() %>%
  pr_get(
    "/types",
    function(n) {
      n * 2
    }
  ) %>%
  pr_run()

Obviously, n is intended to be a number. You can even define it as such in an annotation like this:

#* @param n:int 

But R won’t enforce that type declaration at runtime, which means you need to explicitly handle all of the possible cases where someone provides a value for n that is not of type int. For instance, if you call that service and provide n=foobar, you’d see the following in your logs (and the client would get back an unhelpful HTTP 500 error):

<simpleError in n * 2: non-numeric argument to binary operator>

If you do the equivalent in FastAPI, you’d have vastly different results:

from fastapi import FastAPI

app = FastAPI()

@app.get("/types")
async def types(n: int) -> int:
  return n * 2

Running that API and making the following call returns a very nice error:

curl "http://127.0.0.1:8000/types?n=foobar" | jq

{
  "detail": [
    {
      "loc": [
        "query",
        "n"
      ],
      "msg": "value is not a valid integer",
      "type": "type_error.integer"
    }
  ]
}

I didn’t need to do any type checking. All I did was supply a type annotation, just like I could in Plumber, and FastAPI, via pydantic, did all the lifting for me. I provided foobar, which is not a valid integer, and I get a helpful error back saying that the value I provided for n is not a valid integer. FastAPI also returns an HTTP 422 error (the error code is configurable), which tells the client that they did something wrong, as opposed to the 500 that Plumber returns, indicating that something went wrong on the server side.
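For comparison, here is a rough sketch of the kind of check you would otherwise write by hand for every parameter. It is not how pydantic or FastAPI implement validation; the function names and the shape of the error body are assumptions loosely modeled on the response shown above.

```python
def parse_int_param(name, raw):
    """Return (status_code, body) for a raw query-string value expected to be an int."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Client error: say which parameter was wrong instead of failing later with a 500.
        detail = [{"loc": ["query", name], "msg": "value is not a valid integer"}]
        return 422, {"detail": detail}
    return 200, {"value": value}


def types_endpoint(raw_n):
    """Hypothetical handler mirroring the /types route: doubles n when it is valid."""
    status, body = parse_int_param("n", raw_n)
    if status != 200:
        return status, body
    return 200, body["value"] * 2


print(types_endpoint("21"))
print(types_endpoint("foobar"))
```

Multiply this by every parameter on every route and the appeal of having the framework do it from a type annotation becomes clear.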

Clients and Testing

Another issue with Plumber is that it doesn’t integrate nicely with any testing framework, at least that I’m aware of. In FastAPI, and every other web framework that I’m familiar with, there’s a built-in notion of a test client, which lets you “call” your endpoints as if you were an external client. In Plumber, we’ve needed to hack similar behavior together using testthat by spinning up the API in a background process, and then running a test suite against the local instance of the API we spun up, and then spinning down. This has worked fine, but it’s clunky and much harder to maintain than a genuine, out-of-the-box way to do testing that really should ship with the web framework. I’ve heard of callthat, but I’ve never actually tried it for solving this problem.
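To make the idea of a test client concrete: it calls the application object in-process rather than going over the network, so no server needs to be started or stopped. The sketch below shows the concept with a hypothetical WSGI app and only the standard library. FastAPI's own test client works on ASGI apps and does far more; this is just the core idea.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical WSGI app: GET /types?n=<int> responds with n * 2 as JSON."""
    if environ["PATH_INFO"] != "/types":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"detail": "not found"}']
    n = int(parse_qs(environ["QUERY_STRING"])["n"][0])
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps(n * 2).encode()]


def get(app, path, query=""):
    """Tiny in-process test client: calls the app directly, no server or socket."""
    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path, "QUERY_STRING": query}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = int(status.split()[0])

    body = b"".join(app(environ, start_response))
    return seen["status"], json.loads(body)


assert get(app, "/types", "n=21") == (200, 42)
assert get(app, "/missing")[0] == 404
```

Because nothing listens on a port, tests like these run quickly and need no background process management.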

Performance

When I’ve defended R in the past, I’ve also heard a common complaint about its speed. There are very often arguments that R is slow, full-stop. And that’s not true, or at least mostly not true. Especially relative to Python, you can write basically equally performant code in R as you can in numpy or similar. But some things in R are slow. For instance, let’s serialize some JSON:

library(jsonlite)

iris <- read.csv("fastapi-example/iris.csv")

result <- microbenchmark::microbenchmark(
  tojson = {toJSON(iris)},
  unit = "ms", 
  times = 1000
)

paste("Mean runtime:", round(summary(result)$mean, 4), "milliseconds")
[1] "Mean runtime: 0.7482 milliseconds"

Now, let’s try the same in Python:

from timeit import timeit
import pandas as pd

iris = pd.read_csv("fastapi-example/iris.csv")

N = 1000

print(
  "Mean runtime:", 
  round(1000 * timeit('iris.to_json(orient = "records")', globals = locals(), number = N) / N, 4), 
  "milliseconds"
)
Mean runtime: 0.1166 milliseconds

In this particular case, Python’s JSON serialization runs 6-7x faster than R’s. And if you’re thinking “that’s less than one millisecond, though!” you’d be right. But the general principle is important even if the magnitude of the issue in this particular case is not.

JSON serialization is the kind of thing that you’re going to need to do if you’re building an API, and you generally want it to be as fast as possible to limit overhead. It also takes longer and longer as the JSON itself is more complicated. So while in this particular case we’re talking about microseconds of difference, the underlying issue is clear: Plumber uses jsonlite to serialize JSON under the hood, and jsonlite is nowhere near as fast as roughly equivalent Python JSON serialization for identical data and the same result.
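The claim that serialization cost grows with the payload is easy to check for yourself. The sketch below times the standard library's `json.dumps` on synthetic iris-like records of increasing size. The record values and row counts are made up, and the absolute numbers will depend on your machine, so no timings are quoted here.

```python
import json
from timeit import timeit


def make_records(n_rows):
    """Build n_rows of iris-like records (hypothetical values)."""
    return [
        {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4,
         "petal_width": 0.2, "species": "setosa"}
        for _ in range(n_rows)
    ]


def mean_ms(records, number=20):
    """Mean time in milliseconds to serialize `records` with json.dumps."""
    return 1000 * timeit(lambda: json.dumps(records), number=number) / number


for n_rows in (150, 1_500, 15_000):
    print(n_rows, "rows:", round(mean_ms(make_records(n_rows)), 4), "ms")
```

Whatever per-row overhead a serializer has is paid on every response, so a constant-factor gap between two libraries turns into real latency as payloads grow.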

The takeaway here is that while it may be true that vectorized R code to create a feature for a model or low-level BLAS or LAPACK code that R calls to perform matrix multiplication should be equally performant to the equivalent Python, R can sometimes have overhead, like in JSON serialization, that becomes apparent as the size and complexity of both the body of the request as well as the response body scale up. There are certainly other examples of the same overhead. When we moved a Plumber service to FastAPI with no changes to the logic itself, we got about a 5x speedup in how long it took to process requests. And just to reiterate: That 5x speedup had nothing to do with changes to logic, models, or anything tangible about the code. All of that was exactly the same.

Integration with Tooling

Another issue with R for production is that production services very often need to do some of the following (in no particular order, and not exhaustive):

  1. Serve predictions from one version of a model to some users and another version of a model to other users.
  2. Have reports and alerts of when errors happen in our API.
  3. Use a database - such as a feature store - for retrieving features to run through our models.

There are tools for doing all of these things. The first would be a feature flagging tool like LaunchDarkly. The second would be an error monitoring tool like Sentry. And the last would be a feature store like Feast which might use something like Redis under the hood.

Python supports all of these tools. All of them have APIs in Python, and are easily integrated into a FastAPI, Flask, or Django service. As far as I’m aware, R has no official bindings for LaunchDarkly, Sentry, or Feast, meaning that if you wanted to run R in a system where you use feature flags, for instance, you’d need to roll your own flagging logic or find a more obscure tool that supports R. That’s fine for some teams, but writing feature flagging logic isn’t generally a good use of a data scientist’s time. And especially not when there are a whole variety of great tools that you can grab off the shelf and slot into a Python service seamlessly.
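To give a sense of what rolling your own flagging logic involves, here is a minimal sketch of a deterministic percentage rollout. The flag name and the hashing scheme are assumptions for illustration; a real flagging tool adds targeting rules, audit logs, kill switches, and a UI on top of this.

```python
import hashlib


def assign_variant(user_id, rollout_pct, flag="model-v2"):
    """Deterministically bucket a user into 'treatment' or 'control'.

    The flag name is a hypothetical example; hashing flag + user keeps a
    user's assignment stable across requests and independent across flags.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in 0..99
    return "treatment" if bucket < rollout_pct else "control"


# Route roughly 30% of users to the new model version.
for user in ("user-1", "user-2", "user-3"):
    print(user, assign_variant(user, rollout_pct=30))
```

The core fits in a few lines, but the surrounding pieces (changing percentages without a deploy, tracking who saw what) are where the time goes.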

This issue expands beyond just production tooling, too. For instance, there are a number of MLOps tools for all parts of the machine learning process, such as Weights & Biases for experiment tracking and model versioning, Evidently for monitoring, Bento for model serving, and so on, that all only have Python bindings. That’s not to say that there are no tools that support R – some, like MLFlow certainly do – but the set of tools that support R is a strict, and small, subset of the ones that support Python. I’m also aware of the great work that the Tidymodels team is doing on Vetiver, Pins, and related packages in the Tidymodels MLOps suite, but the reality is that these tools are far behind the state of the art (but are catching up!).

Workarounds

Our team tried out a bunch of ideas to get around these issues before ultimately abandoning R in favor of FastAPI.

We “solved” the types issues R has by having lots of type validation at request time, and making use of JSON schema validation to the extent that we could to limit the number of edge cases we ran into. We use MLFlow for model tracking, and don’t really need some of the more “sophisticated” MLOps tools mentioned before. But the first issue – and the biggest – was the concurrency issue, which we ultimately failed to overcome.

Load Balancing on PaaS Tools

The first “fix” for R’s concurrency issue we tried was the most expensive one: Buying our way out of the problem. We horizontally scaled our R service up from a single instance to multiple instances of the service behind a load balancer, which bought us a significant amount of runway. It’s expensive, but this could be a reasonably good solution to R’s concurrency issues for most teams. However, there’s only so far that you can scale horizontally before needing to address the underlying issues. For instance, if you have a model that takes 250ms to run predictions through and return to the client, on average you can process four requests per second per instance of your API. But since R is running single-threaded, you can probably only handle about one or two requests per second before you start to get concerned about requests backing up. If one request takes one second, now you could have four more requests in the queue waiting to be processed, and so on.

Horizontal scaling fixes this issue to some extent, but the magnitude of the problem scales linearly as the amount of throughput to the service increases. So ultimately, it’s inevitable that you’ll need to either address the underlying problem of the performance of the service, or spend potentially exorbitant amounts of money to buy your way out of the problem.
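The arithmetic behind that linear scaling can be sketched as follows. The 50% utilization target is an assumption that mirrors the "one or two requests per second" comfort zone for a 250ms request; pick your own number for your own service.

```python
import math


def instances_needed(peak_rps, service_time_s, max_utilization=0.5):
    """Single-threaded instances needed to keep each one below max_utilization.

    Each instance can do at most 1 / service_time_s requests per second, and
    we only want to use a fraction (max_utilization) of that.
    """
    usable_rps_per_instance = max_utilization / service_time_s
    return math.ceil(peak_rps / usable_rps_per_instance)


print(instances_needed(5, 0.25))   # peak of 5 RPS at 250 ms per request
print(instances_needed(50, 0.25))  # ten times the traffic
```

Under these assumptions, 5 RPS needs 3 instances and 50 RPS needs 25, so the bill grows in step with traffic unless the per-request time or per-instance concurrency improves.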

NGINX As A Substitute

We also tried to get around R’s lack of an ASGI server like Uvicorn by sitting our Plumber API behind an NGINX load balancer. The technical nuts and bolts were a little involved, so I’ll just summarize the highlights here. We used a very simple NGINX conf template:

## nginx.conf

events {}

http {
  upstream api {

      ## IMPORTANT: Keep ports + workers here
      ## in sync with ports + workers declared
      ## in scripts/run.sh

      server localhost:8000;
      server localhost:8001;
      server localhost:8002;
      server localhost:8003;
  }

  server {
    listen $PORT;
    server_name localhost;
    location / {
      proxy_pass http://api;
    }
  }
}

Then, we’d boot the API as follows:

#!/bin/bash

## run.sh

## IMPORTANT: Keep ports + workers here
## in sync with ports + workers declared
## in nginx.conf

Rscript -e "plumber::options_plumber(port = 8000); source('app.R')"&
Rscript -e "plumber::options_plumber(port = 8001); source('app.R')"&
Rscript -e "plumber::options_plumber(port = 8002); source('app.R')"&
Rscript -e "plumber::options_plumber(port = 8003); source('app.R')"&

sed -i -e 's/$PORT/'"$PORT"'/g' /etc/nginx/nginx.conf && nginx -g 'daemon off;'

The basic premise was that we’d boot four instances of our Plumber API as background processes, and then let NGINX load balance between them.
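Both files carry a reminder to keep ports and workers in sync by hand. One way to avoid that drift, sketched here as an assumption rather than something we actually ran, is to generate both the upstream block and the launch commands from a single list of ports. The helper names are hypothetical.

```python
def render_upstream(ports, name="api"):
    """Render the nginx upstream block for one Plumber process per port."""
    servers = "\n".join(f"      server localhost:{port};" for port in ports)
    return f"  upstream {name} {{\n{servers}\n  }}"


def render_launch_commands(ports, app_file="app.R"):
    """Render the matching Rscript commands, one background process per port."""
    return [
        f"Rscript -e \"plumber::options_plumber(port = {port}); source('{app_file}')\" &"
        for port in ports
    ]


ports = range(8000, 8004)  # single source of truth for workers and ports
print(render_upstream(ports))
print("\n".join(render_launch_commands(ports)))
```

Changing the number of workers then means editing one line instead of two files.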

This worked well in theory (and also in practice, to an extent) until we ran into an odd problem: At certain levels of load, the “workers” started to die and not reboot, which resulted in cascading failures. Essentially, one worker would go down, resulting in more load on the other workers, until a second worker went down, causing the service to spiral and eventually crash. You can see this happening in the load test below.

At relatively low levels of traffic (~25 RPS) the service starts to have issues. Those issues snowball, until eventually every request begins failing. Note that 25 requests per second may sound like a lot, but that load is distributed across four workers, meaning each worker is attempting to handle only about six requests per second.
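The snowballing can be illustrated with a toy model. It assumes load is split evenly across workers, that a worker pushed past its capacity dies and is not restarted, and that its share is then spread over the survivors. The capacity figures are hypothetical, since we never pinned down the real threshold.

```python
def surviving_workers(total_rps, workers, capacity_rps):
    """Count the workers left once no remaining worker is over capacity.

    Assumes load is split evenly, an overloaded worker dies and is not
    restarted, and its share is redistributed to the survivors.
    """
    while workers > 0 and total_rps / workers > capacity_rps:
        workers -= 1  # one worker dies; the rest now get a larger share
    return workers


print(surviving_workers(25, workers=4, capacity_rps=7))  # each worker gets 6.25 RPS
print(surviving_workers(25, workers=4, capacity_rps=6))  # 6.25 RPS is already too much
```

In this model the first case keeps all four workers and the second ends with none: once one worker tips over, every remaining worker is further over its limit than the last, so nothing stops the cascade.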

For some services, burning down under this amount of load is fine. But it was disconcerting for us, especially since the amount of actual work that the endpoint we were hitting in our load test was doing was just extracting a pre-computed item from a list. The vast majority of the overhead in these API calls was JSON serialization, type checking, and network latency, but the R service itself was only taking about 10ms to process the request (read: extract the necessary element) once the endpoint got the request.

The problem for us was that if the service is burning down in this case, where we’re doing virtually no lifting at all, after getting 25 or so requests per second, what happens when the processing time for a request jumps up from 10ms to 250ms for creating features, making predictions, and so on? In that world, I’d expect that even behind NGINX, our service could probably only safely process about 5 requests per second before things got dicey and we needed to start thinking again about horizontally scaling to more instances. That wasn’t nearly enough headroom to be comfortable with in a production system.

Wrapping Up

I want to wrap up by tying everything here back to what I discussed at the start. I’m very much not an R hater: Quite the opposite. I think R is an amazing language that’s made it so much easier and more enjoyable to do my job day-to-day, and I hoped against hope that we could figure out a way to run R in production. I thought most of the normal complaints I’d heard about R “just not being production-grade” weren’t rigorous enough, and that made me want to give it a shot to prove the haters wrong, in some sense.

It turned out, unfortunately, that I was the one that was wrong. As we scaled up our R in production rig, it became increasingly apparent that there were some very real problems with R that we really couldn’t get around without either putting lots of work and duct tape into fixing them, or throwing money at them. And Python – FastAPI in particular – was so much simpler, more reliable, and gave me dramatically more confidence than I had in Plumber.

Once upon a time, I’d hoped that this post would’ve been something of a love letter to R: A story of triumph, where we figured out how to tweak and tune and architect just right, and got a stable, fast R service running in production to prove all of the doubters and dogmatic Pythoners wrong. But unfortunately, it didn’t turn out that way. So my hope in writing this post was to expose some more of the nuts and bolts for why I won’t be trying to run R in production again, or at least not in the short term. I don’t believe in sweeping statements like “R isn’t built for production” since I don’t actually know what they mean, and they’re not helpful for getting to the root of the problem with R. But as I discovered, “R doesn’t have an equivalent of Gunicorn or Puma” is a very legitimate flaw with R that makes it very difficult – not impossible, but very difficult – to run R in production.

But there’s a larger point here, that maybe is an undertone of this whole post, and to some degree all of the conversation about R vs. Python for production. The reality is that lots of software teams are running Python in production, which means that Python and its community are focused on building tooling in Python to service that use case. To my knowledge, there aren’t many teams running R in production, and so the focus and will doesn’t seem to be there in the same way it is for Python. And maybe that’s what matters most at the end of the day: Python is a safer choice since more people are using it, which means it continues to be worked on more, which makes it faster, and safer, and so on. And the cycle continues.

I hope that R can have that same focus on production, some day.
