Are you leaking h2o? Call plumber!

Longhow Lam

3 years ago

[This article was first published on R – Longhow Lam's Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Create a predictive model with the h2o package.

H2o is a fantastic open source machine learning platform with many different algorithms. There is Graphical user interface, a Python interface and an R interface. Suppose you want to create a predictive model, and you are lazy then just run automl.

Lets say, we have both train and test data sets, and the first column is the target and the columns 2 until 50 are input features. Then we can use the following code in R

out = h2o.automl(
   x = 2:50, 
   y = 1,
   training_frame = TrainData, 
   validation_frame = TestData, 
   max_runtime_secs = 1800
)

According the help documentation: The current version of automl trains and cross-validates a Random Forest, an Extremely-Randomized Forest, a random grid of Gradient Boosting Machines (GBMs), a random grid of Deep Neural Nets, and then trains a Stacked Ensemble using all of the models.

After a time period that you can set, automl will terminate and has literally tried hundreds of models. You can get the top models, ranked by a certain performance metric. If you go for the champion just use:

championModel = out@leaderchampionModel

Now the championModel object can be used to score / predict new data with the simple call:

predict(championModel, newdata)

The champion model now lives on my laptop, it’s hard if not impossible to use this model in production. Someone or some process that wants a model score should not depend on you and your laptop!

Instead, the model should be available on a server where model prediction requests can be handled 24/7. This is where plumber comes in handy. First save the champion model to disk so that we can use it later.

h2o.saveModel(championModel, path = "myChampionmodel")

The plumber package, bring your model in production.

In a very simple and concise way, the plumber package allows users to expose existing R code as an API service available to others on the web or intranet. Now suppose you have a saved h2o model with three input features, how can we create an API from it? You decorate your R scoring code with special comments, as shown in the example code below.

# This is a Plumber API. In RStudio 1.2 or newer you can run the API by
# clicking the 'Run API' button above.

library(plumber)library(h2o)
h2o.init()
mymodel = h2o.loadModel("mySavedModel")

#* @apiTitle My model API engine

#### expose my model #################################
#* Return the prediction given three input features
#* @param input1 description for inp1
#* @param input2 description for inp2
#* @param input3 description for inp3
#* @post /mypredictivemodel
function( input1, input2, input3){
   scoredata = as.h2o(
      data.frame(input1, input2, input3 )
   )
   as.data.frame(predict(mymodel, scoredata))
}

To create and start the API service, you have to put the above code in a file and call the following code in R.

rapi <- plumber::plumb("plumber.R")  # Where 'plumber.R' is the location of the code file shown above 
rapi$run(port=8000)

The output you see looks like:

Starting server to listen on port 8000 
Running the swagger UI at http://127.0.0.1:8000/__swagger__/

And if you go to the swagger UI you can test the API in a web interface where you can enter values for the three input parameters.

What about Performance?

At first glance I thought there might be quit some overhead, calling the h2o library, loading the h2o predictive model and then using the h2o predict function to calculate a prediction given the input features.

But I think it is workable. I installed R, h2o and plumber on a small n1-standard-2 Linux server on the Google Cloud Platform. One API call via plumber to calculate a prediction with a h2o random forest model with 100 trees took around 0.3 seconds to finish.

There is much more that plumber has to offer, see the full documentation.

Cheers, Longhow.

To leave a comment for the author, please follow the link and comment on their blog: R – Longhow Lam's Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.