Scaling RServe Deployments
Tableau runs R scripts using RServe, a free, open-source R package. But if you have a large number of users on Tableau Server and use R scripts heavily, pointing Tableau to a single RServe instance may not be sufficient.
Luckily, you can use a load balancer to distribute the load across multiple RServe instances without having to invest in a commercial R distribution. In this blog post, I will show you how to achieve this using another open-source project called HAProxy.
Let’s start by installing HAProxy.
On a Mac, you can do this by running the following command in the terminal:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
followed by:
brew install haproxy
Next, create the config file that contains pointers to the Rserve instances.
In this case I created it in the folder ‘/usr/local/Cellar/haproxy/’, but any other folder would work.
global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

listen stats
    bind :8080
    mode http
    stats enable
    stats realm Haproxy\ Statistics
    stats uri /haproxy_stats
    stats hide-version
    stats auth admin:admin@rserve

frontend rserve_frontend
    bind *:80
    mode tcp
    timeout client 1m
    default_backend rserve_backend

backend rserve_backend
    mode tcp
    option log-health-checks
    option redispatch
    balance roundrobin
    timeout connect 10s
    timeout server 1m
    server rserve1 localhost:6311 check maxconn 32
    server rserve2 anotherserver.abc.lan:6311 check maxconn 32
The highlights in the config file are the timeouts, the maximum number of connections allowed for each Rserve instance, the host:port of each Rserve instance, the load balancer listening on port 80, round-robin balancing, and the stats page on port 8080 with the username and password for accessing it. I used a very basic configuration, but the HAProxy documentation has detailed information on all the options.
Let’s check that the config file is valid and free of typos.
BBERAN-MAC:~ bberan$ haproxy -f /usr/local/Cellar/haproxy/haproxy.cfg -c
Configuration file is valid
Now you can start HAProxy by passing a pointer to the config file as shown below:
sudo haproxy -f /usr/local/Cellar/haproxy/haproxy.cfg
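Before pointing Tableau at the load balancer, it can help to confirm that a connection to it really reaches an Rserve instance. The sketch below relies on the fact that Rserve greets every new connection with a 32-byte ID string beginning with "Rsrv"; the function names `rserve_banner` and `looks_like_rserve` are made up for this illustration, and the host and port in the usage comment are assumed from the config above.

```python
import socket


def rserve_banner(host, port, timeout=5.0):
    """Connect to host:port and return up to 32 bytes of the server greeting."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # Rserve sends a fixed 32-byte ID string; read until we have it all
        # or the peer closes the connection.
        while len(data) < 32:
            chunk = sock.recv(32 - len(data))
            if not chunk:
                break
            data += chunk
    return data


def looks_like_rserve(banner):
    """True if the greeting has the length and prefix of an Rserve ID string."""
    return len(banner) == 32 and banner.startswith(b"Rsrv")


# Usage (assumes HAProxy is listening on port 80 as configured above):
#   banner = rserve_banner("localhost", 80)
#   print(banner, looks_like_rserve(banner))
```

Because the frontend runs in tcp mode, the greeting seen through the load balancer should be the one sent by whichever backend Rserve instance received the connection.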
Let’s launch Tableau and enter the host and port number for the load balancer instead of an actual RServe instance.
Success!! I can see the results from R’s forecasting package in Tableau through the load balancer we just configured.
Let’s run the calculation one more time.
Now let’s look at the stats page for our HAProxy instance. Per our configuration file, it is available at http://localhost:8080/haproxy_stats.
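The same numbers can also be read from a script, since HAProxy serves the stats page as CSV when `;csv` is appended to the stats URI. The following is a minimal sketch, not part of the original setup: the function names are hypothetical, and the URL and credentials in the usage comment are taken from the config file above.

```python
import base64
import csv
import io
import urllib.request


def fetch_stats_csv(url, user, password, timeout=5.0):
    """Download the stats CSV using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def sessions_per_server(csv_text):
    """Map (proxy name, server name) to total sessions, skipping summary rows."""
    text = csv_text.lstrip()
    # The header line starts with "# "; drop the marker so csv can parse it.
    if text.startswith("#"):
        text = text[1:].lstrip()
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        # FRONTEND and BACKEND rows are aggregates, not individual servers.
        if row.get("svname") in ("FRONTEND", "BACKEND"):
            continue
        totals[(row["pxname"], row["svname"])] = int(row.get("stot") or 0)
    return totals


# Usage (assumes HAProxy is running with the config above):
#   text = fetch_stats_csv("http://localhost:8080/haproxy_stats;csv",
#                          "admin", "admin@rserve")
#   print(sessions_per_server(text))
```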
I can see the two requests I made and that they were evaluated on different RServe instances, as expected, since round-robin load balancing forwards each client request to the next server in turn.
Now let’s install HAProxy on a server that is more likely to be used in production and have it start automatically.
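To make the round-robin behaviour concrete, here is a toy model of the selection order. It is only an illustration: HAProxy's actual roundrobin algorithm also accounts for server weights, health checks and connection limits, none of which are modelled here. The server names come from the config above; the request labels are invented.

```python
from itertools import cycle

# Backend servers as named in the config file.
SERVERS = ["rserve1", "rserve2"]


def assign(requests, servers):
    """Pair each request with the next server in turn, wrapping around."""
    picker = cycle(servers)
    return [(req, next(picker)) for req in requests]


if __name__ == "__main__":
    for req, server in assign(["calc-1", "calc-2", "calc-3"], SERVERS):
        print(req, "->", server)
```

With two servers, the first and second calculations land on different instances and the third wraps back around to the first, which matches what the stats page shows for two requests.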
I used a Linux machine (Ubuntu 14.04 specifically) for this. There are only a few small differences in the configuration steps. To install HAProxy, enter the following in a terminal window:
apt-get install haproxy
Now edit the haproxy file under the directory /etc/default/ and set ENABLED=1. The default is 0; setting it to 1 makes HAProxy start automatically when the machine boots.
Next, edit the config file at /etc/haproxy/haproxy.cfg to match the config example above.
And we’re ready to start the load balancer:
sudo service haproxy start
Now you can serve many more visualizations containing R scripts to a larger number of Tableau users. Depending on the amount of load you’re dealing with, you can start with running multiple RServe processes on different ports of the same machine or you can add more machines to scale out further.
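When several RServe processes run on one machine, each needs its own `server` line in the backend section. As a rough sketch, the helper below generates those lines for a list of ports, following the naming and `check maxconn 32` settings used in the config above. The function name and the extra ports 6312 and 6313 are assumptions for illustration; each Rserve process would have to be started on its port separately, for example with Rserve's `--RS-port` option.

```python
def backend_server_lines(host, ports, maxconn=32):
    """Build one HAProxy 'server' line per Rserve port on the given host."""
    return [
        f"server rserve{i} {host}:{port} check maxconn {maxconn}"
        for i, port in enumerate(ports, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical layout: three Rserve processes on the same machine.
    for line in backend_server_lines("localhost", [6311, 6312, 6313]):
        print("    " + line)
```

The printed lines can be pasted under `backend rserve_backend` in place of the two existing `server` entries.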
Time to put advanced analytics dashboards on more screens