
Introduction to OpenCPU for R on EC2 with Python

[This article was first published on joy of data » R, and kindly contributed to R-bloggers.]

OpenCPU is, simply put, a server implementing a RESTful web API for remotely executing R functions and retrieving their results. In this tutorial I am going to show how OpenCPU can be installed on an EC2 instance running Ubuntu 14.04. Python and its requests package come into play for conveniently handling the HTTP communication. Thanks to the effort Jeroen Ooms put into developing OpenCPU and writing its documentation, the whole process is comparatively easy and painless.

In case you are merely interested in the API interactions, feel free to skip the first three sections. You can also install OpenCPU locally or simply use public.opencpu.org/ocpu. An IPython Notebook listing the successive API calls for public.opencpu.org/ocpu you may find here.

Setting Up OpenCPU in the Cloud

  1. Sign up for an AWS account
  2. Go to the home page of your AWS console and choose EC2
  3. Click “Launch Instance” and choose “Ubuntu Server 14.04 LTS”
  4. Try a t2.micro instance for free, or pay about $0.13 per hour for a c3.large (recommended)
  5. Keep the default settings, except for step 6 “Configure Security Group”, where you add rules for “HTTP” and “HTTPS”, both with “Source” set to “My IP”
  6. After a click on “Launch” you are asked whether to use an existing or a newly created key pair. Create a new one and store the .pem file, for example in ~/.ssh
  7. In the EC2 dashboard under “Instances” you may now monitor the state of your instance, which should show “running” within a few seconds

Connect to your Instance

  1. Restrict permissions of key file: 
    chmod 400 ~/.ssh/amazon-aws.pem
  2. Connect via SSH: 
    ssh -i ~/.ssh/amazon-aws.pem ubuntu@[OCPU]

(replace [OCPU] with the domain or IP of your OpenCPU server)

Install R and OpenCPU

  1. sudo add-apt-repository ppa:opencpu/opencpu-1.4
  2. sudo apt-get update
  3. sudo apt-get install r-base r-base-dev
  4. sudo apt-get install opencpu
      (go with suggested defaults)
  5. http://[OCPU]/ocpu
      should now bring you to your OpenCPU API Explorer 🙂

Remotely Calling Procedures over HTTP

Open your favorite Python console and import json and requests. Let’s start with something very simple – calculating the mean of a vector using base::mean():

# (1)
> v = json.dumps([1,2,3,4,5])
> v 
'[1, 2, 3, 4, 5]'

# (2)
> r = requests.post("http://[OCPU]/ocpu/library/base/R/mean", data={"x":v})

# (3)
> print r.content
/ocpu/tmp/x021a101605/R/.val
/ocpu/tmp/x021a101605/stdout
/ocpu/tmp/x021a101605/source
/ocpu/tmp/x021a101605/console
/ocpu/tmp/x021a101605/info
/ocpu/tmp/x021a101605/files/DESCRIPTION

# (4)
> res = requests.get("http://[OCPU]/ocpu/tmp/x021a101605/R/.val")
> print res.content
3

# (5)
> r = requests.post("http://[OCPU]/ocpu/library/base/R/mean/json", data={"x":v})
> print r.content
[
  3
]

In short, an RPC here is a POST request to a URL with a path of the following structure:

/ocpu/library/[library name]/R/[function name]

The function’s arguments are passed as the request’s payload (2). For that purpose we provide the list/array/vector as a serialized JSON array (1). The response is a number of session-relative paths which lead us to data regarding our RPC (3). The first one (4) holds the result of the calculation – 3. By adding /json to the initial request, the response itself already contains the result as JSON (5). The listing below shows the content for all six paths.

> r = requests.post("http://[OCPU]/ocpu/library/base/R/mean", data={"x":v})

> for path in r.content.split():
>     print("\n--------------------\n" + path + " :\n\n"
>         + requests.get("http://[OCPU]{}".format(path)).content)

--------------------
/ocpu/tmp/x021a001605/R/.val :

[1] 3

--------------------
/ocpu/tmp/x021a001605/stdout :

[1] 3

--------------------
/ocpu/tmp/x021a001605/source :

mean(x = x)

--------------------
/ocpu/tmp/x021a001605/console :

> mean(x = x)
[1] 3

--------------------
/ocpu/tmp/x021a001605/info :

R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8    LC_NUMERIC=C            LC_TIME=en_US.UTF-8    
 [4] LC_COLLATE=en_US.UTF-8  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C          
 [7] LC_PAPER=C              LC_NAME=C               LC_ADDRESS=C           
[10] LC_TELEPHONE=C          LC_MEASUREMENT=C        LC_IDENTIFICATION=C    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] opencpu_1.4.6

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2    brew_1.0-6         devtools_1.6.1     evaluate_0.5.5    
 [5] formatR_1.0        grid_3.1.2         httpuv_99.999      httr_0.6.0        
 [9] jsonlite_0.9.14    knitr_1.8          lattice_0.20-29    openssl_0.2       
[13] parallel_3.1.2     RAppArmor_1.0.1.99 sendmailR_1.2-1    stringr_0.6.2     
[17] tools_3.1.2        unixtools_0.1-1   

--------------------
/ocpu/tmp/x021a001605/files/DESCRIPTION :

Package: x021a001605
Type: Session
Version: 1.0
Author: OpenCPU
Date: 2015-02-15
Description: This file is automatically generated by OpenCPU.

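
The request pattern used so far (a POST to /ocpu/library/[library name]/R/[function name], optionally followed by /json, with JSON-serialized arguments as payload) can be wrapped in a few small helpers. The following is only a minimal sketch in Python 3, not part of OpenCPU or of the original session; the function names ocpu_function_url, encode_arguments and call_r are made up for illustration, and call_r assumes that the server answers the /json variant with a JSON body as in step (5).

```python
import json


def ocpu_function_url(base_url, library, function, output_format=None):
    """Build the URL for an R function call, e.g. .../ocpu/library/base/R/mean."""
    url = "{}/ocpu/library/{}/R/{}".format(base_url.rstrip("/"), library, function)
    if output_format:
        # Appending e.g. "json" asks for the result directly in that format.
        url += "/" + output_format
    return url


def encode_arguments(**kwargs):
    """Serialize non-string values as JSON; pass strings through unchanged."""
    return {
        name: value if isinstance(value, str) else json.dumps(value)
        for name, value in kwargs.items()
    }


def call_r(base_url, library, function, **kwargs):
    """Hypothetical convenience wrapper: POST the call and decode the JSON reply."""
    import requests  # third-party package, imported here so the helpers above work without it

    response = requests.post(
        ocpu_function_url(base_url, library, function, "json"),
        data=encode_arguments(**kwargs),
    )
    response.raise_for_status()
    return response.json()
```

With a reachable server, call_r("http://[OCPU]", "base", "mean", x=[1, 2, 3, 4, 5]) would correspond to step (5) above. Strings are passed through untouched on purpose, because OpenCPU interprets them itself (as object names, formulas or session IDs).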
Chained RPC with Graphical Output

Now we are going to fit a simple linear model to the cars data set and plot it.

> r = requests.post("http://[OCPU]/ocpu/library/stats/R/lm", 
        data={"formula":"speed~dist","data":"cars"})

> print r.content
/ocpu/tmp/x0f51cfc661/R/.val
[...]


> res = requests.get("http://[OCPU]/ocpu/tmp/x0f51cfc661/R/.val")
> print res.content

Call:
lm(formula = speed ~ dist, data = cars)

Coefficients:
(Intercept)         dist  
     8.2839       0.1656

Well, that is all nice and dandy, but of course this output is hard to process programmatically. It’s just arbitrarily structured text. To receive something digestible we would have to provide and invoke a custom function which returns, for example, well-structured serialized JSON representing the object’s relevant features. Nonetheless, it is possible to access the fields of an object. In this case the object representing the linear model is an R list, so we can apply base::get() to it. The resulting object of a previous session is referenced by the ID of that session:

> res = requests.post("http://[OCPU]/ocpu/library/base/R/get/json", 
    data={"x":"'coefficients'","pos":"x0f51cfc661"})
> print res.content
[
    8.2839,
    0.1656
]

Two things are important about the previous call:

  1. The argument value for "x" is passed as "'coefficients'", so OpenCPU handles it as a string and not as the name of an object.
  2. The argument value for "pos" is passed as "x0f51cfc661", so OpenCPU does handle it as a reference to an object – and this object happens to be the result of the session with that ID. The same logic applies to "speed~dist" and "cars" above, which do not represent strings but actual objects (a formula and a data frame).
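
When chaining calls from a script, the session ID has to be pulled out of the path listing of the previous response, and string arguments need the extra pair of single quotes. Here is a rough Python 3 sketch of both steps; the helper names session_id and r_string are invented for this example, and the sketch assumes that every response lists paths of the form /ocpu/tmp/[session ID]/... as seen above.

```python
import re

# Assumption: session paths always look like /ocpu/tmp/<session id>/...
SESSION_PATH = re.compile(r"^/ocpu/tmp/([^/]+)/")


def session_id(response_text):
    """Return the session ID found in the first session path of a response."""
    for line in response_text.splitlines():
        match = SESSION_PATH.match(line.strip())
        if match:
            return match.group(1)
    raise ValueError("no session path found in response")


def r_string(value):
    """Wrap a Python string in single quotes so it is read as an R string."""
    if "'" in value or "\\" in value:
        # Escaping is deliberately left out of this sketch.
        raise ValueError("quotes and backslashes are not handled here")
    return "'" + value + "'"


# Path listing as returned by the lm() call above (shortened).
listing = "/ocpu/tmp/x0f51cfc661/R/.val\n/ocpu/tmp/x0f51cfc661/stdout\n"

# Payload for base::get(): "x" is a string, "pos" refers to the lm() session.
payload = {"x": r_string("coefficients"), "pos": session_id(listing)}
print(payload)
```

The resulting dictionary can be handed to requests.post(..., data=payload) exactly like the hand-written one in the call to base::get() above.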

For a scatter plot of the data with an overlaid regression line we would again have to write a custom function, because this requires two successive function calls – first to plot() and then to abline() – and those we cannot chain. Of course this is no big deal and just a matter of doing it. But let’s see what happens if we plot the linear model.

> req = requests.post("http://[OCPU]/ocpu/library/graphics/R/plot", 
    data={"x":"x0f51cfc661"})
> print req.content

/ocpu/tmp/x0dd5b7a086/R/.val
/ocpu/tmp/x0dd5b7a086/graphics/1
/ocpu/tmp/x0dd5b7a086/graphics/2
/ocpu/tmp/x0dd5b7a086/graphics/3
/ocpu/tmp/x0dd5b7a086/graphics/4
/ocpu/tmp/x0dd5b7a086/source
/ocpu/tmp/x0dd5b7a086/console
/ocpu/tmp/x0dd5b7a086/info
/ocpu/tmp/x0dd5b7a086/files/DESCRIPTION

You can now, for example, access the first graphic at http://[OCPU]/ocpu/tmp/x0dd5b7a086/graphics/1. This responds with an image of type PNG. By appending, for example, /png?width=300&height=300 you can even specify a size.
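
Building such graphics URLs by hand gets tedious quickly, so a small helper may be useful. The following Python 3 sketch only assembles the URL from the pieces shown above; graphic_url is a made-up name, and only the png format with the width and height parameters is taken from this article.

```python
from urllib.parse import urlencode


def graphic_url(base_url, session, index, image_format="png", width=None, height=None):
    """Build the URL of the index-th graphic produced by a session."""
    url = "{}/ocpu/tmp/{}/graphics/{}".format(base_url.rstrip("/"), session, index)
    if image_format:
        url += "/" + image_format
    # Only include the size parameters that were actually given.
    query = {key: value for key, value in (("width", width), ("height", height))
             if value is not None}
    if query:
        url += "?" + urlencode(query)
    return url


print(graphic_url("http://[OCPU]", "x0dd5b7a086", 1, width=300, height=300))
```

The content of a requests.get() call on that URL could then be written to a file opened in binary mode to store the image locally.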

Custom Functions

If you want to use your own functions, you have to organize them in an R package and install it as usual in your root R (which you may start with sudo -i R). That’s because OpenCPU only finds packages installed on a global level.

(original article published on www.joyofdata.de)
