Setting up RStudio Server quickly on Amazon EC2

John Mount

4 years ago

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers].

I have recently been working on projects using Amazon EC2 (Elastic Compute Cloud) and RStudio Server. I thought I would share some of my working notes.

Amazon EC2 supplies near-instant access to on-demand, disposable computing in a variety of sizes (billed in hours). RStudio Server supplies an interactive user interface to your remote R environment that is nearly indistinguishable from a local RStudio console. The idea is: for a few dollars you can work interactively on R tasks requiring hundreds of GB of memory and tens of CPUs and GPUs.

If you are already an Amazon EC2 user with some Unix experience it is very easy to quickly stand up a powerful R environment, which is what I will demonstrate in this note.

To follow these notes you must already have an Amazon EC2 account, some experience using the AWS (Amazon Web Services) console, and experience managing ssh key pairs for use with EC2. Start by copying down the path where you have stored the secret (private) half of your ssh key pair. Your key should always be stored somewhere safe, such as a private and encrypted disk volume; if you don't have a key pair, you will be prompted to create one during machine launch.
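One practical detail: OpenSSH refuses to use a private key file that other users can read, which is why AWS's own instructions suggest running `chmod 400` on a downloaded key. The bash function below is a minimal sketch of that step; the function name `secure_key` is hypothetical and not part of the original workflow.

```bash
# Hypothetical helper: restrict an ssh private key so only its owner can
# read it. OpenSSH rejects keys whose permissions are too open.
secure_key() {
  local key="$1"
  if [ ! -f "$key" ]; then
    echo "no key file at: $key" >&2
    return 1
  fi
  # 400: the owner may read; group and others get no access at all.
  chmod 400 "$key"
}

# Usage (substitute your own key path):
# secure_key /Volumes/Private/Accounts/wvdbkp.pem.txt
```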

You can set up an RStudio Server instance as follows.

Choose Ubuntu Server as your AMI (Amazon Machine Image) type. This step chooses your operating system. The script we will use was developed for apt package management, so we suggest using the Ubuntu operating system.

Then choose your hardware, or machine type. I will show a t2.micro instance (1 virtual CPU, 1 GB memory), but many bigger machine types are available (including up to 96 vCPUs, hundreds of GB of memory, and GPU compute instances specialized for deep learning tasks).

After the instance launches, copy the public IPv4 DNS name Amazon assigns to it (shown in the instance's details in the AWS console).

In our case we have:

Path to key: /Volumes/Private/Accounts/wvdbkp.pem.txt
IPv4 DNS name: ec2-35-166-235-208.us-west-2.compute.amazonaws.com
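Public EC2 DNS names of this form embed the instance's public IPv4 address, while private names end in `.compute.internal` and are not reachable from your client. The bash sketch below (the helper `ip_from_public_dns` is hypothetical) is a quick check that you copied the public name; it assumes the `ec2-A-B-C-D.<region>.compute.amazonaws.com` pattern shown above and may not cover every naming scheme AWS uses.

```bash
# Hypothetical helper: extract the public IPv4 address embedded in an EC2
# public DNS name such as ec2-35-166-235-208.us-west-2.compute.amazonaws.com.
# Returns a non-zero status for anything else, e.g. a private name like
# ip-172-31-0-5.us-west-2.compute.internal.
ip_from_public_dns() {
  local name="$1"
  # Keep the regex in a variable and leave it unquoted in [[ =~ ]];
  # the four capture groups land in BASH_REMATCH[1..4].
  local re='^ec2-([0-9]+)-([0-9]+)-([0-9]+)-([0-9]+)\.[a-z0-9.-]+\.amazonaws\.com$'
  if [[ "$name" =~ $re ]]; then
    echo "${BASH_REMATCH[1]}.${BASH_REMATCH[2]}.${BASH_REMATCH[3]}.${BASH_REMATCH[4]}"
  else
    echo "not a public EC2 DNS name: $name" >&2
    return 1
  fi
}
```

Running it on the DNS name above prints `35.166.235.208`.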

Now download the bash script confEc2RServer.bash and run it on your local (client) machine. This assumes you have a bash shell and standard Unix tools, which are available on OSX, Linux, BSD, and even Windows; we have only tested this from an OSX client.

In our case we run the script in a bash shell with the arguments as follows:

   bash confEc2RServer.bash \
          /Volumes/Private/Accounts/wvdbkp.pem.txt \
          ec2-35-166-235-208.us-west-2.compute.amazonaws.com

You would run the script with your own ssh key path and your server IPv4 DNS name.

The script combines some ideas from Deep Learning with R (François Chollet with J. J. Allaire, Manning 2018) and Jeremy Howard's Practical Deep Learning For Coders with our own experience working with EC2. The script produces output for about 3 minutes. When it stops producing output (but is still running), it has switched from installation to running an ssh tunnel, which forwards web requests aimed at your local machine so that they appear as local requests on the remote server.
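The tunnel itself is standard ssh local port forwarding. Here is a minimal bash sketch of that step; it is not the actual contents of confEc2RServer.bash. It assumes RStudio Server listens on its default port 8787 and that the login user is `ubuntu` (the default user on Ubuntu AMIs). The function name `open_tunnel` and the `DRY_RUN=1` switch, which prints the command instead of running it, are hypothetical.

```bash
# Hypothetical sketch of an ssh local port forward to RStudio Server.
# Arguments: path to private key, server DNS name, optional port (default 8787).
open_tunnel() {
  local key="$1" host="$2" port="${3:-8787}"
  # -i : private key to authenticate with
  # -N : run no remote command, only forward ports
  # -L : local_port:host:remote_port, where "127.0.0.1" is resolved on the
  #      server, so requests arrive there as local connections
  local cmd=(ssh -i "$key" -N -L "${port}:127.0.0.1:${port}" "ubuntu@${host}")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    # Stays in the foreground until interrupted; Ctrl-C closes the tunnel.
    "${cmd[@]}"
  fi
}
```

For example, `open_tunnel /Volumes/Private/Accounts/wvdbkp.pem.txt ec2-35-166-235-208.us-west-2.compute.amazonaws.com` would keep running until you press Ctrl-C, at which point browser access to the server goes away.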

Now point your web browser at http://127.0.0.1:8787 and log in with the user name “ruser” and password “ruser”.
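If you are not sure whether the script has reached the tunnel stage yet, you can poll the local end of the tunnel before opening the browser. The sketch below uses bash's built-in `/dev/tcp` redirection; the function names are hypothetical. Note that it only confirms something is listening on 127.0.0.1:8787 (the ssh tunnel); it does not prove RStudio Server itself is answering.

```bash
# Hypothetical helper: succeed if something accepts TCP connections on
# HOST:PORT, using bash's /dev/tcp pseudo-device (no curl or nc needed).
port_open() {
  local host="$1" port="$2"
  # Open file descriptor 3 in a subshell; exit status 0 means it connected.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Poll 127.0.0.1:8787 once per second, giving up after the given number
# of tries (default 30).
wait_for_rstudio() {
  local tries="${1:-30}" i
  for ((i = 0; i < tries; i++)); do
    if port_open 127.0.0.1 8787; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage:
# wait_for_rstudio && echo "tunnel is up: open http://127.0.0.1:8787"
```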

You should now see an RStudio Server console and be ready to work.

The instance we have configured also includes a local PostgreSQL database (user name “ruser” and password “ruser”). Both the web server and the database default to accepting only connections that the remote server considers “local”. We reach the RStudio Server console through our ssh tunnel, and the database is available only to processes running on the server itself. If you end the script running on your client machine, you close the ssh tunnel and lose access to the remote server. We haven't configured any GPU features such as CUDA, TensorFlow, or Keras (however, that takes only one or two more lines, available from the appendix of Deep Learning with R).

And that is it. Don't forget to copy results off the server when you are done, and to dispose of the server by moving its instance state to “Terminate” in the AWS console. This frees the virtual machine and usually destroys all storage associated with it; doing this is critical so that you don't keep paying for a machine you are done with.
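Copying results off can be done with `scp` using the same key. A minimal sketch follows; the helper name, the `ubuntu` login user, and the example remote path are assumptions. Files created inside RStudio presumably live under `/home/ruser` (an assumption about how the script creates that user), so the `ubuntu` user may need read access to them. As before, the hypothetical `DRY_RUN=1` switch prints the command instead of running it.

```bash
# Hypothetical helper: recursively copy a remote path back to the client.
# Arguments: key path, server DNS name, remote path, local dir (default ".").
fetch_results() {
  local key="$1" host="$2" remote_path="$3" local_dir="${4:-.}"
  # -i : private key; -r : copy directories recursively
  local cmd=(scp -i "$key" -r "ubuntu@${host}:${remote_path}" "$local_dir")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (hypothetical remote path):
# fetch_results /Volumes/Private/Accounts/wvdbkp.pem.txt \
#   ec2-35-166-235-208.us-west-2.compute.amazonaws.com /home/ruser/results ./results
```

If you use the AWS CLI, `aws ec2 terminate-instances --instance-ids` followed by your instance ID performs the same termination as the console.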