Site icon R-bloggers

Hosting RStudio Server on Azure

[This article was first published on R on The Jumping Rivers Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Can’t be bothered reading, tell me now

Host RStudio server on an azure instance. Configure the instance to access RStudio with a nice url

Getting started

Azure is cloud computing framework provided by Microsoft, the same idea as AWS by Amazon. In this post, we’ll describe how to use Azure to run RStudio Server in the cloud.

Unfortunately, things don’t start well – Microsoft have made an endurance test of getting started with Azure. The first stop is the Azure web-page. On this page

click on Free Account and follow the instructions. This is a bit of painful process that will require

  • Email confirmation
  • Text confirmation
  • Credit Card confirmation

Eventually you should get to the dashboard page!

Clicking on Create Resources will take you to the marketplace

Selecting Ubuntu Server will launch a dialogue box with four steps:

  • Step 1: Basics: configuration settings
    • Name: A name for the virtual machine, e.g. rstudio
    • User name: The master user who will have sudo access, e.g. userX
    • Authentication type: Either choose ssh or enter a password
    • Resource group: Since this your first instance, create a new one, say rstudio-group
    • Location: where will your machine be located
  • Step 2: Virtual machine size
    • Select the machine you want. Choose the smallest for the purposes of this exercise
  • Step 3: Settings
    • Nothing to change here
  • Step 4: Summary
    • Click create and we’re good to go!

After around a minute or so, your virtual machine will be ready.

Setting up R

The next step is to ssh into your instance. On the dashboard screen, click on the new box that shows your virtual machine. Select Networking. Near the top of the screen will be a Public IP address, of the form: XXX.XXX.XXX.XXX. In my instance, the IP address is 52.233.194.195

Make a note of your address. Next ssh into your instance via

ssh userX@XXX.XXX.XXX.XXX

To ensure that ubuntu is up-to-date on our virtual machine, we invoke super sudo powers. First we update the list of ubuntu packages

sudo apt-get update

Then we upgrade as necessary

sudo apt-get upgrade

Now we get on with the business of installing R. To use the latest version we need to add a new repository

sudo add-apt-repository ppa:marutter/rrutter

Then update again and install base R

sudo apt update
sudo apt-get install r-base

Depending on what R packages you want to install it’s worth installing a couple of other things at this point

sudo apt-get install libxml2 libxml2-dev # igraph
sudo apt-get install libcairo2-dev # Graphics packages
sudo apt-get install libssl-dev libcurl4-openssl-dev #httr

With an eye to the future it’s also worth installing apache2 to help with redirects

sudo apt-get install apache2

Opening ports ready for RStudio

Whenever you access a web-page, the browser specifies a port. For standard http pages, we use port 80, for secure https pages, we use port 443. For example, when we type

https://www.jumpingrivers.com

in the browser, this is converted to

https://www.jumpingrivers.com:443

By default our azure instance only has port 22 open (the port used for ssh communication). To access RStudio, we’ll need to open the following ports

  • 80 (for http)
  • 443 (for https); only required if we implement SSL
  • 8787 – the default RStudio port. In the last section, we’ll remove this, but just now it’s handy to have it open for testing.

Under Networking, click Add inbound port rule and add the three ports (80, 443, 8787):

If everything is working, you should be able to enter XXX.XXX.XXX.XXX in your browser and you’ll see the Apache2 Ubuntu Default Page with the title. It works!

Installing RStudio

Installing RStudio server is now relatively easy:

# Check the above link for updates to the version
sudo apt-get install gdebi-core
wget https://download2.rstudio.org/rstudio-server-1.1.383-amd64.deb
sudo gdebi rstudio-server-1.1.383-amd64.deb

If everything works correctly, you should be able to view rstudio server via

XXX.XXX.XXX.XXX:8787

If the page hangs, double check you have opened port 8787 under the network settings.

Nicer URLs

The first step is to access the page via a standard URL and not an IP address. In the main dashboard screen, under all resources, click on

rstudio-ip Public IP address

Then select configuration. In the text box under DNS Label, enter text, e.g. rstudio-myname. So in my case, I have used rstudio-jumpingrivers

This means we can now access RStudio via

rstudio-jumpingrivers.westeurope.cloudapp.azure.com:8787

Getting users to type the port number isn’t ideal. What we would like is for users to type

rstudio-jumpingrivers.westeurope.cloudapp.azure.com/rstudio

This involves configuring Apache. First navigate to /etc/apache2/sites-available, e.g.

cd /etc/apache2/sites-available

Next create a file called rstudio.conf. Using your favourite text editor, e.g. vim or nano. Note that this file is very much space sensitive, so check it carefully.

<VirtualHost *:80>
  ServerAdmin info@jumpingrivers.com
  ServerName rstudio-jumpingrivers.westeurope.cloudapp.azure.com
  ServerAlias www.rstudio-jumpingrivers.westeurope.cloudapp.azure.com
  <Proxy *>
    Allow from localhost
  </Proxy>

  # Specify path for Logs
  ErrorLog ${APACHE_LOG_DIR}/error.log
  CustomLog ${APACHE_LOG_DIR}/access.log combined
  RewriteEngine on

  # Following lines should open rstudio directly from the url
  # Map rstudio to rstudio/
  RedirectMatch ^/rstudio$ /rstudio/

  RewriteCond %{HTTP:Upgrade} =websocket
  RewriteRule /rstudio/(.*) ws://localhost:8787/$1 [P,L]
  RewriteCond %{HTTP:Upgrade}     !=websocket
  RewriteRule /rstudio/(.*) http://localhost:8787/$1 [P,L]
  ProxyPass /rstudio/     http://localhost:8787/
  ProxyPassReverse /rstudio/ http://localhost:8787/
  ProxyRequests off
</VirtualHost>

Then enable the necessary Apache modules

sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod proxy_html
sudo a2enmod proxy_wstunnel
sudo a2enmod rewrite

Finally, restart Apache

sudo a2ensite rstudio.conf
sudo service apache2 restart  

You should now be able to access RStudio via

rstudio-jumpingrivers.westeurope.cloudapp.azure.com/rstudio/

Adding SSL

In theory it should be straightforward to add SSL support using Let’s Encrypt. However, I’ve found that you hit rate limiters since the domain is azure.com. However, if we register our own domain, we can easily add SSL support. This will be the subject of our next blog post.

References


To leave a comment for the author, please follow the link and comment on their blog: R on The Jumping Rivers Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.