Learning Analytic Administration through a Sandbox

Posted on August 22, 2018 by R Views in R bloggers | 0 Comments

[This article was first published on R Views, and kindly contributed to R-bloggers].


It all starts with sandboxes. Development sandboxes are dedicated safe spaces for experimentation and creativity. A sandbox is a place where you can go to test and break things, without the ramifications of breaking the real, important things. If you’re an analytic administrator who doesn’t have access to a sandbox, or the means to get one, I recommend that you consider advocating to change that. Here are just some of the arguments for sandboxes as a powerful tool for the R admin that you may find helpful.

Sandbox experimentation develops valuable experience and promotes exposure to best practices.
Sandboxes can be used to demonstrate quick wins or establish grounds for future investments.
Sandboxes can increase engagement with the IT group through communicating from a more informed position.
They can be instrumental in creating installation and configuration recipes for the administration of R in production.

To be an effective R admin, I have to learn through doing. In my case, this often means standing up small server instances through Amazon Web Services so that I can test out different configurations or architectures. I like to follow a fairly regimented crawl-walk-run strategy for acquiring R administration knowledge, but things still slip through the cracks.

For example, I wish I had taken time to explore the very basic Run As :HOME_USER: configuration pattern when I was first learning the ropes of Shiny Server. This solves a very interesting problem: even with Shiny Server and RStudio Server installed on the same machine, Shiny applications developed in a user’s home directory within the RStudio IDE still need to be “deployed” to the Shiny Server directory in order to be made accessible there.

The Shiny Server documentation lays out a simple and elegant way to run applications as the user in whose home directory that app exists, thus circumventing the need to deploy from one location on the server to another. While this solution may not be desirable for many situations, it has great merits as a sandbox:

The single-server infrastructure can be installed and configured in minutes.
It can give you and your team a quick win if you’re looking to create a proof of concept.
You’ll gain exposure to the Shiny Server documentation and learn how to make edits to the default shiny-server configuration file.
You can create a recipe for installation and configuration that could potentially be reused by you or others, including IT.

In this post, I’ll go through the high-level steps it takes to implement this configuration as a sandbox server running on a single Amazon Web Services Elastic Compute Cloud (AWS EC2) instance. I’m going to assume you have very little experience with the technologies involved, but that you’re a tenacious R admin-in-training, hungry to learn and read whatever is necessary.

Note: sandboxes can be created on all sorts of different servers. I’ve chosen an AWS EC2 instance because it is an easily accessible and commonly used cloud platform, but you could create a sandbox on your local machine with a Virtual Machine, using something like VirtualBox; use another cloud provider; or find a different solution entirely. If you already have a fresh sandbox server to play with, skip the first section and proceed straight to Setting up the Sandbox.

Getting Started with Amazon Web Services and Elastic Compute Cloud

There are a few things you’ll need to do to get started with AWS EC2. First, you need an AWS account. That will require some initial setup and a credit card. Once you have all of that, you’ll have access to the Amazon Web Services console. This is the view of all the web services Amazon has to offer – it can be quite overwhelming to ponder. The service we’re interested in today is Elastic Compute Cloud (EC2). If you’re looking at the All Services view, it should be listed under Compute.

On the EC2 console page, you’ll need to do a couple of things:

Create a key pair and download it
Launch an Instance (click the blue button under “Create Instance” to go to the launch wizard)

Stepping through the launch wizard, you’ll have many options. Here were my selections:

Take special note of the security group formulation. I added two custom TCP rules for opening port 8787 (RStudio Server’s default port) and 3838 (Shiny Server’s default port).
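
The same two rules can also be added from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured for your account. The security group ID and the source address range are hypothetical placeholders, not values from this post. By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
# Sketch only: SG_ID and SOURCE_CIDR are hypothetical placeholders.
SG_ID="${SG_ID:-sg-EXAMPLE}"
SOURCE_CIDR="${SOURCE_CIDR:-203.0.113.0/24}"   # example range; restrict to your own network

# Print each command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Add one inbound TCP rule for the port given as $1.
open_port() {
  run aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol tcp --port "$1" --cidr "$SOURCE_CIDR"
}

open_port 8787   # RStudio Server default port
open_port 3838   # Shiny Server default port
```

Limiting the source range to your own network, rather than opening the ports to everyone, is a reasonable default even for a short-lived sandbox.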

At this point you’re ready to launch.

The Instances view under Resources on the main EC2 console page will show you a list of all the running EC2 instances you have in this region. Once the instance you launched is listed as running, you’ll want to connect to it. Click on your instance to select it in the list; the Connect button should become enabled once you do.

Click the Connect button and follow the steps listed there to SSH into your EC2 instance. Congrats – you now have a fresh CentOS-flavored Linux machine to learn on and configure!
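
The Connect dialog is the authoritative source for the exact command, but the steps generally look like the sketch below. The key file name, login user, and host name are assumptions for illustration: the login user depends on the machine image (official CentOS images typically use centos, Amazon Linux uses ec2-user). The script prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch only: all three values are hypothetical placeholders.
KEY_FILE="${KEY_FILE:-my-key-pair.pem}"
SSH_USER="${SSH_USER:-centos}"              # assumption: depends on the image you chose
PUBLIC_DNS="${PUBLIC_DNS:-ec2-public-dns}"

# Print each command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

connect() {
  # ssh refuses private keys that other users can read, so tighten permissions first.
  run chmod 400 "$KEY_FILE"
  run ssh -i "$KEY_FILE" "$SSH_USER@$PUBLIC_DNS"
}

connect
```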

Setting up the Sandbox: Installation

Now that you have a clean sandbox, it’s time to bring in the toys.

Install and enable the Extra Packages for Enterprise Linux (EPEL) repository
Follow the guidelines for installing R and the shiny package listed in the instructions for Shiny Server open source
Continue using the same instructions to download and install Shiny Server
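
The three steps above can be sketched as a first draft of an installation recipe. This is a sketch under assumptions, not the official procedure: it assumes a CentOS machine where the epel-release package is available through yum, and SHINY_SERVER_RPM_URL is a hypothetical placeholder that you would replace with the download link given in the Shiny Server instructions. The commands are printed rather than executed unless DRY_RUN=0 is set, and the printed form drops the original quoting.

```shell
# Sketch only: replace the placeholder with the real download link from the instructions.
SHINY_SERVER_RPM_URL="${SHINY_SERVER_RPM_URL:-https://example.com/shiny-server.rpm}"

# Print each command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

install_shiny_stack() {
  # 1. Enable EPEL, which provides the R packages for Enterprise Linux.
  run sudo yum install -y epel-release
  # 2. Install R, then the shiny package system-wide (as root) so every user can load it.
  run sudo yum install -y R
  run sudo su - -c "R -e \"install.packages('shiny', repos='https://cran.rstudio.com/')\""
  # 3. Download and install the Shiny Server package.
  run curl -L -o shiny-server.rpm "$SHINY_SERVER_RPM_URL"
  run sudo yum install -y --nogpgcheck shiny-server.rpm
}

install_shiny_stack
```

Keeping the steps in one function makes it easy to rerun the whole sequence on a fresh instance later.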

At this point, the Shiny Server service will start running with all default configurations in place. Go back to the EC2 console and your Connection dialog pane to grab the public DNS address. Navigate to that address in a web browser, using port 3838 (e.g. http://ec2-public-dns:3838).

You should see the welcome page!

The Shiny Server welcome page has two panels on the right-hand side. The top frame should feature a functional Shiny application. The bottom frame is meant to show an R Markdown document, but because you haven’t yet configured the server to host those documents, it should show an error message. If hosting R Markdown documents is important to the success of your sandbox, learn how to set that up.

Now that Shiny Server is up and running, you’ll need to go through a similar installation process for RStudio Server. Remember that our plan from the beginning was to install both these services on the same machine – don’t create a new EC2 instance for RStudio Server.

Once installed, open a separate web browser window or tab and navigate to the public DNS again, but this time at port 8787 (e.g. http://ec2-public-dns:8787). Here you should see the RStudio Server sign-in landing page. To sign in to RStudio Server you’ll need a user and password. As the sandbox administrator, it’s your job to create this first user. Take a look at the RStudio Data Science Lab manual for instructions on how to do this. After you create a user, verify that you can sign in to the RStudio Server IDE. This is where you’ll be able to build new Shiny applications.
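
By default, RStudio Server authenticates against local system accounts, so creating the first user usually comes down to adding a Linux user with a password. The sketch below assumes that default; the user name rstudio matches the one used later in this post, but any name works. Commands are printed unless DRY_RUN=0 is set.

```shell
# Sketch only: NEW_USER is an illustrative default.
NEW_USER="${NEW_USER:-rstudio}"

# Print each command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

create_user() {
  # -m creates the home directory that will later hold the user's Shiny apps.
  run sudo useradd -m "$NEW_USER"
  # passwd prompts interactively for the new password.
  run sudo passwd "$NEW_USER"
}

create_user
```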

Configuring the Sandbox

Shiny Server and RStudio Server should now be installed and running on your machine. The installation step is usually the easy part. Configuration tends to be harder. This is the stage where you’ll start adapting the default product so that it can perform in your particular environment. Configuration changes should be made based on your goals, architecture, and ultimately the type of experience you would like the end user to have with this software. In some cases, you may end up testing and combining configuration options from very different sources in the documentation. It can be easy to lose track of what changes were made, which is why keeping notes or making step-by-step recipes is important.

Your goal is to change the default configuration of Shiny Server so that users (Shiny developers) in the RStudio IDE can save applications to a folder in their home directory, and have those applications be run as the home user and served from that home location.

To accomplish this change, find and edit the shiny-server.conf file.

There are two sections of the Shiny Server documentation that I found helpful in crafting my changes to the configuration file:

When you finish making changes to the configuration file, restart the shiny-server service.
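
As an illustration of what the edited file might contain, here is a minimal sketch built from the run_as and user_dirs directives described in the Shiny Server documentation. It is not necessarily the exact configuration used for this post. It assumes the usual file location /etc/shiny-server/shiny-server.conf and a systemd-based machine; the output file name is a hypothetical choice so that nothing is overwritten until you have reviewed it.

```shell
# Sketch only: write a candidate configuration to a local file for review.
CONF_OUT="${CONF_OUT:-./shiny-server.conf.sandbox}"

cat > "$CONF_OUT" <<'EOF'
# Run each app as the user whose home directory contains it;
# fall back to the "shiny" user for anything else.
run_as :HOME_USER: shiny;

server {
  listen 3838;

  # Serve apps found in each user's ~/ShinyApps directory
  # at /<username>/<appname>/
  location / {
    user_dirs;
  }
}
EOF

echo "Wrote $CONF_OUT"

# After reviewing the file, install it and restart the service (requires root):
#   sudo cp "$CONF_OUT" /etc/shiny-server/shiny-server.conf
#   sudo systemctl restart shiny-server
```

Note that serving user directories from the root location replaces the default welcome page at that address.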

Test your changes! The template new Shiny application should make it easy to test your deployment configuration. My user is named rstudio and this is what the tree structure of my home directory looks like for the deployment of a Shiny application, app1.

From the Shiny Server side, app1 is available at: http://ec2-public-dns:3838/rstudio/app1/
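
The mapping between a folder in the home directory and its address can be sketched with two small helper functions. Both are hypothetical helpers for illustration. They assume the user_dirs default of serving apps from a ShinyApps folder in each home directory, and that home directories live under /home.

```shell
# Sketch only: helper names and the /home prefix are assumptions.

# Where the app lives on disk: $1 = user name, $2 = app folder name.
app_dir() {
  echo "/home/$1/ShinyApps/$2"
}

# Where Shiny Server serves it: $1 = server address, $2 = user name, $3 = app folder name.
app_url() {
  echo "http://$1:3838/$2/$3/"
}

app_dir rstudio app1                    # prints /home/rstudio/ShinyApps/app1
app_url ec2-public-dns rstudio app1     # prints http://ec2-public-dns:3838/rstudio/app1/
```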

Write a Recipe and Retire the Sandbox

Remember to summarize your notes into scripts that you can reuse or just save as a reference. I like to keep my installation and configuration scripts in a version control system like git so that I have a lasting record of all the changes I make over time. Don’t worry if your script isn’t perfect right now. We will cover techniques for writing recipes to meet IT standards in a later post.

The final step of this process is to shut everything down. Once I declare success, make the notes I want to keep, and share any lessons learned, it’s time to terminate. If you invest in writing out a recipe script now, it shouldn’t take much time to recreate this sandbox. There’s no reason to spend money keeping it running for longer than you need it.

Use the Actions button in the Instances view of the EC2 console to terminate any running instances that you’re finished using.
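
If you prefer the command line, the same cleanup can be sketched with the AWS CLI, assuming it is installed and configured. The instance ID is a hypothetical placeholder. Commands are printed unless DRY_RUN=0 is set, which is a sensible safeguard here because termination cannot be undone.

```shell
# Sketch only: INSTANCE_ID is a hypothetical placeholder.
INSTANCE_ID="${INSTANCE_ID:-i-EXAMPLE}"

# Print each command by default; set DRY_RUN=0 to actually run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# List the IDs of instances that are still running in the current region.
list_running() {
  run aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query "Reservations[].Instances[].InstanceId" --output text
}

# Terminate the instance whose ID is given as $1.
terminate() {
  run aws ec2 terminate-instances --instance-ids "$1"
}

list_running
terminate "$INSTANCE_ID"
```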

Conclusion

As an analytic administrator, the job of legitimizing R and advocating for the best, cutting-edge software falls on you. This is challenging, potentially frustrating, but hopefully ultimately rewarding work. There are an infinite number of sandboxes to create and learn from; hopefully, this post will inspire you to pursue the creation and design of some of your own. Remember that sandboxes are a great tool for demonstrating the value of R as a proof-of-concept, or teaching yourself a new set of skills, but they generally aren’t meant to be taken into production.

For more information on running Shiny in production in an enterprise environment, I would recommend starting with an evaluation of RStudio Connect and RStudio Server Pro. There will also be a workshop at rstudio::conf 2019 called Shiny in Production | Data Products at Scale, taught by me and my colleague, Sean Lopp, which may be of interest to you. In the meantime, if you build a cool sandbox or learn something worth sharing, we hope you’ll post about it on the RStudio community forum for R admins.

Kelly O’Briant is a solutions engineer at RStudio interested in configuration and workflow management with a passion for R administration.
