R + EC2 + RStudio Server
I’ve been battling memory limits in R for over two years. Although R has numerous resources for high-performance computing, I still couldn’t get around hardware limitations. Things really got out of control last summer when I started analyzing data on how climate change influences population synchrony across large spatiotemporal gradients. My datasets were simply too numerous and too large, and neither code finessing nor heavy use of Hadley’s approach helped much.
Initially I was turned off by the learning curve associated with the ins and outs of setting up R on EC2, but eventually I set up my own Ubuntu box with R, all of my packages and customizations, and saved that as a 64-bit AMI capable of running on High-Memory Quadruple Extra Large instances. This setup has worked really well for me over the last few months.
With the recent release of RStudio and RStudio Server, I’ve been toying with the idea of running it on an EBS-backed instance. Inspired by JD’s tweet, I got around to setting mine up this weekend. Here is a quick walkthrough.
Assuming you’ve launched and used EC2 services before, start by launching an instance with a recent version of Ubuntu (I’m running 10.04 Lucid) and installing the current release of R (2.13).
Next, install RStudio Server by following the instructions here (be sure to follow the 64-bit instructions).
Once successfully installed, create a new user like so:
sudo adduser username
At this point, be sure to edit your EC2 security group to allow inbound TCP traffic on port 8787, the port RStudio Server listens on by default.
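If you prefer the command line to the console, the same rule can be added with the AWS CLI. The sketch below is only an illustration: the function name `open_rstudio_port`, the group name `rstudio-sg`, and the address `203.0.113.10/32` are placeholders rather than values from my setup, and it assumes the AWS CLI is installed and configured. By default it only prints the command; set `DRY_RUN=0` to actually run it.

```shell
# Sketch: allow inbound TCP traffic on port 8787 for one address range.
# open_rstudio_port is a hypothetical helper; both arguments are placeholders.
open_rstudio_port() {
  group_name="$1"   # name of the security group attached to the instance
  cidr="$2"         # address range allowed to connect, e.g. your own IP as /32

  # Assemble the AWS CLI call as the function's positional parameters.
  set -- aws ec2 authorize-security-group-ingress \
    --group-name "$group_name" --protocol tcp --port 8787 --cidr "$cidr"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"       # default: only show the command
  else
    "$@"            # DRY_RUN=0: actually call the AWS CLI
  fi
}

open_rstudio_port rstudio-sg 203.0.113.10/32
```

Restricting the rule to your own address instead of `0.0.0.0/0` keeps the login page off the open internet. If your group lives in a non-default VPC, you would likely need `--group-id` instead of `--group-name`.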
If the instructions so far seem complicated or if you’d rather not start from scratch, you can follow the instructions here to launch an existing AMI with versions compatible with RStudio Server and take it from there.
Next, open RStudio in your browser using your instance’s public DNS, like so:
http://ec2-75-102-193-170.compute-1.amazonaws.com:8787
(be sure to replace the DNS above with your current DNS from the EC2 Dashboard)
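Since the DNS name changes every time the instance is relaunched, it can be handy to build the URL from it and check that the server is answering before opening a browser. This is a minimal sketch; `rstudio_url` and `wait_for_rstudio` are hypothetical helper names, and the check assumes `curl` is installed on your local machine.

```shell
# Build the RStudio Server URL from an instance's public DNS name.
rstudio_url() {
  printf 'http://%s:8787\n' "$1"
}

# Poll for about a minute until something answers at that URL.
# Any HTTP response counts, including the redirect to the sign-in page.
wait_for_rstudio() {
  url=$(rstudio_url "$1")
  tries=0
  until curl --silent --output /dev/null --max-time 5 "$url"; do
    tries=$((tries + 1))
    if [ "$tries" -ge 12 ]; then
      echo "No answer from $url" >&2
      return 1
    fi
    sleep 5
  done
  echo "RStudio Server is answering at $url"
}

# Print the URL for the example DNS name used above.
rstudio_url ec2-75-102-193-170.compute-1.amazonaws.com
```

Calling `wait_for_rstudio` with your own DNS name right after launch tells you when the login page is ready. If it times out, the security group rule for port 8787 is the first thing to check.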
Next, log in with the username and password you set earlier. If everything worked, you should see the RStudio interface load in your browser.
Next, install all the packages you would like. If you require Java-backed packages such as glmulti, go ahead and set up Java from the terminal.
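For the Java step, something along these lines should work on a Debian or Ubuntu instance. Treat it as a sketch under assumptions: `setup_java_for_r` is a hypothetical helper name, `default-jdk` is my guess at a suitable JDK package for your release, and the CRAN mirror URL is just one choice.

```shell
# Sketch: install a JDK, register it with R, then install glmulti.
# Assumes a Debian/Ubuntu instance where R is already installed.
setup_java_for_r() {
  # Refresh the package index and install a JDK (package name is an assumption).
  sudo apt-get update &&
  sudo apt-get install -y default-jdk &&
  # Let R detect the Java installation and update its configuration.
  sudo R CMD javareconf &&
  # Install rJava and glmulti system-wide so every user can load them.
  sudo Rscript -e 'install.packages(c("rJava", "glmulti"), repos = "http://cran.r-project.org")'
}
```

Run `setup_java_for_r` from an SSH session on the instance. The `&&` chain stops at the first failing step, so a broken Java install will not be followed by a confusing rJava compile error.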
After that, you can easily save this customized instance as an AMI (using the GUI menus) by following the instructions here. Voilà. From now on, whenever you need to run a high-memory instance of R, just launch a new instance, choose My AMIs, and once it has launched, connect to it via the browser using its current DNS. Brilliant!