Where Does RStudio Fit into Your Cloud Strategy?

[This article was first published on RStudio | Open source & professional software for data science teams on RStudio, and kindly contributed to R-bloggers.]

Over the last few years, more companies have begun migrating their data science work to the cloud. As they do, they naturally want to bring along their favorite data science tools, including RStudio, R, and Python. In this blog post, we discuss the various ways RStudio products can help you along that journey.

Why Do Organizations Want to Move to the Cloud?

There are many reasons why organizations are looking to use cloud services more widely for data science. They include:

Let Your Data Science Goals Drive Your Cloud Strategy

Depending on the circumstances of your organization and what specific challenges you are trying to address, you should consider four possible options for your data science cloud strategy:

We’ve provided the table below to help you assess the various RStudio cloud offerings. It matches problems and potential solutions with specific RStudio options and resources to consider. The options are arranged in order of increasing complexity of configuration and administration.

Table 1: Summary of Cloud Options for RStudio Software
Problem: Simplify and reduce startup costs
Potential solution: SaaS/Hosted offering
Pros:
  • Simplest and lowest cost to deploy
  • Hardware and software managed by the provider
  • Costs may be fixed, variable, or a mix of the two
Cons:
  • Limited integration with your organization’s internal data and security protocols
  • May not be cost efficient for large groups
  • May have limited options for custom configuration
Options to consider:
  • Create data science analyses with RStudio Cloud
  • Share Shiny applications with shinyapps.io
  • Manage packages with RStudio Public Package Manager, a free service that provides easy installation of package binaries and access to previous package versions

Problem: Promote collaboration or instruction between organizations or groups
Potential solution: SaaS/Hosted offering
Pros:
  • Same pros as above, plus the ability to easily share projects
Cons:
  • Same cons as above
Options to consider:
  • Share projects or teach classes/workshops with RStudio Cloud

Problem: Mitigate high costs of computing infrastructure
Potential solution: Marketplace offerings
Pros:
  • Easy to get started at minimal, pay-as-you-go (hourly) cost
  • Access to specialized hardware (e.g., GPUs)
Cons:
  • Hourly costs require careful management to ensure software runs only when needed
Options to consider:
  • RStudio products on AWS Marketplace, Azure Marketplace, and Google Cloud Platform
Potential solution: Deployment to a VPC on a major cloud provider
Pros:
  • Outsources hardware costs
  • Integrates with existing analytic assets on cloud platforms
  • Allows easy customization and configuration
  • Provides access to specialized hardware (e.g., GPUs)
  • Ensures data sovereignty by running your processes in a local cloud region
Cons:
  • Complexity of managing software configuration and integration with your organization’s on-premises data and security protocols
  • Costs may be highly variable, based on usage
Options to consider:
  • Deploy RStudio products in a VPC, using CloudFormation templates for AWS and ARM templates for Azure (see RStudio Cloud Tools)
  • Deploy RStudio products via Docker, e.g., on EKS (Elastic Kubernetes Service) on AWS (see Docker images for RStudio Professional Products)
  • Connect to cloud-based data storage, such as Redshift or S3

Problem: Scale to meet variable demand
Potential solution: Clustering approaches, including Kubernetes
Pros:
  • Cloud-deployed applications can be easily scaled to meet demand, since cloud providers supply container resources on demand
Cons:
  • Careful management required to avoid unnecessary compute costs, while still matching job requirements to computational needs
Options to consider:
  • In addition to the points above, RStudio Server Pro’s Launcher integrates with Kubernetes, an industry-standard clustering solution that allows efficient scaling
  • RStudio Connect provides many options to scale and tune performance, including being part of an autoscaling group. These options allow Connect to deliver dashboards, Shiny applications, and other types of content to large numbers of users

Problem: Minimize data movement
Potential solution: Data lakes
Pros:
  • Run your computations close to the data, minimizing overhead
  • Tie your data science directly into your data pipeline
Cons:
  • Adds complexity and potential limitations
Options to consider:
  • Connect to cloud-based data storage, such as Redshift or S3
  • Managed RStudio Server Pro on Spark and Hadoop on Azure and AWS (Cazena)
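
The cost trade-off that recurs in the table (hourly, pay-as-you-go pricing versus software left running when it isn't needed) can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the hourly rate, the working schedule, and the function name `monthly_cost` are assumptions made up for this example, not RStudio or cloud-provider pricing.

```python
def monthly_cost(hourly_rate: float, hours_per_day: float, days_per_month: int) -> float:
    """Return the pay-as-you-go cost of running one instance for a month."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    return hourly_rate * hours_per_day * days_per_month


# Hypothetical figures, chosen only to illustrate the comparison.
HOURLY_RATE = 0.50  # assumed price per instance-hour, in dollars

# Instance left running around the clock for a 30-day month.
always_on = monthly_cost(HOURLY_RATE, hours_per_day=24, days_per_month=30)

# Instance started and stopped to cover 8 hours on 22 working days.
work_hours_only = monthly_cost(HOURLY_RATE, hours_per_day=8, days_per_month=22)

print(f"Always on:       ${always_on:.2f}")
print(f"Work hours only: ${work_hours_only:.2f}")
print(f"Savings:         {1 - work_hours_only / always_on:.0%}")
```

Plugging in your own provider's rates and your team's actual usage pattern gives a rough sense of how much scheduling matters before you commit to a marketplace or VPC deployment.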

Ready to Take RStudio to the Cloud?

If you’d like to take RStudio along on your journey to the cloud, you can start by exploring the resources linked in the table above. We also invite you to join us on December 2 for a webinar, “What does it mean to do data science in the cloud?”, conducted with our partner ProCogia. You can register for the webinar here.

Our product team is also happy to provide advice and guidance along this journey. If you’d like to set up a time to talk with us, you can book a time here. We look forward to being your guide.
