
5 Data Science Technologies for 2020 (and Beyond)

This article was first published on business-science.io and kindly contributed to R-bloggers.

Moving into 2020, three things are clear: organizations want Data Science, Cloud, and Apps. Here are the top 5 essential skills for Data Scientists who need to build and deploy applications in 2020 and beyond.

This article is part of a series on key Data Science skills for 2020.

Top 20 Tech Skills 2014-2019

Indeed, the popular employment-related search engine, released an article on trends in “Technology-Related Job Postings” from 2014 to 2019, examining the 5-year change in the most requested technology skills.

Top 20 Tech Skills 2014-2019
Source: Indeed Hiring Lab.

I’m generally not a big fan of these reports because the technology landscape changes so quickly. But I was pleasantly surprised by the time span of the analysis: Indeed looked at changes over a 5-year period, which gives a much better sense of the long-term trends.

Why No R, Shiny, Tableau, PowerBI, Alteryx?

The skills reported are not “Data Science”-specific, which is why you don’t see R, Tableau, PowerBI, or Alteryx on the list.

However, we can glean insights based on the technologies present…

Cloud, Machine Learning, Apps Driving Growth

From the technology growth, it’s clear that Businesses need Cloud + ML + Apps.

Technologies Driving Tech Skill Growth

My Takeaway

This assessment has led me to my key technologies for Data Scientists heading into 2020. I focus on key technologies related to Cloud + ML + Apps.

Top 5 Data Science Technologies for Cloud + ML + Apps

These are the technologies Data Scientists should learn for 2020 and beyond. They are geared towards the business demands: Cloud + ML + Apps. In other words, businesses need data science and machine learning-powered web applications deployed into production via the Cloud.

Here’s what you need to learn to build ML-Powered Web Applications and deploy in the Cloud.

*Note that R and Python are skills that you should be learning before you jump into these.

5 Key Data Science Technologies for Cloud + Machine Learning + Applications

1. AWS Cloud Services

AWS is the most popular cloud service provider. EC2 is a staple for hosting apps, running Jupyter or RStudio in the cloud, and leveraging cloud resources rather than investing in expensive computers and servers.

AWS Resource: AWS for Data Science Apps – 14% Share, 400% Growth

2. Shiny Web Apps

Shiny is a comprehensive web framework designed for data scientists, with a rich ecosystem of extension libraries (dubbed the “shinyverse”).

Shiny Resource (Coming Soon): Shiny Data Science Web Applications

3. H2O Machine Learning

H2O is an automated machine learning library available in Python and R. It works well on structured data (the format for 95% of business problems). Automation drastically increases productivity in machine learning.
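As a rough illustration of what automated machine learning looks like in practice, here is a minimal Python sketch. It assumes the `h2o` package and a Java runtime are installed, and that the target is a class label. The helper names `predictor_columns` and `run_automl`, along with the file path and column names, are hypothetical and not part of H2O.

```python
def predictor_columns(columns, target):
    """Return every column name except the target, preserving order."""
    return [name for name in columns if name != target]


def run_automl(csv_path, target, max_models=10, seed=1):
    """Train a set of models with H2O AutoML and return the leaderboard.

    Assumption: `csv_path` points to a structured (tabular) dataset and
    `target` names a categorical column to predict.
    """
    # Imported lazily so the helper above works without H2O installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)
    frame[target] = frame[target].asfactor()  # treat the target as a class label

    aml = H2OAutoML(max_models=max_models, seed=seed)
    aml.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=frame,
    )
    return aml.leaderboard  # models ranked by a default metric
```

A call such as `run_automl("customers.csv", "churn")` (both names are placeholders) would train up to ten models and return them ranked, which is where the productivity gain comes from: one call replaces a hand-written loop over algorithms and settings.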

H2O Resource (Coming Soon): H2O Automated Machine Learning (AutoML)

4. Docker for Web Apps

Creating Docker environments drastically reduces the risk of software incompatibility in production. Docker Hub makes it easy to share your environment with other Data Scientists or DevOps. Further, Docker and Docker Hub make it easy to deploy applications into production.

Docker Resource: Docker for Data Science Apps – 4000% Growth

5. Git Version Control

Git and GitHub are staples for reproducible research and web application development. Git tracks past versions and enables software upgrades to be performed on branches. GitHub makes it easy to share your research and/or web applications with other Data Scientists, DevOps, or Data Engineering. Further, Git and GitHub make it easy to deploy changes to apps in production.

Git Resource (Coming Soon): Git Version Control for Data Science Apps

Other Technologies Worth Mentioning

  1. dbplyr for SQL – For data scientists who need to create complex SQL queries but don’t have time to deal with messy SQL, dbplyr is a massive productivity booster. It converts R (dplyr) code to SQL, and you can use it for 95% of SQL queries.

  2. Bootstrap – For data scientists that build apps, Bootstrap is a Front-End web framework that Shiny is built on top of and it powers much of the web (e.g. Twitter’s app). Bootstrap makes it easy to control the User Interface (UI) of your application.

  3. MongoDB – For data scientists that build apps, MongoDB is a NoSQL database that is useful for storing complex user information for your application in one collection (MongoDB's counterpart to a table). This is much easier than creating a multi-table SQL database.
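To make the MongoDB point concrete, the sketch below contrasts one nested user record with the rows the same data would need across several SQL tables. It uses only the Python standard library, and every field name is a hypothetical example. With a driver such as pymongo, a dictionary shaped like this could be stored as a single document.

```python
import json

# Hypothetical user record: nested settings and a list of favorites
# live together in one document.
user_document = {
    "user_id": "u-001",
    "name": "Example User",
    "favorites": ["AAPL", "GOOG"],
    "settings": {"moving_avg_short": 20, "moving_avg_long": 50},
}


def to_relational_rows(doc):
    """Split the same record into the rows three SQL tables would need."""
    users = [(doc["user_id"], doc["name"])]
    favorites = [(doc["user_id"], symbol) for symbol in doc["favorites"]]
    settings = [(doc["user_id"], key, value) for key, value in doc["settings"].items()]
    return users, favorites, settings


users, favorites, settings = to_relational_rows(user_document)

# The document form round-trips through JSON unchanged.
print(json.dumps(user_document, indent=2))
```

The relational version needs three tables and a join key to rebuild what the document holds in one place, which is the convenience being described.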

Real Shiny App + AWS + Docker Case Example

In my Shiny Developer with AWS Course (NEW), you use the following application architecture, in which AWS EC2 runs an Ubuntu Linux server that hosts a Shiny app called the Stock Analyzer in the cloud.

Data Science Web Application Architecture
From Shiny Developer with AWS Course

You use AWS EC2 to build a server to run your Stock Analyzer application along with several other web apps.

AWS EC2 Instance used for Cloud Deployment
From Shiny Developer with AWS Course

Next, you use a Dockerfile to containerize the application’s software environment.

Dockerfile for Stock Analyzer App
From Shiny Developer with AWS Course

You then deploy your “Stock Analyzer” application so it’s accessible anywhere via the AWS Cloud.

Dockerfile for Stock Analyzer App
From Shiny Developer with AWS Course

If you are ready to learn how to build and deploy Shiny Applications in the cloud using AWS, then I recommend my NEW 4-Course R-Track System.



I look forward to providing you with the best data science for business education.

Matt Dancho

Founder, Business Science

Lead Data Science Instructor, Business Science University

To leave a comment for the author, please follow the link and comment on their blog: business-science.io.
