Git for Data Science Applications (A Top Skill for 2020)
Moving into 2020, three things are clear: organizations want Data Science, Cloud, and Apps. A key skill that companies need is Git for application development (I call this Full Stack Data Science). Here’s what is driving Git’s growth, and why you should learn Git for data science application development.
This is part of a series of articles on Data Science key skills for 2020:
- 5 Data Science Technologies for 2020 (and Beyond)
- AWS Cloud – 14% Share, 400% Growth
- Docker – 4000% Growth
- Git Version Control – 8% Share, 150% Growth
- Shiny Web Applications (Coming Soon)
- H2O Automated Machine Learning (AutoML) (Coming Soon)
Top 20 Tech Skills 2014-2019
Indeed, the popular employment-related search engine, released an article showing changing trends from 2014 to 2019 in “Technology-Related Job Postings” examining the 5-Year Change of the most requested technology skills.
Top 20 Tech Skills 2014-2019 (Source: Indeed Hiring Lab)
I’m generally not a big fan of these reports because the technology landscape changes so quickly. But I was pleasantly surprised by the time span of the analysis: Indeed looked at changes over a 5-year period, which gives a much better sense of the long-term trends.
Cloud, Machine Learning, Apps Driving Growth
3 Technology Trends show that organizations are transitioning from Business Reporting to Application Development (Read 5 Data Science Technologies for 2020 (and Beyond) for more insights on Key Skills for Data Science and App Development):
- Cloud – AWS (14% Share, 400% Growth) and Azure (1100% Growth)
- Machine Learning – Machine Learning (400% Growth), Python (18% Share, 123% Growth)
- Applications – Git (8% Share, 150% Growth), Docker (4000% Growth)
Changing business needs are challenging Data Scientists to learn new technologies for Data Science Application Development. And Git and Docker are the future for app development.
Git & Docker Trends
We can see that both Git and Docker are experiencing explosive, multi-year growth trends in “Google Search Interest”, further supporting the need to learn these key technologies that drive application development. (Read Docker for Data Science Applications (4000% Growth) to learn about how Docker helps facilitate data science applications.)
What Is Git?
Let’s look at a Shiny web application to see what Git does and how it helps.
Git Workflow
From Shiny Developer with AWS Course
Git and GitHub facilitate a workflow for developing and deploying applications:
- Application Development begins locally (Local Repository) on your computer. Changes are tracked with Git.
- Code is pushed to GitHub, a Remote Repository designed for sharing version controlled files.
- The remote repository can be cloned to an AWS EC2 Instance, which is a Host for the production application.
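The three steps above can be sketched end to end on a single machine. This is a minimal illustration, not the course's actual setup: a local bare repository stands in for GitHub, a plain clone stands in for the EC2 host, and the directory names, the demo identity, and the one-line `app.R` are all placeholders I made up for the example.

```shell
set -e
work=$(mktemp -d)                         # scratch area for the demo

# 1. Local repository: development starts on your computer.
git init -q "$work/local"
cd "$work/local"
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"
echo 'library(shiny)' > app.R             # placeholder app file
git add app.R
git commit -q -m "Add app skeleton"

# 2. Remote repository: a bare repo stands in for GitHub here.
git init -q --bare "$work/remote.git"
git remote add origin "$work/remote.git"
git push -q -u origin HEAD                # push the current branch

# 3. Production host: clone the remote, as you would on an EC2 instance.
git clone -q "$work/remote.git" "$work/server"
ls "$work/server"
```

With a real GitHub repository, the `origin` URL would be the repository's HTTPS or SSH address, and the clone would be run on the server itself.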
Git Version Control
The most important concept of Git is version control. Let’s dive into the application to see how Git helps.
We can see that the application consists of 2 things:
- Files (Git Control) – The set of instructions for the app. For a Shiny App this includes an app.R file that contains layout instructions, server control instructions, database instructions, etc.
- Software (Docker Control) – The code external to your files that your application files depend on. For a Shiny App, this is R, Shiny Server, and any libraries your app uses.
Git applies version control to the files. This is a lifeline in case you make a change that adversely impacts production. You can always go backwards.
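Here is one hedged way to picture "going backwards". The sketch commits a working file, commits a breaking change, and then uses `git revert` to add a new commit that undoes the bad one. The file contents and the demo identity are invented for illustration, and `git revert` is only one of several ways Git can restore an earlier state.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"

# A version of the app that works.
echo 'title <- "Stock Analyzer"' > app.R
git add app.R
git commit -q -m "Working version"

# A change that would break production.
echo 'title <- stop("broken")' > app.R
git commit -q -am "Change that breaks the app"

# Go backwards: create a new commit that undoes the last one.
git revert --no-edit HEAD > /dev/null
restored=$(cat app.R)
echo "$restored"
git --no-pager log --oneline              # history keeps all three commits
```

Because the revert is itself a commit, the history still records both the mistake and the fix.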
Git Commands
Version Control Status & Git Command Workflow. When Git is first initialized in a codebase, the files start out untracked in your Working Directory. As changes are made, you want to track them, and you do that with Git commands.
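To make the status change concrete, this small sketch captures what `git status --porcelain` reports for a file before it is tracked, after it is staged, and after it is committed. The temporary directory, the demo identity, and the one-line `app.R` are placeholders for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"
echo 'library(shiny)' > app.R             # placeholder app file

untracked=$(git status --porcelain)       # "?? app.R": Git sees the file but does not track it
git add app.R
staged=$(git status --porcelain)          # "A  app.R": added to the staging area
git commit -q -m "Track app.R"
clean=$(git status --porcelain)           # empty: nothing left to commit

printf 'untracked: %s\nstaged:    %s\nclean:     [%s]\n' "$untracked" "$staged" "$clean"
```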
Git commands change the status by moving files through the version control workflow. The most important commands are:
- commit – This is when a snapshot of the file is added to your local repository. You can always go back to this version.
- push – To push any committed files from a local repo (e.g. your computer) to a remote repo (e.g. GitHub)
- pull – To pull down files on a remote repository to your local computer
- reset – To undo a change to a committed file
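The four commands above can be tried together in a throwaway setup. In this sketch a bare repository plays the role of GitHub, `dev` is your computer, and `server` is a second clone that pulls changes. All names, the demo identity, and the file contents are made up for illustration. Note that `git reset --hard` discards the undone commit's changes, so it is shown here on a commit that was never pushed.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"     # stand-in for a GitHub repo

git init -q "$work/dev"
cd "$work/dev"
git config user.name "Demo User"          # placeholder identity, local to this repo
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

echo 'version 1' > app.R
git add app.R
git commit -q -m "Version 1"              # commit: snapshot in the local repo
git push -q -u origin HEAD                # push: local repo -> remote repo

git clone -q "$work/remote.git" "$work/server"

echo 'version 2' > app.R
git commit -q -am "Version 2"
git push -q

git -C "$work/server" pull -q --ff-only   # pull: remote repo -> another machine

echo 'version 3 (broken)' > app.R
git commit -q -am "Version 3"             # local only, never pushed
git reset -q --hard HEAD~1                # reset: drop the last commit and its changes

cat "$work/dev/app.R" "$work/server/app.R"
```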
Real Shiny App + AWS + Git Example
In my Shiny Developer with AWS Course (NEW), you use the following application architecture that uses AWS EC2 to create an Ubuntu Linux Server that hosts a Shiny App in the cloud called the Stock Analyzer.
Data Science Web Application Architecture
From Shiny Developer with AWS Course
We use Git to track our files as we move into Production. Here’s an example of the files stored on GitHub in a Private Repo.
GitHub Repository for Stock Analyzer
From Shiny Developer with AWS Course
You then deploy your “Stock Analyzer” application into Production on an AWS EC2 Instance, so it’s accessible anywhere via the AWS Cloud.
Stock Analyzer App
From Shiny Developer with AWS Course
If you are ready to learn how to build and deploy Shiny Applications in the cloud using AWS, then I recommend my NEW 4-Course R-Track System.
I look forward to providing you the best data science for business education.
Matt Dancho
Founder, Business Science
Lead Data Science Instructor, Business Science University