Last November, our data science team embarked on a journey to build the ultimate Data Science For Business (DS4B) learning platform. We saw a problem: a gap exists in organizations between the data science team and the business. To bridge this gap, we’ve created Business Science University, an online learning platform that teaches DS4B using high-end machine learning algorithms and is organized in the fashion of an on-premise workshop, but at a fraction of the price. I’m pleased to announce that, in 5 days, we will launch our first course, HR 201, as part of a 4-course Virtual Workshop. We modeled the Virtual Workshop on the data science program that we wished we had when we began data science (after we got through the basics, of course!). Now, we are opening up our data science process to you: we guide you through how we solve high-impact business problems with data science!
Highlights
- A major benefit of the Virtual Workshop is that we teach our internally developed systematic process, the Business Science Problem Framework (BSPF). We use this process to solve high-impact problems, tying data science to financial benefit. Below is the BSPF, which is one of the tools that has been instrumental to our success. In Data Science For Business (HR 201), we follow the BSPF throughout the course, showing you how to apply the framework to a data science project.
- Another benefit is that you get to see our process for dissecting and analyzing difficult problems. We show you how to tie financial impact to the problem, which is critical in gaining organizational acceptance of a data science project.
- Yet another benefit is that you will learn how to code within the tidyverse, specifically using Tidy Eval for programming with dplyr and other tidyverse packages.
- And finally, you will spend a sizable chunk of time using tidyverse, h2o, lime, recipes, GGally, skimr, and more!
The Course Overview touches on the content. Take a look and let us know what you think!
Course Overview
We show you how we use data science to solve high impact problems using proven methodologies and tying data science to financial benefit to the organization.
Data Science For Business (HR 201) is the first course in a 4-part Virtual Workshop that focuses on a $15M/year problem¹ that’s hidden from the organization: Employee Turnover. We use a real-world problem to show you how tools like the Business Science Problem Framework and advanced machine learning tools like H2O and LIME can solve this problem, saving the organization millions in the process. Just think, a 10% reduction could save $1.5M/year. That’s the power of data science!
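The arithmetic behind that savings figure can be sketched in a few lines of R. The $15M/year estimate and the 200-employee count come from the footnote. The per-employee cost is simply derived by division and is an illustrative assumption, not the cost model taught in the course.

```r
# Illustrative arithmetic only; the actual cost model is covered in Chapter 1.
annual_attrition_cost <- 15e6  # estimated hidden cost per year (from the footnote)
employees_lost        <- 200   # high performers lost per year (footnote says 200+)

# Implied average cost per departing employee (assumption: derived by division)
cost_per_employee <- annual_attrition_cost / employees_lost

# Savings from a 10% reduction in turnover cost
reduction      <- 0.10
annual_savings <- annual_attrition_cost * reduction

cat("Implied cost per employee lost:", format(cost_per_employee, big.mark = ","), "\n")
cat("Savings at a 10% reduction:", format(annual_savings, big.mark = ","), "\n")
```

Changing `reduction` shows how sensitive the business case is to even small improvements in retention.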
Data Science For Business, HR 201
Chapter 0: Getting Started
- Data Science Project Setup
- The True Cost of Employee Attrition
- What Tools Are in Our Toolbox?
- Frameworks
In this chapter, we introduce you to our systematic process using the Business Science Problem Framework (BSPF), which augments CRISP-DM. The BSPF focuses on problem understanding and business outcomes on a detailed level whereas CRISP-DM contains the tools necessary for high-level data science project management. Combined, they create one of the tools that has been instrumental to our success.
Business Science Problem Framework
Chapter 1: Business Understanding
- Problem Understanding With BSPF
- Streamlining The Attrition Code Workflow
- Visualizing Attrition with ggplot2
- Making A Custom Plotting Function: plot_attrition()
- Challenge 1: Cost Of Attrition
This chapter kicks off CRISP-DM Stage 1 along with BSPF Stages 1-4. You will understand the business problem by assigning a financial cost to employee turnover. We develop custom functions to enable visualizing attrition cost by department and job role. These functions are later developed into an R package, tidyattrition, as part of HR 303. We cap it off by developing a custom plotting function, plot_attrition(), that generates an impactful visualization for executives to see the value of your data science project.
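To give a feel for what "attrition cost by department" means in code, here is a minimal base R sketch. Everything in it is hypothetical: the data, the column names, the `attrition_cost_by()` helper, and the 1.5x salary multiplier are placeholders for illustration, not the course's functions or figures.

```r
# Hypothetical employee data; names and salaries are invented for illustration.
employees <- data.frame(
  department = c("Sales", "Sales", "R&D", "R&D", "R&D", "HR"),
  attrition  = c("Yes", "No", "No", "Yes", "No", "No"),
  salary     = c(60000, 55000, 80000, 75000, 90000, 50000),
  stringsAsFactors = FALSE
)

# Assumed cost model: losing an employee costs a fixed multiple of salary.
# The 1.5 multiplier is a placeholder, not a figure from the course.
attrition_cost_by <- function(data, group_col, cost_multiplier = 1.5) {
  left  <- data[data$attrition == "Yes", ]
  group <- as.character(left[[group_col]])
  cost  <- tapply(left$salary * cost_multiplier, group, sum)
  out   <- data.frame(group = names(cost), cost = as.numeric(cost),
                      stringsAsFactors = FALSE)
  out[order(-out$cost), ]  # most expensive group first
}

cost_by_department <- attrition_cost_by(employees, "department")
print(cost_by_department)
```

A table like this, sorted by cost, is the kind of input a plotting function could turn into a chart for executives.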
Visualizing Attrition Cost
Chapter 2: Data Understanding
- EDA Part 1: Exploring Data By Data Type With skimr
- EDA Part 2: Visualizing Feature-Target Interactions with GGally
- Challenge 2: Assessing Feature Pairs
In this chapter, we focus on two methods of exploratory data analysis (EDA) to gain a thorough understanding of the features. First, we tackle our problem by data type with skimr, separating categorical data from numeric. Second, we visualize interactions using GGally.
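The idea of exploring data by type can be sketched without any packages. The snippet below is a base R stand-in, not skimr itself: it splits a hypothetical data frame into numeric and categorical columns and summarizes each group separately. The data and column names are invented for illustration.

```r
# Hypothetical data; column names are invented for illustration.
employees <- data.frame(
  age        = c(34, 45, 29, 52),
  income     = c(5200, 7100, 3900, 8800),
  department = c("Sales", "R&D", "Sales", "HR"),
  attrition  = c("Yes", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Separate the columns by data type
numeric_cols     <- names(employees)[sapply(employees, is.numeric)]
categorical_cols <- names(employees)[sapply(employees, is.character)]

# Numeric features: distribution summaries
print(summary(employees[numeric_cols]))

# Categorical features: counts per level
print(lapply(employees[categorical_cols], table))
```

Treating the two groups separately matters because the useful questions differ: ranges and skew for numeric features, level counts and rare categories for categorical ones.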
Chapter 3: Data Preparation
- Data Preparation For People (Humans)
- Data Preparation For Machines With recipes
Next, we process the data for both people and machines. We make extensive use of the recipes package to properly transform data for a pre-modeling Correlation Analysis.
Chapter 4: Automated Machine Learning With H2O
- Building A Classifier With h2o Automated Machine Learning
- Inspecting the H2O Leaderboard
- Building A Custom Leaderboard Plotting Function: plot_h2o_leaderboard()
- Extracting Models
- Making Predictions
Building a high-accuracy model is the goal of this stage. We show how to run h2o automated machine learning. We also detail how to build a custom plotting function, plot_h2o_leaderboard(), to visualize the best models and select them for work on a hold-out (testing) set.
Custom H2O Leaderboard Visualization
Chapter 5: Assessing H2O Performance
- Classifier Summary Metrics
- Precision & Recall: Adjusting The Classifier Threshold
- Classifier Gain and Lift: Charts For Execs
- Visualizing Performance
- Making A Custom H2O Performance Plot: plot_h2o_performance()
In this chapter, we show you how to assess performance and visualize model quality in a way that executives and other business decision makers understand.
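As a small illustration of why the classifier threshold matters, the base R sketch below computes precision and recall at several thresholds. The predicted probabilities and outcomes are made up, and `precision_recall()` is a hypothetical helper written for this example, not a function from h2o or from the course.

```r
# Hypothetical predicted probabilities of leaving and true outcomes (1 = left).
prob_leave <- c(0.90, 0.80, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05)
actual     <- c(1,    1,    0,    1,    0,    0,    0,    0)

# Precision and recall for a given threshold.
# Note: precision is NaN if no case is predicted positive at that threshold.
precision_recall <- function(prob, actual, threshold) {
  predicted <- as.integer(prob >= threshold)
  tp <- sum(predicted == 1 & actual == 1)  # correctly flagged leavers
  fp <- sum(predicted == 1 & actual == 0)  # stayers flagged as leavers
  fn <- sum(predicted == 0 & actual == 1)  # leavers that were missed
  c(threshold = threshold,
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn))
}

thresholds <- c(0.3, 0.5, 0.7)
results <- t(sapply(thresholds, function(th) precision_recall(prob_leave, actual, th)))
print(results)
```

In this toy data, lowering the threshold catches every leaver at the cost of more false alarms, while raising it does the reverse. Which trade-off is right depends on the relative cost of a missed leaver versus an unnecessary intervention.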
Chapter 6: Explaining Black-Box Models With LIME
- Using lime For Local Model Explanations
- Making An Explainer
- Explaining Multiple Cases
We use lime to explain the black-box classification model, showing which features drive whether the employee stays or leaves.
LIME Feature Explanation Visualization
Chapter 7: Recommendation Algorithm
Finally, we put our data science investigative skills to use developing a recommendation algorithm that helps managers and executives make better decisions to prevent employee turnover. This recommendation algorithm is used in HR 301 to build a machine-learning-powered shiny web application that can be deployed to executives and managers.
HR 301 Shiny App: Management Strategies
HR 301 Shiny App: Attrition Risk
Timing
The HR 201 course will be opened on Monday (4/30). A special offer will be provided to those that enroll in BSU early. The course will not be visible until Monday when it’s released.
What You Need
All you need is a basic (novice) proficiency in R programming, including dplyr and ggplot2. We’ll take care of the rest. If you are unsure, there is a proficiency quiz to check your baseline. Also, there’s a 30-day money-back guarantee if the course is too difficult or if you are not completely satisfied.
Education Assistance
Many employers offer education assistance to cover the cost of courses. Begin discussions with your employer immediately if this is available to you and you are interested in this course. They will benefit BIG TIME from you taking this course. The special offer we send out is available for a limited time only!
Enroll Now
Enrollment in BSU is open already. Enroll now to take advantage of a special offer. The course will open on Monday, and I will send an announcement to those that are enrolled in BSU along with the special offer. Time is limited.
About Business Science
Business Science specializes in “ROI-driven data science”. We offer training, education, coding expertise, and data science consulting related to business and finance. Our latest creation is Business Science University, a self-paced Virtual Workshop that teaches you our data science process! In addition, we contribute about 80% of our effort to the open source data science community in the form of software and our Business Science blog. Visit Business Science on the web or contact us to learn more!
Don’t Miss A Beat
- Sign up for the Business Science blog to stay updated!
- Enroll in Business Science University to learn how to solve real-world data science problems from Business Science!
- Check out our Open Source Software!
Connect With Business Science
If you like our software (anomalize, tidyquant, tibbletime, timetk, and sweep), our courses, and our company, you can connect with us:
- business-science on GitHub!
- Business Science, LLC on LinkedIn!
- bizScienc on Twitter!
- Business Science, LLC on Facebook!
Footnotes
1. An organization that loses 200+ high performers per year can lose an estimated $15M/year in hidden costs primarily associated with productivity. We show you how to calculate this cost in Chapter 1: Business Understanding.