Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
H2O is the scalable, open-source Machine Learning library that features AutoML. Here are 5 reasons why it’s an essential library for creating production data science code.
Full-Stack Data Science Series
This is part of a series of articles on essential Data Science and Web Application skills for 2020 and beyond:
- Part 1 – 5 Full-Stack Data Science Technologies for 2020 (and Beyond)
- Part 2 – AWS Cloud
- Part 3 – Docker
- Part 4 – Git Version Control
- Part 5 – H2O Automated Machine Learning (AutoML)
- Part 6 – Shiny Web Applications (Coming Soon)
- [NEW BOOK] – Shiny Production with AWS, Docker, Git Book
Machine Learning
Up 440% vs. 5 Years Ago
Before I jump into H2O, let’s first understand the demand for ML. The 5-year trends in Technology Job Postings show a 440% increase in “Machine Learning” skills being requested, capturing a 7% share in all technology-related job postings.
Not just “Data Scientist” Jobs… ALL Technology Jobs.
Top 20 Tech Skills 2014-2019
Source: Indeed Hiring Lab.
My point: Learning ML is essential
We can safely say that if you are in a technology job (or seeking one) then you need to learn how to apply AI and Machine Learning to solve business problems.
The problem: There are a dozen machine learning and deep learning frameworks, including TensorFlow, Scikit-Learn, H2O, MLR3, PyTorch, … These all take time and effort to learn. So, which framework should you learn for business?
Why I use and recommend H2O: H2O has singlehandedly produced results in hours that would have otherwise taken days or weeks. I recommend learning H2O for applying Machine Learning to business data. I’ve been using H2O for several years now, both on consulting projects and in teaching it to clients. The 5 reasons below explain how I have gotten this productivity boost from H2O on my business projects.
5 Reasons why I use and teach H2O
My top 5 reasons why I use and recommend learning H2O.
1. AutoML
Massive Productivity Booster
H2O AutoML automates the machine learning workflow, which includes automatic training and tuning of many models. This allows you to spend your time on more important tasks like feature engineering and understanding the problem.
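To make the "train and tune many models" idea concrete, here is a minimal sketch in plain Python. It is not the H2O API: the three toy models, the tiny dataset, and the `toy_automl` helper are all hypothetical, made up for illustration. The only point is the pattern, which is to fit several candidates, score each on held-out data, and rank them on a leaderboard.

```python
# Toy illustration of the AutoML pattern: fit several candidate models,
# score each on validation data, and rank them on a leaderboard.
# NOT the H2O API. Every function and value here is a made-up example.
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares with a single feature."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_nearest(xs, ys):
    """1-nearest-neighbour: return the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys)) ** 0.5


def toy_automl(train, valid, candidates):
    """Fit every candidate, score it on validation data, sort best-first."""
    train_x, train_y = train
    valid_x, valid_y = valid
    board = []
    for name, fit in candidates.items():
        model = fit(train_x, train_y)
        board.append((name, rmse(model, valid_x, valid_y), model))
    board.sort(key=lambda row: row[1])  # lowest error first
    return board


# Hypothetical data, roughly y = 2x with a little noise.
train = ([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
valid = ([1.5, 3.5, 5.5], [3.0, 7.0, 11.0])
candidates = {
    "mean_baseline": fit_mean,
    "linear": fit_linear,
    "nearest_neighbour": fit_nearest,
}

leaderboard = toy_automl(train, valid, candidates)
for name, score, _ in leaderboard:
    print(f"{name:20s} RMSE = {score:.3f}")
```

The first row of `leaderboard` is the "leader" model you would carry forward. A real AutoML run does the same bookkeeping across far more algorithms and hyperparameter settings, which is where the time savings come from.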
Me holding my H2O AutoML Hex Sticker
H2O is my go-to for production ML
2. Scalable on Local Compute
Distributed, In-Memory Processing speeds up computations
H2O processes data in memory, with fast serialization between nodes and clusters to support massive datasets. As a result, problems that traditionally need bigger tools can be solved in memory on your local computer.
3. Spark Integration & GPU Support
Big Data
- H2O’s Spark integration (Sparkling Water) enables distributed processing on Big Data.
- H2O4GPU enables running H2O’s R and Python libraries using GPUs.
The result is training that can be up to 100x faster than traditional ML.
rsparkling – The Spark + H2O Big Data Solution
4. Best Algorithms, Optimized and Ensembled
Superior Performance
H2O’s algorithms are developed from the ground up for distributed computing. The most popular algorithms are incorporated including:
- XGBoost
- GBM
- GLM
- Random Forest
- and more.
AutoML ensembles (combines) these models to provide superior performance.
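Why does combining models help? Here is a small sketch of the intuition in plain Python. Note the swap: as far as I know, H2O AutoML builds stacked ensembles, where a second-level model learns how to weight the base models. This example uses simple averaging instead, and the prediction values are hypothetical numbers chosen so that the two models err in different directions.

```python
# Toy illustration of ensembling by simple averaging.
# NOT H2O's stacked ensemble method. All numbers are made-up examples.
from statistics import mean


def rmse(preds, actual):
    """Root mean squared error between predictions and actual values."""
    return mean((p - a) ** 2 for p, a in zip(preds, actual)) ** 0.5


def average_ensemble(*prediction_sets):
    """Average the predictions of several models, row by row."""
    return [mean(row) for row in zip(*prediction_sets)]


actual = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 18.0, 33.0, 38.0]  # hypothetical predictions from model A
model_b = [9.0, 23.0, 28.0, 43.0]   # hypothetical predictions from model B

ensemble = average_ensemble(model_a, model_b)

print("Model A RMSE: ", round(rmse(model_a, actual), 3))
print("Model B RMSE: ", round(rmse(model_b, actual), 3))
print("Ensemble RMSE:", round(rmse(ensemble, actual), 3))
```

Because the two models miss in opposite directions on each row, their errors partly cancel and the averaged prediction lands closer to the truth than either model alone. When models make the same mistakes, averaging helps much less, which is one reason ensembles tend to benefit from mixing different algorithm families.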
5. Production Ready
Docker Containers
I love using Docker + H2O to integrate AutoML models into Shiny Web Applications. H2O is built on (and depends on) Java, which traditionally creates overhead. But H2O Docker Images make deploying H2O models super easy, with all necessary software inside the pre-built Docker Image.
H2O in Production
H2O can be integrated into Shiny Applications like this one, an Employee Attrition Prediction & Prevention App.
Employee Attrition Prevention App
(Course coming to BSU soon)
H2O is the underlying prediction technology
You need to learn H2O AutoML to build the Employee Attrition Shiny App. H2O AutoML generates the “Employee Attrition Machine Learning Model” that scores employees based on features like tenure, overtime, stock option level, etc.
H2O AutoML – Employee Attrition Machine Learning Model
Built in DS4B 201-R Course
The H2O Course
If you are ready to learn H2O AutoML along with critical supporting technologies and data science workflow processes that follow an enterprise-grade system, then look no further: DS4B 201-R (Advanced Machine Learning & Business Consulting Course).
You follow a 10-week program for solving Business Problems with Data Science that teaches each of the tools needed to solve a $15M/year employee attrition problem using Machine Learning (H2O), Explainable ML (LIME), and Optimization (purrr).
10-Week System for Solving Business Problems with Machine Learning
DS4B 201-R Course
In weeks 5 & 6, you learn H2O AutoML in-depth as part of your learning journey.
Learn H2O AutoML – Weeks 5 and 6
DS4B 201-R Course
No Machine Learning Experience?
Don’t worry. You’re covered.
You are probably thinking, “How do I learn H2O if I have no Machine Learning background or coding experience?”
That’s why I created the 4-Course R-Track Program.
Go from beginner to expert in 6 months or less with no prior experience required.
You learn:
- Data Science Foundations
- Advanced Machine Learning & Business Consulting – H2O AutoML
- Shiny Dashboards
- Shiny Developer with AWS (NEW)
I look forward to providing you the best data science for business education.
Matt Dancho
Founder, Business Science
Lead Data Science Instructor, Business Science University