tidyAML: Automated Machine Learning with tidymodels

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

Welcome to {tidyAML} which is an R package that makes it easy to use the tidymodels ecosystem to perform automated machine learning (AutoML). This package provides a simple and intuitive interface that allows users to quickly generate machine learning models without worrying about the underlying details. It also includes a safety mechanism that ensures that the package will fail gracefully if any required extension packages are not installed on the user’s machine. With {tidyAML}, users can easily build high-quality machine learning models in just a few lines of code. Whether you are a beginner or an experienced machine learning practitioner, {tidyAML} has something to offer.

Some ideas are that we should be able to generate regression models on the fly without having to actually go through the process of building the specification, especially if it is a non-tuning model, meaning we are not planing on tuning hyper-parameters like penalty and cost.

The idea is not to re-write the excellent work the tidymodels team has done (because it’s not possible) but rather to try and make an enhanced easy to use set of functions that do what they say and can generate many models and predictions at once.

This is similar to the great h2o package, but, {tidyAML} does not require java to be setup properly like h2o because {tidyAML} is built on tidymodels.

Installation

You can install {tidyAML} like so:

install.packages("tidyAML")

Or the development version from GitHub

# install.packages("devtools")
devtools::install_github("spsanderson/tidyAML")

Part of the reason to use {tidyAML} is so that you can generate many models of your data set. One way of modeling a data set is using regression for some numeric output. There is a convienent function in tidyAML that will generate a set of non-tuning models for fast regression. Let’s take a look below.

First let’s load the library

library(tidyAML)

Now lets see the function in action.

fast_regression_parsnip_spec_tbl(.parsnip_fns = "linear_reg")
# A tibble: 11 × 5
   .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec
       <int> <chr>           <chr>         <chr>        <list>    
 1         1 lm              regression    linear_reg   <spec[+]> 
 2         2 brulee          regression    linear_reg   <spec[+]> 
 3         3 gee             regression    linear_reg   <spec[+]> 
 4         4 glm             regression    linear_reg   <spec[+]> 
 5         5 glmer           regression    linear_reg   <spec[+]> 
 6         6 glmnet          regression    linear_reg   <spec[+]> 
 7         7 gls             regression    linear_reg   <spec[+]> 
 8         8 lme             regression    linear_reg   <spec[+]> 
 9         9 lmer            regression    linear_reg   <spec[+]> 
10        10 stan            regression    linear_reg   <spec[+]> 
11        11 stan_glmer      regression    linear_reg   <spec[+]> 
fast_regression_parsnip_spec_tbl(.parsnip_eng = c("lm","glm"))
# A tibble: 3 × 5
  .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec
      <int> <chr>           <chr>         <chr>        <list>    
1         1 lm              regression    linear_reg   <spec[+]> 
2         2 glm             regression    linear_reg   <spec[+]> 
3         3 glm             regression    poisson_reg  <spec[+]> 
fast_regression_parsnip_spec_tbl(.parsnip_eng = c("lm","glm","gee"), 
                                 .parsnip_fns = "linear_reg")
# A tibble: 3 × 5
  .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec
      <int> <chr>           <chr>         <chr>        <list>    
1         1 lm              regression    linear_reg   <spec[+]> 
2         2 gee             regression    linear_reg   <spec[+]> 
3         3 glm             regression    linear_reg   <spec[+]> 

As shown we can easily select the models we want either by choosing the supported parsnip function like linear_reg() or by choose the desired engine, you can also use them both in conjunction with each other!

This function also does add a class to the output. Let’s see it.

class(fast_regression_parsnip_spec_tbl())
[1] "tidyaml_mod_spec_tbl" "fst_reg_spec_tbl"     "tidyaml_base_tbl"    
[4] "tbl_df"               "tbl"                  "data.frame"          

We see that there are two added classes, first fst_reg_spec_tbl because this creates a set of non-tuning regression models and then tidyaml_mod_spec_tbl because this is a model specification tibble built with {tidyAML}

Now, what if you want to create a non-tuning model spec without using the fast_regression_parsnip_spec_tbl() function. Well, you can. The function is called create_model_spec().

create_model_spec(
 .parsnip_eng = list("lm","glm","glmnet","cubist"),
 .parsnip_fns = list(
      "linear_reg",
      "linear_reg",
      "linear_reg",
      "cubist_rules"
     )
 )
# A tibble: 4 × 4
  .parsnip_engine .parsnip_mode .parsnip_fns .model_spec
  <chr>           <chr>         <chr>        <list>     
1 lm              regression    linear_reg   <spec[+]>  
2 glm             regression    linear_reg   <spec[+]>  
3 glmnet          regression    linear_reg   <spec[+]>  
4 cubist          regression    cubist_rules <spec[+]>  
create_model_spec(
 .parsnip_eng = list("lm","glm","glmnet","cubist"),
 .parsnip_fns = list(
      "linear_reg",
      "linear_reg",
      "linear_reg",
      "cubist_rules"
     ),
 .return_tibble = FALSE
 )
$.parsnip_engine
$.parsnip_engine[[1]]
[1] "lm"

$.parsnip_engine[[2]]
[1] "glm"

$.parsnip_engine[[3]]
[1] "glmnet"

$.parsnip_engine[[4]]
[1] "cubist"


$.parsnip_mode
$.parsnip_mode[[1]]
[1] "regression"


$.parsnip_fns
$.parsnip_fns[[1]]
[1] "linear_reg"

$.parsnip_fns[[2]]
[1] "linear_reg"

$.parsnip_fns[[3]]
[1] "linear_reg"

$.parsnip_fns[[4]]
[1] "cubist_rules"


$.model_spec
$.model_spec[[1]]
Linear Regression Model Specification (regression)

Computational engine: lm 


$.model_spec[[2]]
Linear Regression Model Specification (regression)

Computational engine: glm 


$.model_spec[[3]]
Linear Regression Model Specification (regression)

Computational engine: glmnet 


$.model_spec[[4]]
! parsnip could not locate an implementation for `cubist_rules` regression
  model specifications using the `cubist` engine.
Cubist Model Specification (regression)

Computational engine: cubist 

The first example shows the output as a tibble, the second example shows the output as a list of model specs. The last one for cubist rules also shows how it will gracefully fail if the package is not loaded.


Happy Coding!

To leave a comment for the author, please follow the link and comment on their blog: Steve's Data Tips and Tricks.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)