Using .drop_na in Fast Classification and Regression

Steven P. Sanderson II, MPH

10 hours ago

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].


Introduction

The newest release of tidyAML adds a new parameter to the functions fast_classification() and fast_regression(). The parameter is .drop_na, a logical value that defaults to TRUE. It determines whether the function drops rows from its output when the corresponding model could not be built for some reason. Let’s take a look at the function and its arguments.

fast_regression(
  .data,
  .rec_obj,
  .parsnip_fns = "all",
  .parsnip_eng = "all",
  .split_type = "initial_split",
  .split_args = NULL,
  .drop_na = TRUE
)

Arguments

.data – The data being passed to the function for the regression problem.

.rec_obj – The recipe object being passed.

.parsnip_fns – The default is ‘all’, which will create all possible regression model specifications supported.

.parsnip_eng – The default is ‘all’, which will create all possible regression model specifications supported.

.split_type – The default is ‘initial_split’; you can pass any type of split supported by rsample.

.split_args – The default is NULL; when NULL, the default parameters of the chosen rsample split type are used.

.drop_na – The default is TRUE, which will drop all NA’s from the output, that is, the rows for models that could not be built.

Now let’s see this in action.

Example

We are going to use the mtcars dataset for this example. We will create a regression problem where we try to predict mpg using all other variables in the dataset. We will deliberately not load all of the supported libraries, which causes the function to return NULL for some models, and we will set the parameter .drop_na to FALSE.

library(tidyAML)
library(tidymodels)
library(tidyverse)

tidymodels::tidymodels_prefer()

# Create regression problem
rec_obj <- recipe(mpg ~ ., data = mtcars)
frt_tbl <- fast_regression(
  mtcars,
  rec_obj,
  .parsnip_eng = c("lm","glm","gee"),
  .parsnip_fns = "linear_reg",
  .drop_na = FALSE
  )

glimpse(frt_tbl)

Rows: 3
Columns: 8
$ .model_id       <int> 1, 2, 3
$ .parsnip_engine <chr> "lm", "gee", "glm"
$ .parsnip_mode   <chr> "regression", "regression", "regression"
$ .parsnip_fns    <chr> "linear_reg", "linear_reg", "linear_reg"
$ model_spec      <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
$ wflw            <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw     <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw       <list> [<tbl_df[64 x 3]>], <NULL>, [<tbl_df[64 x 3]>]

extract_wflw(frt_tbl, 1:nrow(frt_tbl))

[[1]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 


[[2]]
NULL

[[3]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: glm

Here we can see that the function returned NULL for the gee model because we did not load the multilevelmod library. We can also see that the function kept that model in the output because .drop_na was set to FALSE. Now let’s set it back to TRUE.
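If you keep the failed rows with .drop_na = FALSE, you may want to find out which models came back as NULL. The sketch below is one way to do that in base R. It does not call tidyAML; the toy_tbl data frame is a hypothetical stand-in that only mimics the .model_id, .parsnip_engine and pred_wflw columns shown in the glimpse() output above, so treat it as an illustration of the idea and not as what the package does internally.

```r
# Hypothetical stand-in for the fast_regression() output (not real tidyAML output).
toy_tbl <- data.frame(
  .model_id       = 1:3,
  .parsnip_engine = c("lm", "gee", "glm"),
  stringsAsFactors = FALSE
)

# A list column where the second model failed and therefore holds NULL.
toy_tbl$pred_wflw <- list(
  data.frame(.value = c(21.0, 22.8)),
  NULL,
  data.frame(.value = c(21.4, 18.7))
)

# Flag the rows whose prediction element is NULL.
failed <- vapply(toy_tbl$pred_wflw, is.null, logical(1))

# Engines that could not be built.
toy_tbl$.parsnip_engine[failed]

# Keep only the rows that did build, similar in spirit to .drop_na = TRUE.
kept <- toy_tbl[!failed, ]
kept$.parsnip_engine
```

On a real result the same vapply() call could presumably be pointed at frt_tbl$pred_wflw, assuming that column holds NULL for failed models as it does in the output above.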

frt_tbl <- fast_regression(
  mtcars,
  rec_obj,
  .parsnip_eng = c("lm","glm","gee"),
  .parsnip_fns = "linear_reg",
  .drop_na = TRUE
  )

glimpse(frt_tbl)

Rows: 2
Columns: 8
$ .model_id       <int> 1, 3
$ .parsnip_engine <chr> "lm", "glm"
$ .parsnip_mode   <chr> "regression", "regression"
$ .parsnip_fns    <chr> "linear_reg", "linear_reg"
$ model_spec      <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
$ wflw            <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ fitted_wflw     <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
$ pred_wflw       <list> [<tbl_df[64 x 3]>], [<tbl_df[64 x 3]>]

extract_wflw(frt_tbl, 1:nrow(frt_tbl))

[[1]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm 


[[2]]
══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: glm

Here we can see that the gee model was dropped from the output because the function could not build it without the multilevelmod library loaded. This is a convenient way to drop models that cannot be built due to missing libraries or other reasons.
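An alternative to dropping the failed model afterwards is to request the gee engine only when its supporting package is available. The following is a minimal sketch using base R's requireNamespace(). The engines variable is a hypothetical name, and it is an assumption that loading multilevelmod is enough for the gee engine to build on your system (the engine may also need further packages installed).

```r
# Engines that need no extra packages beyond what the example already loads.
engines <- c("lm", "glm")

# Add "gee" only when multilevelmod is installed, and load it so that
# the engine is available (assumption: loading is sufficient).
if (requireNamespace("multilevelmod", quietly = TRUE)) {
  library(multilevelmod)
  engines <- c(engines, "gee")
}

engines
```

The resulting vector could then be passed as .parsnip_eng = engines in the fast_regression() call shown earlier.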

Conclusion

The .drop_na parameter gives you a simple way to drop models that cannot be built due to missing libraries or other reasons. It is a useful addition to the fast_classification() and fast_regression() functions.

Happy coding!

