
Exploring the Enhanced Features of tidyAML’s internal_make_wflw_predictions()

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

Hey R enthusiasts! Steve here, and today I'm excited to share some fantastic updates about a key function in the tidyAML package: internal_make_wflw_predictions(). The latest version addresses issue #190, so the output now includes the actual values alongside the training and testing predictions. Let's dive into the details!


What’s New?

In response to user feedback, we’ve enhanced the internal_make_wflw_predictions() function to provide a comprehensive set of predictions. Now, when you make a call to this function, it includes:

  1. The Actual Data: The observed values of the outcome variable your model aims to predict (mpg in the example below). These are the ground truth that both sets of predictions are compared against.

  2. Training Predictions: Predictions made on the training dataset. These show how well your model fits the data it was trained on.

  3. Testing Predictions: Predictions made on the testing dataset. This is crucial for evaluating the model’s performance on data it hasn’t seen during the training phase.
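To make the three pieces concrete, here is a minimal base-R sketch of the long layout the function returns, assuming the structure shown in the preds_list output further down (one row per value, labelled by .data_category and .data_type). It does not use tidyAML: it fits a plain lm() on a manual 75/25 split of mtcars, and make_long_preds() is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper (not part of tidyAML): stack the actual outcome values
# and the training/testing predictions into one long data frame.
make_long_preds <- function(train, test, fit, outcome) {
  actual     <- c(train[[outcome]], test[[outcome]])
  pred_train <- unname(predict(fit, newdata = train))
  pred_test  <- unname(predict(fit, newdata = test))
  data.frame(
    .data_category = c(rep("actual", length(actual)),
                       rep("predicted", length(pred_train) + length(pred_test))),
    .data_type     = c(rep("actual", length(actual)),
                       rep("training", length(pred_train)),
                       rep("testing", length(pred_test))),
    .value         = c(actual, pred_train, pred_test),
    stringsAsFactors = FALSE
  )
}

# Manual 75/25 split of mtcars (stand-in for tidyAML's splits object)
set.seed(123)
train_idx <- sample(nrow(mtcars), size = floor(0.75 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

fit <- lm(mpg ~ ., data = train)
long_preds <- make_long_preds(train, test, fit, outcome = "mpg")

# Count the rows in each group
table(long_preds$.data_type)
```

Keeping everything in a single long table, rather than three separate objects, means one filter on .data_type is enough to pick out any group.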


How to Use It

To take advantage of these new features, here’s how you can use the updated internal_make_wflw_predictions() function:

internal_make_wflw_predictions(.model_tbl, .splits_obj)

Arguments:

  1. .model_tbl: The model table generated from a function like fast_regression_parsnip_spec_tbl(). Ensure that it has the class tidyaml_mod_spec_tbl. This function is typically used after running internal_make_fitted_wflw() and saving the resulting fitted workflows to the tibble.

  2. .splits_obj: The splits object, which the auto_ml function normally creates internally. When calling the function yourself, you can build it with create_splits(), as in the example below.
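Because this is an internal function, it can help to check the two inputs before calling it. The base-R sketch below assumes only what is stated above and visible in the example output: .model_tbl carries the tidyaml_mod_spec_tbl class, and .splits_obj is a list with splits and split_type elements. check_pred_inputs() and the mock objects are hypothetical and are not part of tidyAML; the package's own validation may differ.

```r
# Hypothetical pre-flight check (not part of tidyAML) mirroring the
# argument requirements described above.
check_pred_inputs <- function(.model_tbl, .splits_obj) {
  if (!inherits(.model_tbl, "tidyaml_mod_spec_tbl")) {
    stop("`.model_tbl` must have class 'tidyaml_mod_spec_tbl'.", call. = FALSE)
  }
  if (!is.list(.splits_obj) ||
      !all(c("splits", "split_type") %in% names(.splits_obj))) {
    stop("`.splits_obj` must be a list with `splits` and `split_type` elements.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Mock inputs standing in for the real objects
mock_tbl <- data.frame(.model_id = 1:2, .parsnip_engine = c("lm", "glm"))
class(mock_tbl) <- c("tidyaml_mod_spec_tbl", class(mock_tbl))
mock_splits <- list(splits = "placeholder", split_type = "initial_split")

check_pred_inputs(mock_tbl, mock_splits)  # passes silently

# A plain data frame lacks the class, so the check fails with a message
err_msg <- tryCatch(check_pred_inputs(mtcars, mock_splits),
                    error = conditionMessage)
err_msg
```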


Example Usage

Let’s walk through an example using some popular R packages:

library(tidymodels)
library(tidyAML)
library(tidyverse)
tidymodels_prefer()

# Create a model specification table
mod_spec_tbl <- fast_regression_parsnip_spec_tbl(
  .parsnip_eng = c("lm","glm"),
  .parsnip_fns = "linear_reg"
)

# Create a recipe
rec_obj <- recipe(mpg ~ ., data = mtcars)

# Create splits
splits_obj <- create_splits(mtcars, "initial_split")

# Generate the model table
mod_tbl <- mod_spec_tbl |>
  mutate(wflw = full_internal_make_wflw(mod_spec_tbl, rec_obj))

# Generate the fitted model table
mod_fitted_tbl <- mod_tbl |>
  mutate(fitted_wflw = internal_make_fitted_wflw(mod_tbl, splits_obj))

# Make predictions with the enhanced function
preds_list <- internal_make_wflw_predictions(mod_fitted_tbl, splits_obj)

This example shows how the updated function fits into a workflow. Typically, though, you would not call this function directly; instead, you would use fast_regression() or fast_classification(), which call it internally. Let's now look at the output of each object.

rec_obj
── Recipe ──────────────────────────────────────────────────────────────────────
── Inputs 
Number of variables by role
outcome:    1
predictor: 10
splits_obj
$splits
<Training/Testing/Total>
<24/8/32>

$split_type
[1] "initial_split"
mod_spec_tbl
# A tibble: 2 × 5
  .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec
      <int> <chr>           <chr>         <chr>        <list>    
1         1 lm              regression    linear_reg   <spec[+]> 
2         2 glm             regression    linear_reg   <spec[+]> 
mod_tbl
# A tibble: 2 × 6
  .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec wflw      
      <int> <chr>           <chr>         <chr>        <list>     <list>    
1         1 lm              regression    linear_reg   <spec[+]>  <workflow>
2         2 glm             regression    linear_reg   <spec[+]>  <workflow>
mod_fitted_tbl
# A tibble: 2 × 7
  .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec wflw      
      <int> <chr>           <chr>         <chr>        <list>     <list>    
1         1 lm              regression    linear_reg   <spec[+]>  <workflow>
2         2 glm             regression    linear_reg   <spec[+]>  <workflow>
# ℹ 1 more variable: fitted_wflw <list>
preds_list
[[1]]
# A tibble: 64 × 3
   .data_category .data_type .value
   <chr>          <chr>       <dbl>
 1 actual         actual       15.2
 2 actual         actual       19.7
 3 actual         actual       17.8
 4 actual         actual       15  
 5 actual         actual       10.4
 6 actual         actual       15.8
 7 actual         actual       17.3
 8 actual         actual       30.4
 9 actual         actual       15.2
10 actual         actual       19.2
# ℹ 54 more rows

[[2]]
# A tibble: 64 × 3
   .data_category .data_type .value
   <chr>          <chr>       <dbl>
 1 actual         actual       15.2
 2 actual         actual       19.7
 3 actual         actual       17.8
 4 actual         actual       15  
 5 actual         actual       10.4
 6 actual         actual       15.8
 7 actual         actual       17.3
 8 actual         actual       30.4
 9 actual         actual       15.2
10 actual         actual       19.2
# ℹ 54 more rows
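The 64 rows in each tibble follow from the split printed for splits_obj (<24/8/32>). Assuming the actual values cover all 32 rows of mtcars and the predictions cover the 24 training and 8 testing rows, the arithmetic works out as in this short sketch; the 24 also matches rsample's default initial_split() proportion of 3/4.

```r
n_total <- nrow(mtcars)            # 32 observations
n_train <- floor(0.75 * n_total)   # 24 training rows (prop = 3/4)
n_test  <- n_total - n_train       # 8 testing rows

# One "actual" row per observation, plus one prediction per
# training row and per testing row
rows_per_model <- n_total + n_train + n_test
rows_per_model
```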

Each element of preds_list is a tibble with the same three columns:

names(preds_list[[1]])
[1] ".data_category" ".data_type"     ".value"        

So we have .data_category, .data_type, and .value. Let's look at the unique values of .data_category first:

unique(preds_list[[1]]$.data_category)
[1] "actual"    "predicted"

So we have the actual data and the predicted data. The predicted rows, though, cover both the training and the testing data. Let's look at the unique values of .data_type:

unique(preds_list[[1]]$.data_type)
[1] "actual"   "training" "testing" 

This long format lets you visualize the data however you like, which is something we will go over tomorrow!
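As a small first step in that direction, here is a base-R sketch that pulls each .data_type group out as its own vector, which is usually what you need before plotting or summarising. The preds data frame is a toy stand-in with illustrative values, not tidyAML output; with the real object you would start from preds_list[[1]] instead.

```r
# Toy stand-in for preds_list[[1]]: same three columns, illustrative values
preds <- data.frame(
  .data_category = c("actual", "actual", "actual",
                     "predicted", "predicted", "predicted"),
  .data_type     = c("actual", "actual", "actual",
                     "training", "training", "testing"),
  .value         = c(21.0, 22.8, 18.7, 21.4, 22.1, 19.3)
)

# One numeric vector per .data_type; the factor levels fix the order
by_type <- split(preds$.value,
                 factor(preds$.data_type,
                        levels = c("actual", "training", "testing")))
lengths(by_type)

# A simple per-group summary, e.g. the mean value in each group
vapply(by_type, mean, numeric(1))
```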


Why It Matters

By including the actual data along with the training and testing predictions, the internal_make_wflw_predictions() function lets you evaluate your models more thoroughly. For example, predictions that track the actual values closely on the training data but poorly on the testing data are a sign of overfitting. This makes it easier to check the reliability and generalization capability of your machine learning models.
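One way to act on that comparison is to compute an error metric separately for the training and testing predictions. The sketch below does this in base R rather than through tidyAML: it fits lm() on a manual 75/25 split of mtcars and uses a hypothetical rmse() helper. It pairs each prediction with its actual value directly, so it does not depend on the row order of the long tibble.

```r
# Hypothetical helper: root mean squared error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Manual 75/25 split (stand-in for tidyAML's splits object)
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.75 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

fit <- lm(mpg ~ ., data = train)

train_rmse <- rmse(train$mpg, predict(fit, newdata = train))
test_rmse  <- rmse(test$mpg,  predict(fit, newdata = test))

# A testing error far above the training error suggests overfitting
c(training = train_rmse, testing = test_rmse)
```

With 10 predictors and only 24 training rows, a linear model like this one has plenty of room to fit the training data closely, which is exactly the situation where looking at the testing error matters.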

So, R enthusiasts, update your tidyAML package, explore the enhanced features, and let us know how these improvements elevate your modeling experience. Happy coding!

