tidyAML: Now supporting gee models
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
I am happy to announce that a new version of tidyAML
is now available on CRAN. This version includes support for gee
models. This is a big step forward for tidyAML
as it now supports a wide variety of regression and classification models.
What is tidyAML
?
The package is meant to be a way to quickly create many models of different algorithms, and this release is a small step forward in supporting that mission. The package is built upon the tidymodels
packages.
What is gee
?
gee
stands for Generalized Estimating Equations. It is a way to model correlated data. For example, if you have a dataset where you have multiple observations per person, you may want to use gee
to account for the correlation between observations within a person. This is a very common situation in longitudinal studies. Here is a link to the gee
parsnip model.
Example
Let’s see how it works, we will examine how it is setup to faily gracefully and how it works when the multilevelmod
package is installed and loaded into your environment.
Load Library
library(tidyAML)
Now, let’s build a model that will fail, it’s important I think to see the failure message so you can understand what is happening. It’s likely because the library is not loaded, let’s face it, it has happened to all of us.
library(recipes) library(dplyr) rec_obj <- recipe(mpg ~ ., data = mtcars) frt_tbl <- fast_regression( mtcars, rec_obj, .parsnip_eng = c("lm","glm","gee"), .parsnip_fns = "linear_reg" ) glimpse(frt_tbl)
Rows: 3 Columns: 8 $ .model_id <int> 1, 2, 3 $ .parsnip_engine <chr> "lm", "gee", "glm" $ .parsnip_mode <chr> "regression", "regression", "regression" $ .parsnip_fns <chr> "linear_reg", "linear_reg", "linear_reg" $ model_spec <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]… $ wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp… $ fitted_wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp… $ pred_wflw <list> [<tbl_df[8 x 1]>], <NULL>, [<tbl_df[8 x 1]>]
frt_tbl |> pull(pred_wflw)
[[1]] # A tibble: 8 × 1 .pred <dbl> 1 22.9 2 18.0 3 21.1 4 32.9 5 18.5 6 10.4 7 9.59 8 24.7 [[2]] NULL [[3]] # A tibble: 8 × 1 .pred <dbl> 1 22.9 2 18.0 3 21.1 4 32.9 5 18.5 6 10.4 7 9.59 8 24.7
frt_tbl |> pull(fitted_wflw) |> purrr::map(broom::tidy)
[[1]] # A tibble: 11 × 5 term estimate std.error statistic p.value <chr> <dbl> <dbl> <dbl> <dbl> 1 (Intercept) -40.3 30.5 -1.32 0.209 2 cyl 0.907 1.09 0.830 0.421 3 disp 0.0105 0.0189 0.557 0.587 4 hp -0.00487 0.0248 -0.196 0.847 5 drat 1.73 2.04 0.846 0.413 6 wt -5.71 2.35 -2.43 0.0302 7 qsec 3.39 1.34 2.54 0.0248 8 vs -3.85 2.99 -1.29 0.220 9 am 2.16 2.29 0.942 0.364 10 gear 1.40 1.68 0.838 0.417 11 carb 0.200 0.835 0.239 0.815 [[2]] # A tibble: 0 × 0 [[3]] # A tibble: 11 × 5 term estimate std.error statistic p.value <chr> <dbl> <dbl> <dbl> <dbl> 1 (Intercept) -40.3 30.5 -1.32 0.209 2 cyl 0.907 1.09 0.830 0.421 3 disp 0.0105 0.0189 0.557 0.587 4 hp -0.00487 0.0248 -0.196 0.847 5 drat 1.73 2.04 0.846 0.413 6 wt -5.71 2.35 -2.43 0.0302 7 qsec 3.39 1.34 2.54 0.0248 8 vs -3.85 2.99 -1.29 0.220 9 am 2.16 2.29 0.942 0.364 10 gear 1.40 1.68 0.838 0.417 11 carb 0.200 0.835 0.239 0.815
We see that the gee
model failed. This is because we did not load the multilevelmod
package. Let’s load it and try again.
library(multilevelmod) frt_tbl <- fast_regression( mtcars, rec_obj, .parsnip_eng = c("lm","glm","gee"), .parsnip_fns = "linear_reg" ) glimpse(frt_tbl)
Rows: 3 Columns: 8 $ .model_id <int> 1, 2, 3 $ .parsnip_engine <chr> "lm", "gee", "glm" $ .parsnip_mode <chr> "regression", "regression", "regression" $ .parsnip_fns <chr> "linear_reg", "linear_reg", "linear_reg" $ model_spec <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]… $ wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp… $ fitted_wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp… $ pred_wflw <list> [<tbl_df[8 x 1]>], [<tbl_df[8 x 1]>], [<tbl_df[8 x 1]>…
frt_tbl |> pull(pred_wflw)
[[1]] # A tibble: 8 × 1 .pred <dbl> 1 20.6 2 15.6 3 15.8 4 26.1 5 27.8 6 16.6 7 25.4 8 21.6 [[2]] # A tibble: 8 × 1 .pred <dbl> 1 21.3 2 15.2 3 15.3 4 25.7 5 27.3 6 16.5 7 25.7 8 20.5 [[3]] # A tibble: 8 × 1 .pred <dbl> 1 20.6 2 15.6 3 15.8 4 26.1 5 27.8 6 16.6 7 25.4 8 21.6
frt_tbl |> pull(fitted_wflw) |> purrr::map(broom::tidy)
[[1]] # A tibble: 11 × 5 term estimate std.error statistic p.value <chr> <dbl> <dbl> <dbl> <dbl> 1 (Intercept) -3.70 18.9 -0.195 0.848 2 cyl 0.668 1.03 0.647 0.529 3 disp 0.00831 0.0153 0.544 0.596 4 hp -0.0124 0.0173 -0.717 0.486 5 drat 2.87 1.44 2.00 0.0674 6 wt -2.87 1.40 -2.05 0.0609 7 qsec 0.777 0.618 1.26 0.231 8 vs 0.169 1.64 0.103 0.920 9 am 1.90 1.79 1.07 0.306 10 gear 1.31 1.44 0.907 0.381 11 carb -0.601 0.730 -0.823 0.425 [[2]] # A tibble: 10 × 6 term estimate std.error statistic p.value `` <chr> <dbl> <dbl> <dbl> <dbl> <dbl> 1 (Intercept) 6.54 10.2 0.643 4.01 1.63 2 disp 0.0119 0.0140 0.849 0.0134 0.887 3 hp -0.0149 0.0165 -0.901 0.0104 -1.43 4 drat 2.36 1.18 2.01 0.859 2.75 5 wt -3.01 1.35 -2.23 1.35 -2.23 6 qsec 0.577 0.524 1.10 0.193 2.99 7 vs 0.000922 1.58 0.000582 1.07 0.000860 8 am 1.35 1.54 0.880 0.886 1.53 9 gear 1.00 1.33 0.752 0.527 1.90 10 carb -0.355 0.611 -0.582 0.378 -0.940 [[3]] # A tibble: 11 × 5 term estimate std.error statistic p.value <chr> <dbl> <dbl> <dbl> <dbl> 1 (Intercept) -3.70 18.9 -0.195 0.848 2 cyl 0.668 1.03 0.647 0.529 3 disp 0.00831 0.0153 0.544 0.596 4 hp -0.0124 0.0173 -0.717 0.486 5 drat 2.87 1.44 2.00 0.0674 6 wt -2.87 1.40 -2.05 0.0609 7 qsec 0.777 0.618 1.26 0.231 8 vs 0.169 1.64 0.103 0.920 9 am 1.90 1.79 1.07 0.306 10 gear 1.31 1.44 0.907 0.381 11 carb -0.601 0.730 -0.823 0.425
The recipe object
Let’s take a look at the model spec, the workflow and the fitted workflow to see what has happened behind the scenes. First let’s look at the formula in the recipe object.
formula(prep(rec_obj))
mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb <environment: 0x0000012c521f0c98>
So we see what the formula is, but we also see that there is no id_var
listed which means it would fail for the gee
specification, but, our model worked! That is because behind the scenes in the fast sense at this point in development the first variable is taken to be the id variable. Let’s see what the workflow looks like.
gee_wflw <- frt_tbl |> slice(2) |> pull(wflw) gee_wflw[[1]]$fit$actions$model$formula
mpg ~ id_var(cyl) + disp + hp + drat + wt + qsec + vs + am + gear + carb <environment: 0x0000012c5206d0f8>
As we can see the first predictor variable has been chosen as the id variable. This will change in the future, where you will be able to choose the id_var
.
Conclusion
Thank you for reading and I hope you enjoy the new version of tidyAML
. Please let me know if you have any questions or comments. We will continue our development of tidyAML
and hope to have more updates soon.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.