The building of {tidyAML}
Introduction
Yesterday I posted An Update to {tidyAML}, where I discussed some of my thought process and how things could potentially work for the package. Today I want to showcase how the function fast_regression_parsnip_spec_tbl() and its complementary function fast_classification_parsnip_spec_tbl() actually work, or maybe don’t work for that matter.
We are going to pick on fast_regression_parsnip_spec_tbl() in today’s post. It creates a tibble of parsnip regression model specifications, 46 of them in total, which can then be filtered. The model specs are created first and then filtered down to the ones you ask for. It only creates models for regression problems. To find all of the supported models in this package you can visit the parsnip search page.
Function
First let’s take a look at the function call itself.
    fast_regression_parsnip_spec_tbl(
      .parsnip_fns = "all",
      .parsnip_eng = "all"
    )
Now let’s take a look at the arguments:
.parsnip_fns – The default for this is set to all. This means that all of the parsnip regression functions will be used, for example linear_reg() or cubist_rules(). You can also choose to pass a c() vector like c("linear_reg", "cubist_rules").
.parsnip_eng – The default for this is set to all. This means that all of the parsnip regression engines will be used, for example lm or glm. You can also choose to pass a c() vector like c("lm", "glm").
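To make the "all" default concrete, here is a minimal base R sketch of how this kind of filtering could behave. The lookup table model_tbl and the helper filter_specs() are hypothetical names made up for illustration; this is not the code {tidyAML} uses internally.

```r
# Hypothetical lookup table of engine/function pairs (illustration only)
model_tbl <- data.frame(
  .parsnip_engine = c("lm", "glm", "glm", "Cubist"),
  .parsnip_mode   = "regression",
  .parsnip_fns    = c("linear_reg", "linear_reg", "poisson_reg", "cubist_rules"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "all" keeps every row, otherwise keep only the matches
filter_specs <- function(tbl, .parsnip_fns = "all", .parsnip_eng = "all") {
  keep_fns <- if (identical(.parsnip_fns, "all")) {
    rep(TRUE, nrow(tbl))
  } else {
    tbl$.parsnip_fns %in% .parsnip_fns
  }
  keep_eng <- if (identical(.parsnip_eng, "all")) {
    rep(TRUE, nrow(tbl))
  } else {
    tbl$.parsnip_engine %in% .parsnip_eng
  }
  # Both conditions must hold when both arguments are supplied
  tbl[keep_fns & keep_eng, , drop = FALSE]
}

# Keep only the rows whose engine is lm or glm
filter_specs(model_tbl, .parsnip_eng = c("lm", "glm"))
```

Building the full table first and subsetting afterwards keeps the two arguments independent of each other, so passing one, the other, or both needs no special casing.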
The workhorse of this function is the internal_make_spec_tbl() function. This is the one that will be the subject of the post. Let’s take a look at its inner workings; after all, this is open source.
    internal_make_spec_tbl <- function(.data){

      # Checks ----
      df <- dplyr::as_tibble(.data)

      nms <- unique(names(df))

      if (!".parsnip_engine" %in% nms | !".parsnip_mode" %in% nms | !".parsnip_fns" %in% nms){
        rlang::abort(
          message = "The model tibble must come from the class/reg to parsnip function.",
          use_cli_format = TRUE
        )
      }

      # Make tibble ----
      mod_spec_tbl <- df %>%
        dplyr::mutate(
          model_spec = purrr::pmap(
            dplyr::cur_data(),
            ~ match.fun(..3)(mode = ..2, engine = ..1)
          )
        ) %>%
        # add .model_id column
        dplyr::mutate(.model_id = dplyr::row_number()) %>%
        dplyr::select(.model_id, dplyr::everything())

      # Return ----
      return(mod_spec_tbl)

    }
Let’s examine this (it is currently changing form in a GitHub issue). Firstly, we take in a data.frame/tibble that has to have certain column names in it (this is going to change to look for a class instead). If any of those names are missing we abort; otherwise we proceed to the meat and potatoes of it. The internal mod_spec_tbl is made using mutate, pmap, cur_data and match.fun. What this does essentially is the following:
- mutate a column called model_spec.
- Use the {purrr} function pmap, which maps over several columns in parallel, to create the model spec.
- Inside of the pmap we use cur_data() to hand over the current data, which pmap then walks through row by row. For each row we look up the model function with match.fun (which takes a character string naming the function, so the library that provides it needs to be loaded); ..3 is the column holding the function name, and ..2 and ..1 supply the mode and engine arguments.
- We give it a numeric model id.
- We then ensure that the .model_id column is first.
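The steps above can be sketched on a toy table without {parsnip}. This is a minimal sketch, assuming {purrr} is installed; the table call_tbl and its columns are hypothetical, and plain base R functions stand in for the parsnip model functions.

```r
library(purrr)

# Hypothetical table: column 1 holds an argument value, column 2 a function name
call_tbl <- data.frame(
  digits = c(0, 1, 2),
  fn     = c("round", "signif", "round"),
  stringsAsFactors = FALSE
)

# pmap() walks the rows; ..1 and ..2 refer to the columns by position.
# match.fun() turns the character string into the function it names,
# which only works if that function can be found (e.g. its package is loaded).
call_tbl$result <- pmap(
  call_tbl,
  ~ match.fun(..2)(3.14159, digits = ..1)
)

# Add an id column and move it to the front, as in the last two steps
call_tbl$.id <- seq_len(nrow(call_tbl))
call_tbl <- call_tbl[, c(".id", setdiff(names(call_tbl), ".id"))]
```

Because the columns are referenced by position, their order in the table matters: swapping digits and fn would pass the wrong values to match.fun(). The same holds for the ..1, ..2 and ..3 references in internal_make_spec_tbl().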
Example
Let’s see it in action!
    library(tidyAML) # Not yet available, you can install from GitHub though

    fast_regression_parsnip_spec_tbl()

    # A tibble: 46 × 5
       .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec
           <int> <chr>           <chr>         <chr>        <list>
     1         1 lm              regression    linear_reg   <spec[+]>
     2         2 brulee          regression    linear_reg   <spec[+]>
     3         3 gee             regression    linear_reg   <spec[+]>
     4         4 glm             regression    linear_reg   <spec[+]>
     5         5 glmer           regression    linear_reg   <spec[+]>
     6         6 glmnet          regression    linear_reg   <spec[+]>
     7         7 gls             regression    linear_reg   <spec[+]>
     8         8 lme             regression    linear_reg   <spec[+]>
     9         9 lmer            regression    linear_reg   <spec[+]>
    10        10 stan            regression    linear_reg   <spec[+]>
    # … with 36 more rows
So we see we get a nicely generated tibble of output that matches a model spec to the .model_id and to the appropriate parsnip engine and mode.
We can also choose the models we may want by giving arguments to the .parsnip_eng parameter, the .parsnip_fns parameter, or both.
    library(dplyr)

    fast_regression_parsnip_spec_tbl(.parsnip_fns = "linear_reg")

    # A tibble: 11 × 5
       .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec
           <int> <chr>           <chr>         <chr>        <list>
     1         1 lm              regression    linear_reg   <spec[+]>
     2         2 brulee          regression    linear_reg   <spec[+]>
     3         3 gee             regression    linear_reg   <spec[+]>
     4         4 glm             regression    linear_reg   <spec[+]>
     5         5 glmer           regression    linear_reg   <spec[+]>
     6         6 glmnet          regression    linear_reg   <spec[+]>
     7         7 gls             regression    linear_reg   <spec[+]>
     8         8 lme             regression    linear_reg   <spec[+]>
     9         9 lmer            regression    linear_reg   <spec[+]>
    10        10 stan            regression    linear_reg   <spec[+]>
    11        11 stan_glmer      regression    linear_reg   <spec[+]>
    fast_regression_parsnip_spec_tbl(.parsnip_eng = c("lm","glm"))

    # A tibble: 3 × 5
      .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec
          <int> <chr>           <chr>         <chr>        <list>
    1         1 lm              regression    linear_reg   <spec[+]>
    2         2 glm             regression    linear_reg   <spec[+]>
    3         3 glm             regression    poisson_reg  <spec[+]>
    fast_regression_parsnip_spec_tbl(.parsnip_eng = "glm") %>%
      pull(model_spec)

    [[1]]
    Linear Regression Model Specification (regression)

    Computational engine: glm


    [[2]]
    Poisson Regression Model Specification (regression)

    Computational engine: glm
Voila!