Pipe-friendly bootstrapping with list-variables in #rstats
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A few days ago, my package sjstats was updated on CRAN. Most functions of this package are convenient functions for common statistical computations, especially for (mixed) regression models. This latest update introduces some pipe-friendly bootstrapping-methods, namely bootstrap()
, boot_ci()
, boot_se()
and boot_p()
. In this post, I just wanted to give a quick example of these functions, used within a pipeline-workflow.
First, load the required libraries:
library(dplyr) library(sjstats)
Now, init the sample data and fit a regular model. The model estimates how the dependency (e42dep) of an older person is related to the burden of care (neg_c_7) of a person who provides care to the frail older people:
data(efc) fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc)
I demonstrate the boot_ci()
-function, so the confidence intervals are of interest here:
confint(fit) > 2.5 % 97.5 % > (Intercept) 5.3374378 7.7794162 > e42dep 1.2929451 1.7964296 > c161sex -0.1193198 0.9871336
Now let’s see to obtain bootstrapped confidence intervals for this model. First, the bootstrap()
-function generates bootstrap replicates and returns a data-frame with just one column, $strap
, which is a list-variable with bootstrap samples:
bootstrap(efc, 1000)
This is how the list-variable looks like:
# A tibble: 1,000 x 1 strap <list> 1 <data.frame [908 x 26]> 2 <data.frame [908 x 26]> 3 <data.frame [908 x 26]> 4 <data.frame [908 x 26]> 5 <data.frame [908 x 26]> 6 <data.frame [908 x 26]> 7 <data.frame [908 x 26]> 8 <data.frame [908 x 26]> 9 <data.frame [908 x 26]> 10 <data.frame [908 x 26]> # ... with 990 more rows
Since all data frames are saved in a list, you can use lapply()
to easily run the same linear model (used above) over all bootstrap samples and save these fitted model objects in another list-variable (named models in the example below). Then, using lapply()
again, we can extract the coefficient of interest (here, the second coefficient, which is the estimated e42dep) for each „bootstrap“ model and save these coefficients in another variable (named dependency in the example below). Finally, we use the boot_ci()
-function to calculate confidence intervals of the bootstrapped coefficients.
The complete code looks like this:
efc %>% # generate bootstrape replicates, saved in # the list-variable 'strap' bootstrap(1000) %>% # run linear model on all bootstrap samples mutate(models = lapply(.$strap, function(x) { lm(neg_c_7 ~ e42dep + c161sex, data = x) })) %>% # extract coefficient for "e42dep" (dependency) variable mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>% # compute boostrapped confidence intervals boot_ci(dependency)
And the result (depending on your seed()
) is:
conf.low conf.high 1.303847 1.790724
The complete code, put together:
library(dplyr) library(sjstats) data(efc) fit <- lm(neg_c_7 ~ e42dep + c161sex, data = efc) confint(fit) efc %>% bootstrap(1000) %>% mutate(models = lapply(.$strap, function(x) { lm(neg_c_7 ~ e42dep + c161sex, data = x) })) %>% mutate(dependency = unlist(lapply(.$models, function(x) coef(x)[2]))) %>% boot_ci(dependency)
Tagged: bootstrap, pipe, R, rstats
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.