HM878: Helper Functions
Introduction
This vignette demonstrates how to use the functions included in this package so far. If you have not yet, install the package with the following code: devtools::install_github("brendensm/hm878"). If you do not have the devtools package, be sure to install it first with install.packages("devtools").
To start, we load the package:
library(hm878)
Let’s assume we are running a binomial logistic regression using the data from mtcars, a built-in data set included with R. We will use vs (engine type: V-shaped or straight) as the dependent variable and cyl (number of cylinders) as the independent variable. We will store our models for block 1 and block 2.
mb1 <- glm(vs ~ 1, data = mtcars, family = "binomial")
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")
Testing Goodness of Fit with chi_log
To assess the fit of our models, we may want to use the function chi_log. To use it, simply type the name of your model as the first argument, followed by the data set that the model uses. Optionally, you can provide labels for each model using the third argument. Here I will label each block.
chi_log(mb1, mtcars, "Block 1")
Pearson Goodness of Fit Test

Null Hypothesis: The model fits.
Alternative Hypothesis: The model does not fit.

Pearson chi-squared for Block 1: 32
Degrees of freedom for Block 1: 31
P-value for Block 1: 0.416744

--- Failed to reject Null Hypothesis. The model fits. ---
chi_log(mb2, mtcars, "Block 2")
Pearson Goodness of Fit Test

Null Hypothesis: The model fits.
Alternative Hypothesis: The model does not fit.

Pearson chi-squared for Block 2: 27.42
Degrees of freedom for Block 2: 30
P-value for Block 2: 0.6013516

--- Failed to reject Null Hypothesis. The model fits. ---
The function gives us the chi-squared statistic, degrees of freedom, and a p-value. It also reminds us of the null and alternative hypotheses. Both models appear to be a good fit.
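For context, a Pearson goodness-of-fit test like this can be reproduced in base R. The sketch below is a hypothetical equivalent, not chi_log's actual source; it assumes chi_log sums the squared Pearson residuals and uses the model's residual degrees of freedom.

```r
# Sketch of a Pearson goodness-of-fit test in base R
# (hypothetical; chi_log's actual internals may differ)
mb1 <- glm(vs ~ 1, data = mtcars, family = "binomial")

# Pearson chi-squared statistic: sum of squared Pearson residuals
chi_sq <- sum(residuals(mb1, type = "pearson")^2)

# Degrees of freedom: residual df of the fitted model
df <- mb1$df.residual

# Upper-tail p-value from the chi-squared distribution
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

chi_sq   # 32
df       # 31
p_value  # ~0.4167
```

These values match the chi_log output for Block 1 above.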
Accuracy Percentage with predict_percent
We may also want to check the accuracy of our models. To do this, we can use predict_percent. To use this function, enter the name of the model as the first argument, followed by the dependent variable used in the model. For this, we must use the data$variable format. In the example below, we use the variable vs from the data set mtcars. Once again, we can label the output with a string as the optional third argument.
predict_percent(mb1, mtcars$vs, "Block 1")
Accuracy for Block 1: 56.25%
predict_percent(mb2, mtcars$vs, "Block 2")
Accuracy for Block 2: 84.38%
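This accuracy figure can be checked by hand. The sketch below assumes predict_percent classifies fitted probabilities at a 0.5 cutoff and reports the share of correct classifications; the actual implementation may differ.

```r
# Sketch of an accuracy calculation (hypothetical equivalent of predict_percent):
# classify fitted probabilities at a 0.5 cutoff, then compare with observed vs
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")

predicted <- ifelse(predict(mb2, type = "response") > 0.5, 1, 0)
accuracy  <- mean(predicted == mtcars$vs)

round(100 * accuracy, 2)  # 84.38
```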
Calculating Odds Ratios with or
To calculate odds ratios for the models, simply pass the model to the function or.
or(mb1)
            Odds_Ratio  CI_Lower CI_Upper  p_values
(Intercept)  0.7777778 0.3801366 1.558936 0.4806496
or(mb2)
              Odds_Ratio    CI_Lower     CI_Upper    p_values
(Intercept) 10873.447296 95.35600799 6.507716e+07 0.002692584
cyl             0.204474  0.04827075 4.455527e-01 0.001917098
The output results in a data frame with the odds ratios, confidence intervals, and p-values.
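The odds ratios themselves are just exponentiated coefficients, which you can verify in base R. The sketch below is a hypothetical equivalent of or; which confidence-interval method or uses is not shown here (confint() gives profile-likelihood intervals, confint.default() gives Wald intervals), so the interval values may not match exactly.

```r
# Sketch of odds ratios from a glm (hypothetical; or's internals may differ)
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")

odds_ratios <- exp(coef(mb2))                       # exponentiate log-odds coefficients
ci          <- exp(confint(mb2))                    # profile-likelihood intervals
p_values    <- summary(mb2)$coefficients[, "Pr(>|z|)"]

cbind(Odds_Ratio = odds_ratios, ci, p_values)
```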
Upper and Lower Fences with fences
If you want to revise and adjust your model, it can be helpful to limit outliers. To find upper and lower fences quickly, use the function fences. To do this, pass the continuous variable you are interested in examining to the function. Once again, use the data$variable format.
#fences(iris$Sepal.Length)
#fences(mtcars$cyl)$Upper
#fences(mtcars$cyl)$Lower
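Assuming fences follows the common Tukey rule (1.5 times the interquartile range beyond the quartiles) — which is not confirmed by the output above — the calculation can be sketched in base R as follows:

```r
# Sketch of Tukey fences (assumes the 1.5 * IQR rule; fences' internals may differ)
x <- mtcars$cyl

q     <- quantile(x, c(0.25, 0.75))  # first and third quartiles
iqr   <- q[[2]] - q[[1]]             # equivalently IQR(x)
upper <- q[[2]] + 1.5 * iqr          # values above this are flagged as outliers
lower <- q[[1]] - 1.5 * iqr          # values below this are flagged as outliers

upper  # 14
lower  # -2
```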
Comparing Model Results with compare_models
Lastly, when you are putting together multiple models, it can be helpful to view them all at once, side by side. This is particularly helpful if you have more than two models to compare. For this function, pass in however many models you want to compare and, optionally, label each one with a vector of strings. To demonstrate, I will add a third model, mb3, which includes an additional continuous independent variable.
mb3 <- glm(vs ~ cyl + wt, data = mtcars, family = "binomial")
compare_models(mb1, mb2, mb3, labels = c("Model 1 Block 1", "Model 1 Block 2", "Model Block 3"))
$`Model 1 Block 1`

Call:  glm(formula = vs ~ 1, family = "binomial", data = mtcars)

Coefficients:
(Intercept)  
    -0.2513  

Degrees of Freedom: 31 Total (i.e. Null);  31 Residual
Null Deviance:      43.86
Residual Deviance: 43.86    AIC: 45.86

$`Model 1 Block 2`

Call:  glm(formula = vs ~ cyl, family = "binomial", data = mtcars)

Coefficients:
(Intercept)          cyl  
      9.294       -1.587  

Degrees of Freedom: 31 Total (i.e. Null);  30 Residual
Null Deviance:      43.86
Residual Deviance: 17.96    AIC: 21.96

$`Model Block 3`

Call:  glm(formula = vs ~ cyl + wt, family = "binomial", data = mtcars)

Coefficients:
(Intercept)          cyl           wt  
     10.619       -2.931        2.100  

Degrees of Freedom: 31 Total (i.e. Null);  29 Residual
Null Deviance:      43.86
Residual Deviance: 15.55    AIC: 21.55
Pulling Deviances and AICs with deviance_aic
To pull the residual deviance, null deviance, and AIC from each model's summary, pass the models to deviance_aic.
deviance_aic(mb1, mb2, mb3)
mb1
Residual Deviance: 43.86
Null Deviance: 43.86
AIC: 45.86

mb2
Residual Deviance: 17.96
Null Deviance: 43.86
AIC: 21.96

mb3
Residual Deviance: 15.55
Null Deviance: 43.86
AIC: 21.55
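Each of these values is also available directly from a fitted glm object, which is useful if you want one of them programmatically rather than printed. A minimal sketch (not deviance_aic's actual source):

```r
# Sketch: pulling deviance and AIC directly from a glm object
# (deviance_aic's internals may differ)
mb2 <- glm(vs ~ cyl, data = mtcars, family = "binomial")

c(Residual_Deviance = deviance(mb2),   # deviance of the fitted model
  Null_Deviance     = mb2$null.deviance,
  AIC               = AIC(mb2))
```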