Introducing olsrr
I am pleased to announce the olsrr package, a set of tools for improved output from linear regression models, designed with beginner and intermediate R users in mind. The package includes:
- comprehensive regression output
- variable selection procedures
- heteroskedasticity and collinearity diagnostics, and measures of influence
- various plots and underlying data
If you know how to build models using lm(), you will find olsrr very useful. Most of the functions take an object of class lm as input, so you just need to build a model using lm() and then pass it on to the functions in olsrr. Once you have picked up enough knowledge of R, you can move on to packages such as tidymodels, which offer more flexibility than olsrr does.
Installation
# Install release version from CRAN
install.packages("olsrr")

# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("rsquaredacademy/olsrr")
Shiny App
olsrr includes a shiny app which can be launched using
ols_launch_app()
or try the live version here.
Read on to learn more about the features of olsrr, or see the olsrr website for detailed documentation on using the package.
Regression Output
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_regress(model)
##                         Model Summary
## --------------------------------------------------------------
## R                       0.914       RMSE                2.622
## R-Squared               0.835       Coef. Var          13.051
## Adj. R-Squared          0.811       MSE                 6.875
## Pred R-Squared          0.771       MAE                 1.858
## --------------------------------------------------------------
##  RMSE: Root Mean Square Error
##  MSE: Mean Square Error
##  MAE: Mean Absolute Error
##
##                                ANOVA
## --------------------------------------------------------------------
##                 Sum of
##                Squares        DF    Mean Square      F         Sig.
## --------------------------------------------------------------------
## Regression     940.412         4        235.103    34.195    0.0000
## Residual       185.635        27          6.875
## Total         1126.047        31
## --------------------------------------------------------------------
##
##                                   Parameter Estimates
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper
## ----------------------------------------------------------------------------------------
## (Intercept)    27.330         8.639                  3.164    0.004     9.604    45.055
##        disp     0.003         0.011        0.055     0.248    0.806    -0.019     0.025
##          hp    -0.019         0.016       -0.212    -1.196    0.242    -0.051     0.013
##          wt    -4.609         1.266       -0.748    -3.641    0.001    -7.206    -2.012
##        qsec     0.544         0.466        0.161     1.166    0.254    -0.413     1.501
## ----------------------------------------------------------------------------------------
When the model contains interaction terms, the predictors are scaled and centered before the standardized betas are computed. ols_regress() will detect interaction terms automatically, but if you have created the interaction as a new variable instead of specifying it inline in the model formula, you can indicate the presence of interaction terms by setting iterm to TRUE.
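As a rough sketch of the second case, the interaction below is built by hand as a new column, so it has to be flagged explicitly. The column name disp_hp is made up for illustration, and the formula-plus-data call form of ols_regress() is an assumption about the installed version of olsrr; check the package documentation if it does not accept these arguments.

```r
# Hypothetical example: interaction created as a new variable (disp_hp),
# so ols_regress() cannot detect it from the formula.
cars <- transform(mtcars, disp_hp = disp * hp)

if (requireNamespace("olsrr", quietly = TRUE)) {
  # Assumed call form: formula + data, with iterm = TRUE to request
  # scaling and centering before the standardized betas are computed.
  print(olsrr::ols_regress(mpg ~ disp + hp + disp_hp, data = cars, iterm = TRUE))
}
```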
Residual Diagnostics
olsrr offers tools for detecting violation of standard regression assumptions:
- Residual QQ plot
- Residual normality test
- Residual vs Fitted plot
- Residual histogram
ols_plot_resid_qq(model)
See Residual Diagnostics for more details.
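For readers who want to see what the four checks listed above look at, here is a minimal sketch using base R only. These are plain base R equivalents, not olsrr's own implementation, and the Shapiro-Wilk test stands in for whichever normality tests olsrr reports.

```r
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
res <- residuals(model)

# Residual QQ plot: points close to the line suggest normal residuals
qqnorm(res)
qqline(res)

# Residual normality test (Shapiro-Wilk, chosen here for illustration)
sw <- shapiro.test(res)
print(sw)

# Residual vs fitted plot: look for curvature or a funnel shape
plot(fitted(model), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Residual histogram
hist(res, main = "Residual histogram", xlab = "Residuals")
```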
Heteroskedasticity
olsrr provides the following 4 tests for detecting heteroskedasticity:
- Bartlett Test
- Breusch Pagan Test
- Score Test
- F Test
ols_test_breusch_pagan(model)
##
##  Breusch Pagan Test for Heteroskedasticity
##  -----------------------------------------
##  Ho: the variance is constant
##  Ha: the variance is not constant
##
##               Data
##  -------------------------------
##  Response : mpg
##  Variables: fitted values of mpg
##
##        Test Summary
##  ----------------------------
##  DF            =    1
##  Chi2          =    0.5884673
##  Prob > Chi2   =    0.4430124
See Heteroskedasticity for more details.
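To make the test above less of a black box, the following sketch computes the classic (non-studentized) Breusch-Pagan score statistic by hand in base R, using the fitted values as the only explanatory variable. Whether olsrr computes its statistic in exactly this way is an assumption, so treat any agreement with the output above as a sanity check rather than a guarantee.

```r
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Squared residuals scaled by the ML variance estimate sum(e^2) / n
e2 <- residuals(model)^2
g <- e2 / mean(e2)

# Auxiliary regression of the scaled squared residuals on the fitted values
yhat <- fitted(model)
aux <- lm(g ~ yhat)

# Score statistic: half the explained sum of squares of the auxiliary fit,
# compared to a chi-square with 1 degree of freedom (one regressor)
ess <- sum((fitted(aux) - mean(g))^2)
chi2 <- ess / 2
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)

print(c(Chi2 = chi2, p = p_value))
```

A large p-value means there is no evidence against constant variance.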
Collinearity Diagnostics
olsrr provides VIF, tolerance and condition indices to detect collinearity, and plots for assessing model fit and the contributions of variables.
ols_coll_diag(model)
## Tolerance and Variance Inflation Factor
## ---------------------------------------
## # A tibble: 4 x 3
##   Variables Tolerance   VIF
##   <chr>         <dbl> <dbl>
## 1 disp          0.125  7.99
## 2 hp            0.194  5.17
## 3 wt            0.145  6.92
## 4 qsec          0.319  3.13
##
##
## Eigenvalue and Condition Index
## ------------------------------
##    Eigenvalue Condition Index   intercept        disp          hp
## 1 4.721487187        1.000000 0.000123237 0.001132468 0.001413094
## 2 0.216562203        4.669260 0.002617424 0.036811051 0.027751289
## 3 0.050416837        9.677242 0.001656551 0.120881424 0.392366164
## 4 0.010104757       21.616057 0.025805998 0.777260487 0.059594623
## 5 0.001429017       57.480524 0.969796790 0.063914571 0.518874831
##             wt         qsec
## 1 0.0005253393 0.0001277169
## 2 0.0002096014 0.0046789491
## 3 0.0377028008 0.0001952599
## 4 0.7017528428 0.0024577686
## 5 0.2598094157 0.9925403056
See Collinearity Diagnostics for more details.
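Tolerance and VIF have simple definitions: for each predictor, regress it on the remaining predictors, take tolerance as 1 - R-squared of that regression, and VIF as 1 / tolerance. The base R sketch below applies that textbook definition to the same four predictors; it is meant as an illustration of the definition, not as a copy of olsrr's internals.

```r
predictors <- c("disp", "hp", "wt", "qsec")

# Tolerance of each predictor: 1 - R^2 from regressing it on the others
tolerance <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux <- lm(reformulate(others, response = p), data = mtcars)
  1 - summary(aux)$r.squared
})

# Variance inflation factor is the reciprocal of tolerance
vif <- 1 / tolerance

print(round(cbind(Tolerance = tolerance, VIF = vif), 3))
```

The printed values should line up with the Tolerance and VIF columns reported by ols_coll_diag() for the same model.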
Measures of Influence
olsrr offers the following tools to detect influential observations:
- Cook’s D Bar Plot
- Cook’s D Chart
- DFBETAs Panel
- DFFITs Plot
- Studentized Residual Plot
- Standardized Residual Chart
- Studentized Residuals vs Leverage Plot
- Deleted Studentized Residual vs Fitted Values Plot
- Hadi Plot
- Potential Residual Plot
ols_plot_resid_lev(model)
See Measures of Influence for more details.
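The quantities behind most of these plots are available in base R, which can help when you want the numbers rather than a chart. The sketch below collects them in one data frame; the 4 / n cutoff for Cook's distance is a common rule of thumb used here for illustration, and the name influence_tbl is made up.

```r
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
n <- nrow(mtcars)

# One row per observation with common influence measures (base R, stats)
influence_tbl <- data.frame(
  cooks_d  = cooks.distance(model),
  dffits   = dffits(model),
  leverage = hatvalues(model),
  rstudent = rstudent(model)
)

# Flag observations whose Cook's distance exceeds the 4 / n rule of thumb
cutoff <- 4 / n
print(influence_tbl[influence_tbl$cooks_d > cutoff, ])

# DFBETAs: one column per coefficient, one row per observation
print(head(dfbetas(model)))
```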
Variable Selection
olsrr offers several variable selection procedures: all possible regression, best subset regression, stepwise regression, stepwise forward regression and stepwise backward regression.
model <- lm(y ~ ., data = stepdata)
ols_step_both_aic(model)
## Stepwise Selection Method
## -------------------------
##
## Candidate Terms:
##
## 1 . x1
## 2 . x2
## 3 . x3
## 4 . x4
## 5 . x5
## 6 . x6
##
##
## Variables Entered/Removed:
##
## - x6 added
## - x1 added
## - x3 added
## - x2 added
## - x6 removed
## - x4 added
##
## No more variables to be added or removed.
##
##
##                                  Stepwise Summary
## ----------------------------------------------------------------------------------
## Variable     Method        AIC          RSS        Sum Sq        R-Sq      Adj. R-Sq
## ----------------------------------------------------------------------------------
## x6          addition    33473.297    6241.497    13986.736    0.69145      0.69143
## x1          addition    32931.758    6074.156    14154.076    0.69972      0.69969
## x3          addition    31912.722    5771.842    14456.391    0.71466      0.71462
## x2          addition    29304.296    5065.587    15162.646    0.74958      0.74953
## x6          removal     29302.317    5065.592    15162.641    0.74958      0.74954
## x4          addition    29300.814    5064.705    15163.528    0.74962      0.74957
## ----------------------------------------------------------------------------------
See Variable Selection for more details.
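For comparison, AIC-based stepwise selection in both directions can also be sketched with base R's step(). This uses mtcars rather than stepdata, starts from the intercept-only model, and is not guaranteed to follow the same path or reporting as ols_step_both_aic().

```r
# Start from the intercept-only model and let step() add or drop terms
null_model <- lm(mpg ~ 1, data = mtcars)

selected <- step(
  null_model,
  scope = list(lower = ~ 1, upper = ~ disp + hp + wt + qsec),
  direction = "both",  # allow both additions and removals at each step
  trace = 0            # suppress the step-by-step log
)

print(formula(selected))
print(AIC(selected))
```

Set trace = 1 to see which variable is entered or removed at each step, similar to the log shown above.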
Learning More
The olsrr website includes comprehensive documentation on using the package, including the following articles that cover various aspects of using olsrr:
Variable Selection - Different variable selection procedures such as all possible regression, best subset regression, stepwise regression, stepwise forward regression and stepwise backward regression.
Residual Diagnostics - Includes plots to examine residuals to validate OLS assumptions.
Heteroskedasticity - Tests for heteroskedasticity include the Bartlett test, Breusch Pagan test, score test and F test.
Collinearity Diagnostics - VIF, Tolerance and condition indices to detect collinearity and plots for assessing model fit and contributions of variables.
Measures of Influence - Includes 10 different plots to detect and identify influential observations.
Feedback
olsrr has been on CRAN for more than a year while we were fixing bugs and making the API stable. All feedback is welcome. Issues (bugs and feature requests) can be posted to the GitHub tracker. For help with code or other related questions, feel free to reach me at [email protected].