Multiple Regression (Part 2) – Diagnostics
Multiple regression is one of the most widely used methods in statistical modelling. Despite its many benefits, it is often applied without checking the underlying assumptions, which can lead to results that are misleading or even completely wrong. Applying diagnostics to detect strong violations of the assumptions is therefore important. In the exercises below we cover some material on multiple regression diagnostics in R.
Answers to the exercises are available here.
If you obtain a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Multiple Regression (Part 1) can be found here.
We will be using the dataset state.x77, which is part of the state datasets available in R. (Additional information about the dataset can be obtained by running help(state.x77).)
Exercise 1
a. Load the state datasets.
b. Convert the state.x77 dataset to a dataframe.
c. Rename the Life Exp variable to Life.Exp, and HS Grad to HS.Grad. (This avoids problems with referring to these variables when specifying a model.)
d. Produce the correlation matrix.
e. Create a scatterplot matrix for the variables Life.Exp, HS.Grad, Murder, and Frost.
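As a rough sketch of how Exercise 1 might be approached (the solutions page may do it differently), the steps can be written with base R only. The object names states and cor_mat are our own choices for illustration, not names taken from the exercise.

```r
# Load the state datasets (state.x77 and friends ship with base R)
data(state)

# state.x77 is a matrix; convert it to a data frame
states <- as.data.frame(state.x77)

# Replace the column names that contain spaces
names(states)[names(states) == "Life Exp"] <- "Life.Exp"
names(states)[names(states) == "HS Grad"]  <- "HS.Grad"

# Correlation matrix of all variables (rounded for readability when printed)
cor_mat <- cor(states)
print(round(cor_mat, 2))

# Scatterplot matrix for the four variables of interest
pairs(states[, c("Life.Exp", "HS.Grad", "Murder", "Frost")])
```

Renaming by matching on the old name, rather than by column position, keeps the code correct even if the column order were to change.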
Exercise 2
a. Fit the model with Life.Exp as the dependent variable, and HS.Grad and Murder as predictors.
b. Obtain the residuals.
c. Obtain the fitted values.
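One possible way to work through Exercise 2 is sketched here, assuming the data frame is prepared as in Exercise 1. The names states, fit, res and fitted_vals are illustrative only.

```r
# Prepare the data as in Exercise 1
data(state)
states <- as.data.frame(state.x77)
names(states)[names(states) == "Life Exp"] <- "Life.Exp"
names(states)[names(states) == "HS Grad"]  <- "HS.Grad"

# Fit the linear model with two predictors
fit <- lm(Life.Exp ~ HS.Grad + Murder, data = states)
summary(fit)

# Raw residuals and fitted values, one per state
res         <- residuals(fit)
fitted_vals <- fitted(fit)

head(res)
head(fitted_vals)
```

As a quick sanity check, the fitted values plus the residuals should reproduce the observed Life.Exp values.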
Exercise 3
a. Create a residual plot (residuals vs. fitted values).
b. Create the same residual plot using the plot command on the lm object from Exercise 2.
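The two residual plots in Exercise 3 could be produced along these lines. This is a sketch using base graphics; the variable names are our own, and the model is refitted so that the snippet runs on its own.

```r
# Data and model as in Exercises 1 and 2
data(state)
states <- as.data.frame(state.x77)
names(states)[names(states) == "Life Exp"] <- "Life.Exp"
names(states)[names(states) == "HS Grad"]  <- "HS.Grad"
fit <- lm(Life.Exp ~ HS.Grad + Murder, data = states)

# a. Residuals vs. fitted values, built by hand
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)  # reference line at zero

# b. The built-in version: the first of the plot.lm diagnostic plots
plot(fit, which = 1)
```

The built-in plot additionally draws a smoothed trend line and labels the most extreme points, which makes curvature or unusual observations easier to spot.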
Exercise 4
Create plots of the residuals vs. each of the predictor variables.
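For Exercise 4, a minimal sketch is to plot the residuals against each predictor in turn. Placing the two panels side by side with par(mfrow = ...) is our own choice, not something the exercise asks for.

```r
# Data and model as in Exercises 1 and 2
data(state)
states <- as.data.frame(state.x77)
names(states)[names(states) == "Life Exp"] <- "Life.Exp"
names(states)[names(states) == "HS Grad"]  <- "HS.Grad"
fit <- lm(Life.Exp ~ HS.Grad + Murder, data = states)

predictors <- c("HS.Grad", "Murder")

old_par <- par(mfrow = c(1, 2))  # two panels side by side
for (p in predictors) {
  plot(states[[p]], residuals(fit),
       xlab = p, ylab = "Residuals")
  abline(h = 0, lty = 2)
}
par(old_par)  # restore the previous graphics settings
```

A visible pattern in either panel (a curve or a funnel shape, say) would hint that the linearity or constant-variance assumption deserves a closer look.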
Exercise 5
a. Create a Normality plot.
b. Create the same plot using the plot command on the lm object from Exercise 2.
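A sketch for Exercise 5 follows, using qqnorm and qqline from base R. Note that the built-in plot.lm version is based on standardized residuals, so its vertical axis will differ from a hand-made plot of the raw residuals even though the shape is essentially the same.

```r
# Data and model as in Exercises 1 and 2
data(state)
states <- as.data.frame(state.x77)
names(states)[names(states) == "Life Exp"] <- "Life.Exp"
names(states)[names(states) == "HS Grad"]  <- "HS.Grad"
fit <- lm(Life.Exp ~ HS.Grad + Murder, data = states)

# a. Normal Q-Q plot of the raw residuals, with a reference line
res <- residuals(fit)
qqnorm(res)
qqline(res)

# b. The built-in Q-Q plot (second of the plot.lm diagnostic plots)
plot(fit, which = 2)
```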
Exercise 6
a. Obtain the studentized residuals.
b. Do there appear to be any outliers?
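Exercise 6 might be tackled as sketched here. We assume that "studentized residuals" means the externally studentized residuals returned by rstudent, and we use an absolute value above 2 as a rough screening rule; both are assumptions on our part, and other cut-offs (such as 3) are also in common use.

```r
# Data and model as in Exercises 1 and 2
data(state)
states <- as.data.frame(state.x77)
names(states)[names(states) == "Life Exp"] <- "Life.Exp"
names(states)[names(states) == "HS Grad"]  <- "HS.Grad"
fit <- lm(Life.Exp ~ HS.Grad + Murder, data = states)

# a. Externally studentized residuals, named by state
stud_res <- rstudent(fit)

# b. Flag observations beyond the (assumed) cut-off of 2
cutoff  <- 2
flagged <- stud_res[abs(stud_res) > cutoff]
print(flagged)

# A quick visual check against the cut-off lines
plot(stud_res, ylab = "Studentized residuals")
abline(h = c(-cutoff, cutoff), lty = 2)
```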
Exercise 7
a. Obtain the leverage value for each observation and plot them.
b. Obtain the conventional threshold for leverage values. Are any observations potentially influential (that is, do any have high leverage)?
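For Exercise 7, one way to proceed is sketched here. The threshold used is 2k/n, where k is the number of estimated coefficients (including the intercept) and n is the number of observations. This is one widely quoted rule of thumb; the solutions page may use a different multiple, such as 3k/n.

```r
# Data and model as in Exercises 1 and 2
data(state)
states <- as.data.frame(state.x77)
names(states)[names(states) == "Life Exp"] <- "Life.Exp"
names(states)[names(states) == "HS Grad"]  <- "HS.Grad"
fit <- lm(Life.Exp ~ HS.Grad + Murder, data = states)

# a. Leverage (hat) values, one per state
lev <- hatvalues(fit)
plot(lev, type = "h", ylab = "Leverage")

# b. Rule-of-thumb threshold 2k/n (an assumption, see the text above)
k <- length(coef(fit))  # number of coefficients, intercept included
n <- nrow(states)
lev_cutoff <- 2 * k / n
abline(h = lev_cutoff, lty = 2)

# States whose leverage exceeds the threshold
high_lev <- names(lev)[lev > lev_cutoff]
print(high_lev)
```

High leverage only says that an observation is unusual in its predictor values; whether it actually changes the fit is what DFFITS, DFBETAS and Cook's distance measure.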
Exercise 8
a. Obtain DFFITS values.
b. Obtain the conventional threshold. Are any observations influential?
c. Obtain DFBETAS values.
d. Obtain the conventional threshold. Are any observations influential?
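A possible approach to Exercise 8 is sketched here. The thresholds are the commonly cited 2 * sqrt(k / n) for DFFITS and 2 / sqrt(n) for DFBETAS, with k the number of coefficients and n the sample size. Treat these as assumptions, since the exercise does not say which convention it has in mind.

```r
# Data and model as in Exercises 1 and 2
data(state)
states <- as.data.frame(state.x77)
names(states)[names(states) == "Life Exp"] <- "Life.Exp"
names(states)[names(states) == "HS Grad"]  <- "HS.Grad"
fit <- lm(Life.Exp ~ HS.Grad + Murder, data = states)

k <- length(coef(fit))  # coefficients, intercept included
n <- nrow(states)

# a./b. DFFITS and its rule-of-thumb threshold
dff        <- dffits(fit)
dff_cutoff <- 2 * sqrt(k / n)
print(dff[abs(dff) > dff_cutoff])

# c./d. DFBETAS (one column per coefficient) and its threshold
dfb        <- dfbetas(fit)
dfb_cutoff <- 2 / sqrt(n)
# Rows (states) with at least one coefficient beyond the threshold
dfb_flag <- apply(abs(dfb) > dfb_cutoff, 1, any)
print(round(dfb[dfb_flag, , drop = FALSE], 3))
```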
Exercise 9
a. Obtain Cook’s distance values and plot them.
b. Obtain the same plot using the plot command on the lm object from Exercise 2.
c. Obtain the threshold value. Are any observations influential?
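Exercise 9 could be approached as in the sketch below. The exercise does not state which threshold it means, so the 4/n rule used here is an assumption; other conventions, such as 4/(n - k - 1) or simply comparing against 1, are also seen in practice.

```r
# Data and model as in Exercises 1 and 2
data(state)
states <- as.data.frame(state.x77)
names(states)[names(states) == "Life Exp"] <- "Life.Exp"
names(states)[names(states) == "HS Grad"]  <- "HS.Grad"
fit <- lm(Life.Exp ~ HS.Grad + Murder, data = states)

# a. Cook's distance for each state, plotted as spikes
cd <- cooks.distance(fit)
plot(cd, type = "h", ylab = "Cook's distance")

# b. The built-in version (fourth of the plot.lm diagnostic plots)
plot(fit, which = 4)

# c. Assumed rule-of-thumb threshold of 4/n
n         <- nrow(states)
cd_cutoff <- 4 / n
print(cd[cd > cd_cutoff])
```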
Exercise 10
Create the Influence Plot using a function from the car package.
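For Exercise 10, we assume the intended function is influencePlot from the car package, which plots studentized residuals against hat values with circle sizes reflecting Cook's distance. The sketch checks whether car is installed before calling it, so it still runs (with a message) on a machine without the package.

```r
# Data and model as in Exercises 1 and 2
data(state)
states <- as.data.frame(state.x77)
names(states)[names(states) == "Life Exp"] <- "Life.Exp"
names(states)[names(states) == "HS Grad"]  <- "HS.Grad"
fit <- lm(Life.Exp ~ HS.Grad + Murder, data = states)

if (requireNamespace("car", quietly = TRUE)) {
  # Draws the plot and returns the observations it considers noteworthy
  noteworthy <- car::influencePlot(fit)
  print(noteworthy)
} else {
  message("Package 'car' is not installed; try install.packages(\"car\").")
}
```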