How to Perform Multiple Linear Regression in R

Steven P. Sanderson II, MPH

2 weeks ago

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

Multiple linear regression is a statistical method for modeling the relationship between one dependent variable and two or more independent variables. In this post we work through an example in R using the built-in mtcars dataset.
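
For reference, the general form of the model can be written as follows. The notation here is the usual textbook convention and is an assumption on my part, not something specific to this post: y is the dependent variable, the x terms are the p independent variables, the betas are the coefficients to be estimated, and epsilon is the error term.

```latex
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n
\]
```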

Example

Step 1: Load the dataset

# Load the mtcars dataset
data(mtcars)
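
Before modeling, it can help to confirm what was loaded. The snippet below is a minimal sketch that looks only at the four columns used later in this post; the helper name `vars` is just for illustration.

```r
# Load the data and pick the columns used in this post
data(mtcars)
vars <- c("mpg", "disp", "hp", "drat")  # illustrative helper name

# Column types and the first few values
str(mtcars[, vars])

# Five-number summaries and means
summary(mtcars[, vars])

# Count of missing values per column
colSums(is.na(mtcars[, vars]))
```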

Step 2: Build the model

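Now, let’s create the multiple linear regression model, with mpg (miles per gallon) as the dependent variable and three independent variables: disp (engine displacement), hp (gross horsepower), and drat (rear axle ratio).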

# Build the multiple linear regression model
model <- lm(mpg ~ disp + hp + drat, data = mtcars)
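
Once the model is fitted, the estimated coefficients can be pulled out and used for prediction. The following is a minimal sketch; the `new_car` values are made up purely for illustration and do not come from the dataset.

```r
# Fit the same model as above
data(mtcars)
model <- lm(mpg ~ disp + hp + drat, data = mtcars)

# Estimated intercept and slopes
coef(model)

# 95% confidence intervals for each coefficient
confint(model, level = 0.95)

# Predict mpg for a hypothetical car (illustrative values only)
new_car <- data.frame(disp = 200, hp = 120, drat = 3.5)
pred <- predict(model, newdata = new_car, interval = "prediction")
pred
```

Each slope is the expected change in mpg for a one-unit increase in that variable, holding the other two fixed.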

Step 3: Examine the data

It’s always a good idea to take a look at the relationships between variables before interpreting the model. The pairs() function draws a scatterplot matrix, with one panel for each pair of variables.

# Examine relationships between variables
pairs(mtcars[, c("mpg", "disp", "hp", "drat")])

Step 4: Check for multicollinearity

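Multicollinearity occurs when independent variables in a regression model are highly correlated with each other. It inflates the standard errors of the coefficient estimates, which makes the estimates unstable and harder to interpret. Keep an eye on the pairs plot: a strong linear trend in a panel that compares two independent variables (rather than an independent variable and mpg) is a warning sign.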
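
The visual check can be backed up with numbers. The sketch below uses only base R: it prints the correlation matrix of the predictors and then computes a variance inflation factor (VIF) for each one by hand, as 1 / (1 - R²) from regressing that predictor on the others. The names `predictors` and `vif` are illustrative, and the rule of thumb that VIFs above roughly 5 to 10 deserve attention is a common convention rather than a strict cutoff.

```r
data(mtcars)
predictors <- c("disp", "hp", "drat")  # illustrative helper name

# Pairwise correlations between the predictors
cor(mtcars[, predictors])

# VIF for each predictor: regress it on the other predictors
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux_fit <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(aux_fit)$r.squared)
})
round(vif, 2)
```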

Step 5: Plot the residuals

Now, let’s check the model’s residuals using a scatterplot. Residuals are the differences between observed and predicted values. They should ideally show no pattern.

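# Plot the residuals against the fitted values
plot(
  model$fitted.values,
  model$residuals,
  main = "Residuals vs Fitted Values",
  xlab = "Fitted Values",
  ylab = "Residuals"
  )
# Add a horizontal reference line at zero
abline(h = 0, lty = 2)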

Step 6: Evaluate the model

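By examining the residuals vs. fitted values plot, we can identify patterns that point to problems with the model. A curved trend suggests non-linearity, and a spread that widens or narrows across the fitted values suggests heteroscedasticity. Ideally, the residuals should be randomly scattered around zero.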
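
Alongside the visual check, the numeric summary of the fit is worth a look. This is a minimal sketch using base R only; `fit_summary` is an illustrative name. It also draws R's built-in residuals vs. fitted diagnostic, which adds a smoothed trend line that makes curvature easier to spot.

```r
data(mtcars)
model <- lm(mpg ~ disp + hp + drat, data = mtcars)

# Built-in diagnostic: residuals vs fitted values with a smoothed trend
plot(model, which = 1)

# Numeric summary of the fit
fit_summary <- summary(model)
fit_summary$coefficients   # estimates, standard errors, t values, p-values
fit_summary$r.squared      # share of variance in mpg explained
fit_summary$adj.r.squared  # adjusted for the number of predictors
fit_summary$sigma          # residual standard error
```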

Step 7: Try it yourself

I encourage you to take the code snippets, run them in your own R environment, and explore. Try different variables, tweak the model, or even use another dataset. Hands-on experience is the best teacher!
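
As one possible starting point, the sketch below adds a fourth variable and compares the two fits. The choice of wt (car weight) is only an example, and `model2` is an illustrative name.

```r
data(mtcars)
model <- lm(mpg ~ disp + hp + drat, data = mtcars)

# Refit with one extra predictor (wt chosen only as an example)
model2 <- update(model, . ~ . + wt)

# F-test comparing the nested models
anova(model, model2)

# Information criteria for both models (lower is better)
AIC(model, model2)
```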

Remember, understanding the data and interpreting the results are as important as running the code. It’s a fascinating journey into uncovering patterns and relationships within your data.

Feel free to reach out if you have any questions or if there’s anything specific you’d like to explore further. Happy coding!
