Introduction to Linear Regression in R: Analyzing the mtcars Dataset with lm()
Introduction
The lm() function in R is used for fitting linear regression models. Its name stands for “linear model,” and it allows you to analyze the relationship between variables and make predictions based on the data.
Let’s dive into the two main parameters of the lm() function:
formula: This is the most important parameter, as it specifies the relationship between the variables. It follows the pattern y ~ x1 + x2 + ..., where y is the response variable and x1, x2, etc., are the predictor variables. For example, in the mtcars dataset, we can use the formula mpg ~ wt to predict the miles per gallon (mpg) based on the weight (wt) of the cars.
data: This parameter refers to the dataset you want to use for the analysis. In our case, we’ll use the mtcars dataset that comes with R.
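As a small sketch of how these two parameters fit together: a formula is an ordinary R object, so it can be stored in a variable and inspected before any model is fitted. The variable names f and fit below are illustrative choices only.

```r
# A formula is an R object; it can be stored and inspected before fitting
f <- mpg ~ wt

class(f)      # "formula"
all.vars(f)   # "mpg" "wt"

# The names used in the formula are looked up as columns of the data argument
all(all.vars(f) %in% names(mtcars))   # TRUE

# Passing the stored formula is equivalent to writing it inline in lm()
fit <- lm(f, data = mtcars)
coef(fit)
```

Storing the formula separately can be handy when the same relationship is fitted to several datasets.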
Now, let’s see some examples using the mtcars dataset.
Examples
Example 1: Simple Linear Regression
# Fit a linear regression model to predict mpg based on weight
model <- lm(mpg ~ wt, data = mtcars)

# Print the summary of the model
summary(model)
Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
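Since lm() is also meant for making predictions, here is a minimal sketch using base R's predict() on the same simple model. The weight of 3 (that is, 3,000 lbs, since wt in mtcars is recorded in units of 1,000 lbs) is an arbitrary illustrative value, and new_car is a hypothetical name.

```r
# Refit the simple model so this snippet runs on its own
model <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new car weighing 3,000 lbs (wt is in units of 1,000 lbs)
new_car <- data.frame(wt = 3)

# Predicted mpg: intercept + slope * 3, roughly 37.29 - 5.34 * 3 = 21.25
predict(model, newdata = new_car)

# The same value computed by hand from the fitted coefficients
unname(coef(model)["(Intercept)"] + coef(model)["wt"] * 3)
```

The column name in newdata has to match the predictor name used in the formula, here wt.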
Example 2: Multiple Linear Regression
# Fit a linear regression model to predict mpg based on weight and horsepower
model <- lm(mpg ~ wt + hp, data = mtcars)

# Print the summary of the model
summary(model)
Call:
lm(formula = mpg ~ wt + hp, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max
-3.941 -1.600 -0.182  1.050  5.854

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
hp          -0.03177    0.00903  -3.519  0.00145 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12
Example 3: Include Interaction Term
# Fit a linear regression model to predict mpg based on weight, horsepower, and their interaction
model <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# Print the summary of the model
summary(model)
Call:
lm(formula = mpg ~ wt + hp + wt:hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-3.0632 -1.6491 -0.7362  1.4211  4.5513

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.80842    3.60516  13.816 5.01e-14 ***
wt          -8.21662    1.26971  -6.471 5.20e-07 ***
hp          -0.12010    0.02470  -4.863 4.04e-05 ***
wt:hp        0.02785    0.00742   3.753 0.000811 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.153 on 28 degrees of freedom
Multiple R-squared:  0.8848, Adjusted R-squared:  0.8724
F-statistic: 71.66 on 3 and 28 DF,  p-value: 2.981e-13
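In R's formula syntax, wt * hp is shorthand for wt + hp + wt:hp, so the interaction model can be written more compactly. The sketch below checks that both spellings give the same coefficients; the names model_long and model_short are illustrative only.

```r
# Explicit form: both main effects plus the interaction term
model_long <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# Shorthand form: the * operator expands to main effects plus interaction
model_short <- lm(mpg ~ wt * hp, data = mtcars)

# Both fits produce the same coefficient names and estimates
names(coef(model_short))
all.equal(coef(model_long), coef(model_short))   # TRUE
```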
These examples demonstrate how to use the lm() function with different sets of predictor variables. After fitting the model, you can use the summary() function to get detailed information about the regression results, including coefficients, p-values, and R-squared values.
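The values that summary() prints can also be extracted programmatically rather than read off the console. Here is a short sketch based on the model from Example 2; the variable name s is an arbitrary choice.

```r
# Refit the Example 2 model so this snippet runs on its own
model <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(model)

# Coefficient table as a numeric matrix with columns
# Estimate, Std. Error, t value, Pr(>|t|)
coef(s)

# Just the p-values, one per term
coef(s)[, "Pr(>|t|)"]

# R-squared and adjusted R-squared
s$r.squared       # about 0.8268
s$adj.r.squared   # about 0.8148
```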
I encourage you to try running these examples and explore different variables in the mtcars dataset. Feel free to modify the formulas and experiment with additional parameters to deepen your understanding of linear regression modeling in R!
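As one possible starting point for experimenting, lm() also accepts a subset argument (see ?lm) that restricts the fit to rows meeting a condition. The sketch below refits the simple model on manual-transmission cars only (am == 1 in mtcars); the choice of subset and the name model_manual are purely illustrative.

```r
# Fit mpg ~ wt using only the cars with a manual transmission (am == 1)
model_manual <- lm(mpg ~ wt, data = mtcars, subset = am == 1)

# Number of observations actually used in the fit
nobs(model_manual)

# Coefficients for the restricted sample
coef(model_manual)
```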