During the stream we had an example with one categorical and one numeric predictor,
and we built a model with an interaction between these two predictors.
The result is a straight line for each level of the categorical variable.
The question was whether it makes a difference if you instead build a separate simple
linear model for each group.
So we get three clouds of points. Each cloud follows its own line.
# Scatter plot with a separate regression line per group
data %>%
  ggplot(aes(x, y, color = class_name)) +
  geom_point() +
  stat_smooth(formula = y ~ x, method = lm, se = FALSE)
Build one linear model
We’re building the model without an intercept by putting 0 + at the start of the
formula, so we get the intercept for each group directly (instead of having to add
the reference group’s intercept to the other groups’ offsets).
So $ y $ contains the values of the dependent variable. $ X $ is a matrix whose first
column, called $ x_0 $, contains only $ 1 $s; $ x_1 $ to $ x_3 $ are the predictors,
and $ x_3 $ is the interaction between $ x_1 $ and $ x_2 $.
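For illustration, here is one way such a design matrix could be assembled by hand (a sketch, not the original code; the level name "B" for the second group is an assumption):

y  <- data$y                              # dependent variable
x1 <- as.numeric(data$class_name == "B")  # group dummy (level "B" assumed)
x2 <- data$x                              # numeric predictor
x3 <- x1 * x2                             # interaction
X  <- cbind(x0 = 1, x1, x2, x3)           # x0: explicit intercept column of 1s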
The model
Now we can write our model as:
$$
y = X \beta + \epsilon
$$
We’re searching for the vector $ b $ for which $ \hat{y} = X b $ minimizes the sum of
the squared residuals. Setting the derivative of the residual sum of squares to zero
gives the normal equations $ X^t X b = X^t y $, and therefore
$$
b = (X^t X)^{-1} X^t y
$$
So let’s compute it (solve() computes the inverse of a matrix):
Xt <- t(X)
b <- solve(Xt %*% X) %*% Xt %*% y  # (X^t X)^{-1} X^t y
round(b, 3)
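As a quick cross-check (not part of the original output), the same estimates can be read off lm() directly:

# These coefficients should match b, only the names differ
coef(lm(y ~ class_name * x, data))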
Now let’s look at the standard errors of the coefficients. They are the square roots
of the diagonal of the covariance matrix of the coefficients, which we can compute as
$$
Cov(b) = \sigma^2 (X^t X)^{-1}
$$
An estimator for $ \sigma^2 $ is the mean squared error:
$$
s^2 = \frac{\sum{(y_i - \hat{y}_i)^2}}{df}
$$
In our case the degrees of freedom are $ df = n - 4 $, since we estimate four coefficients.
yh <- X %*% b                  # fitted values
df <- length(y) - 4            # residual degrees of freedom
MSE <- sum((y - yh)**2) / df   # mean squared error
MSE
## [1] 31.25
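As an aside, the square root of the MSE is the residual standard error that summary(lm()) reports further down:

sqrt(MSE)  # about 5.59, the "Residual standard error" in the lm summary below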
So the standard errors of the coefficients are the main diagonal of the following matrix (the element-wise square root of the estimated covariance matrix):
(solve(Xt %*% X) * MSE) %>% sqrt()
## Warning in sqrt(.): NaNs produced
## x0 x1 x2 x3
## x0 5.590170 NaN NaN 1.7677670
## x1 NaN 7.905694 1.7677670 NaN
## x2 NaN 1.767767 0.6846532 NaN
## x3 1.767767 NaN NaN 0.9682458
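The NaNs appear because some of the off-diagonal covariances are negative, so their square root is undefined; only the diagonal is needed anyway. A small variation (an addition to the original code) avoids the warning by taking the square root of the diagonal only:

# Standard errors only: square roots of the diagonal of the covariance matrix
sqrt(diag(solve(Xt %*% X) * MSE))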
These are the same standard errors as computed by lm():
lm(y ~ class_name * x, data) %>% summary()
##
## Call:
## lm(formula = y ~ class_name * x, data = data)
##
## Residuals:
## 1 2 3 4 5 6
## -4.441e-16 -5.000e+00 5.000e+00 1.648e-15 -2.500e+00 2.500e+00
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.0000 5.5902 0.179 0.875
## class_nameB -1.0000 7.9057 -0.126 0.911
## x 1.5000 0.6847 2.191 0.160
## class_nameB:x -1.2500 0.9682 -1.291 0.326
##
## Residual standard error: 5.59 on 2 degrees of freedom
## Multiple R-squared: 0.8201, Adjusted R-squared: 0.5501
## F-statistic: 3.038 on 3 and 2 DF, p-value: 0.2574
As you can see, the covariance matrix isn’t diagonal: points from the other group
also affect the standard errors.
So the errors differ from those of two independent models:
# Fit a separate simple linear model per group and show its coefficient table
data %>%
  group_by(class_name) %>%
  group_map(~ coef(summary(lm(y ~ x, data = .x))))
## [[1]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.0 7.0710678 0.1414214 0.9105615
## x 1.5 0.8660254 1.7320508 0.3333333
##
## [[2]]
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.56395e-16 3.5355339 7.251946e-17 1.0000000
## x 2.50000e-01 0.4330127 5.773503e-01 0.6666667
Separate Models
For completeness, let’s check the manual computation for one of the single-group models here:
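Here is a sketch of what that manual computation could look like for the first group (the level name "A" and the filtering step are assumptions; the algebra mirrors the full-model computation above):

# Restrict the data to one group and fit y = b0 + b1 * x by hand
d1 <- data %>% filter(class_name == "A")            # assumed level name
y1 <- d1$y
X1 <- cbind(x0 = 1, x1 = d1$x)                      # intercept column and predictor
b1 <- solve(t(X1) %*% X1) %*% t(X1) %*% y1          # least squares estimate
MSE1 <- sum((y1 - X1 %*% b1)^2) / (length(y1) - 2)  # two estimated coefficients
sqrt(diag(solve(t(X1) %*% X1) * MSE1))              # standard errors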