Regression with Interaction Terms – How Centering Predictors influences Main Effects

[This article was first published on Jonas Haslbeck - r, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Centering predictors in a regression model with only main effects has no influence on the main effects. In contrast, in a regression model including interaction terms centering predictors does have an influence on the main effects. After getting confused by this, I read this nice paper by Afshartous & Preston (2011) on the topic and played around with the examples in R. I summarize the resulting notes and code snippets in this blogpost.

We give an explanation on two levels:

  1. By illustrating the issue with the simplest possible example
  2. By showing in general how main effects are a function of the constants (e.g. means) that are substracted from predictor variables

Explanation 1: Simplest example

The simplest possible example to illustrate the issue is a regression model in which variable Y is a linear function of variables X1, X2, and their product X1X2

Y=β0+β1X1+β2X2+β3X1X2+ϵ,

where we set β0=1,β1=0.3,β2=0.2,β3=0.2, and ϵN(0,σ2) is Gaussian distribution with mean zero and variance σ2. We define the predictors X1,X2 as Gaussians with means μX1=μX2=1 and σX12=σX22=1. This code samples n=10000 observations from this model:

n <- 10000
b0 <- 1; b1 <- .3; b2 <- .2; b3 <- .2
set.seed(1)
x1 <- rnorm(n, mean = 1, sd = 1)
x2 <- rnorm(n, mean = 1, sd = 1)
y <- b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 + rnorm(n, mean = 0, sd = 1)

Regression models with main effects

We first verify that centering variables indeed does not affect the main effects. To do so, we first fit the linear regression with only main effects with uncentered predictors

lm(y ~ x1 + x2)

Call:
  lm(formula = y ~ x1 + x2)

Coefficients:
  (Intercept)           x1           x2  
0.8088       0.4983       0.4015  

and then with mean centered predictors

x1_c <- x1 - mean(x1) # center predictors
x2_c <- x2 - mean(x2)
lm(y ~ x1_c + x2_c)

Call:
  lm(formula = y ~ x1_c + x2_c)

Coefficients:
  (Intercept)         x1_c         x2_c  
1.7036       0.4983       0.4015  

The parameter estimates of the regression with uncentered predictors are β^10.50 and β^20.40. The estimates of the regression with centered predictors are β^10.50 and β^20.40 (we denote estimates from regressions with centered predictors with an asterisk). And indeed, β^1=β^1 and β^2=β^2.

Regression models with main effects + interaction

We include the interaction term and show that centering the predictors now does does affect the main effects. We first fit the regression model without centering

lm(y ~ x1 * x2)

Call:
  lm(formula = y ~ x1 * x2)

Coefficients:
  (Intercept)           x1           x2        x1:x2  
1.0183       0.2883       0.1898       0.2111  

and then with centering

lm(y ~ x1_c * x2_c)

Call:
  lm(formula = y ~ x1_c * x2_c)

Coefficients:
  (Intercept)         x1_c         x2_c    x1_c:x2_c  
1.7026       0.4984       0.3995       0.2111  

We see that β^10.29 and β^20.19 and β^10.50 and β^20.40. While the two models have different parameters, they are statistically equivalent. Here this means that expected values of both models are the same. In empirical terms this means that their coefficient of determination R2 is the same. The reader will be able to verify this in Explanation 2 below.

We make two observations:

  1. In the model with interaction terms, the main effects differ between the regressions with/without centering of predictors
  2. When centering predictors, the main effects are the same in the model with/without the interaction term (up to some numerical inaccuracy)

Why does centering influence main effects in the presence of an interaction term?

The reason is that in the model with the interaction term, the parameter β1 (uncentered predictors) is the main effect of X1 on Y if X2=0, and the parameter β1 (centered predictors) is the main effect of X1 on Y if X2=μX2. This means that β1 and β1 are modeling different effects in the data. Here is a more detailed explanation:

Rewriting the model equation in the following way

% <![CDATA[\begin{aligned}\mathbb{E}[Y] &#038;= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1X_2 \\&#038;= \beta_0 + (\beta_1 + \beta_3 X_2) X_1 + \beta_2 X_2\end{aligned} %]]&gt;

shows that in the model with interaction term, the effect of X1 on Y is equal to (β1+β3X2) and therefore a function of X2. What does the parameter β1 model here? It models the effect of X1 on Y when X2=0. Similarly we could rewrite the effect of X1 on Y as a function of X2.

Now let X1c=X1μX1 and X2c=X2μX2 be the centered predictors. We get the same model equations, now with the parameters estimated using the centered predictors X1c,X2c:

% <![CDATA[\begin{aligned}\mathbb{E}[Y] &#038;= \beta_0^* + \beta_1^* X_1^c + \beta_2^* X_2^c + \beta_3^* X_1^c X_2^c \\&#038;= \beta_0^* + (\beta_1^* + \beta_3^*  X_2^c) X_1^c + \beta_2^*  X_2^c \\\end{aligned} %]]&gt;

Again we focus on the effect (β1+β3X2c) of X1c on Y. What does the the parameter β1 model here? It models the main effect of X1c on Y when X2c=μX2c=0. What remained the same is that β1 is the main effect of X1c on Y when X2c=0. But what is new is that μX2c=0.

To summarize, in the uncentered case βi is the main effect when the predictor variable Xi is equal to zero; and in the centered case, βi is the main effect when the predictor variable Xi is equal to its mean. Clearly, βi and βi model different effects in the data and it is therefore not surprising that the two regressions give us very different estimates.

Centering interpretation of β remains the same when adding interaction

Our second observation above was that the estimates of main effects are the same with/without interaction term when centering the predictor variables. This is because in the models without interaction term (centered or uncentered predictors) the interpretation of β1 is the same as in the model with interaction term and centered predictors.

More precisely, in the regression model with only main effects, β1 is the main effect of X1 on Y averaged over all values of X2, which is the same as the main effect of X1 on Y for X2=μX2. This means that if we center predictors, β1 models the same effect in the data in a model with/without interaction term. This is an attractive property to have when one is interested in comparing models with/without interaction term.

Explanation 2: Main effects as functions of added constants

Substracting the mean from predictors is a special case of adding constants to predictors. Here we first show numerically what happens to each regression parameter when adding constants to predictors. Then we show analytically how each parameter is a function of its value in the original regression model (no constant added) and the added constants.

Why are we doing this? We are doing this to develop a more general understanding of what happens when adding constants to predictors. It also puts the above example in a more general context, since we can consider it as a special case of the following analysis.

Numerical experiment I: Only main effects

We first fit a series of regression models with only main effects. In each of them we add a different constant to the predictors. We do this verify that our claim that centering predictors does not change main effects extends to the more general situation of adding constants to predictors.

We first define a sequence of constant values we add to the predictors and create storage for parameter estimates:

n <- 25
c_sequence <- seq(-1.5, 1.5, length = n)

A <- as.data.frame(matrix(NA, ncol=5, nrow=n))
colnames(A) <- c("b0", "b1", "b2", "b3", "R2")

We now fit 25 regression models, and in each of them we add a constant c to both predictors, taken from the sequence c_sequence:

for(i in 1:25) {

c <- c_sequence[i]
x1_c <- x1 + c
x2_c <- x2 + c

lm_obj <- lm(y ~ x1_c + x2_c) # Fit model
A$b0[i] <- lm_obj$coefficients[1]
A$b1[i] <- lm_obj$coefficients[2]
A$b2[i] <- lm_obj$coefficients[3]

yhat <- predict(lm_obj)
A$R2[i] <- 1 - var(yhat - y) / var(y) # Compute R2

}

Remark: in Explanation 1 we said that the coefficient of determination R2 does not change when adding constants to the predictors. We invite the reader to verify this by inspecting A$R2.

We plot all parameters β0,β1,β2 as a function of c:

library(RColorBrewer)
cols <- brewer.pal(4, "Set1") # Select nice colors

plot.new()
plot.window(xlim=range(c_sequence), ylim=c(-.5, 2.5))
axis(1, round(c_sequence, 2), cex.axis=0.75, las=2)
axis(2, c(-.5, 0, .5, 1, 1.5, 2, 2.5), las=2)
lines(c_sequence, A$b0, col = cols[1])
lines(c_sequence, A$b1, col = cols[2])
lines(c_sequence, A$b2, col = cols[3])
legend("topright", c("b0", "b1", "b2"), 
col = cols[1:3], lty = rep(1,3), bty = "n")
title(xlab = "Added constant")
title(ylab = "Parameter value")

center

We see that the intercept changes as a function of c. The model at c = 0 corresponds to the very first model we fitted above. And the model at c = -1 corresponds to the model fitted with centered predictors. But the key observation is that the main effects β1,β2 do not change. A proof of this and an exact expression for the intercept will fall out of our analysis of the model with interaction term in the last section of this blogpost.

Numerical experiment II: main effects + interaction term

Next we show that this is different when adding the interaction term. We use the same sequence of c as above and fit regression models with interaction term:

for(i in 1:25) {

c <- c_sequence[i]
x1_c <- x1 + c
x2_c <- x2 + c

lm_obj <- lm(y ~ x1_c * x2_c) # Fit model
A$b0[i] <- lm_obj$coefficients[1]
A$b1[i] <- lm_obj$coefficients[2]
A$b2[i] <- lm_obj$coefficients[3]
A$b3[i] <- lm_obj$coefficients[4]

yhat <- predict(lm_obj, data = c(y, x1_c, x2_c))
A$R2[i] <- 1 - var(yhat - y) / var(y) # Compute R2

}

And again we plot all parameters β0,β1,β2,β3 as a function of c:

plot.new()
plot.window(xlim=range(c_sequence), ylim=c(-.5, 2.5))
axis(1, round(c_sequence, 2), cex.axis=0.75, las=2)
axis(2, c(-.5, 0, .5, 1, 1.5, 2, 2.5), las=2)
lines(c_sequence, A$b0, col = cols[1])
lines(c_sequence, A$b1, col = cols[2])
lines(c_sequence, A$b2, col = cols[3])
lines(c_sequence, A$b3, col = cols[4])
legend("topright", c("b0", "b1", "b2", "b3"), 
col = cols[1:4], lty = rep(1,3), bty = "n")
title(xlab = "Added constant")
title(ylab = "Parameter value")

center

This time both the intercept β0 and the main effects β1,β2 are a function of c, while the interaction effect β3 is constant. At this point the best explanation is simply to go through the algebra, which explains these results exactly. We do this in the next section.

Deriving function for all effects

We plug in the definition of centering in the population regression model we introduced at the very beginning of this blogpost. This gives us every parameter as a function of two things: (1) the parameters in the original model and (b) the added constant. Above we added the same constant to both predictors. Here we consider the general case where the constants can differ.

Our original (unaltered) model is given by:

% <![CDATA[\begin{aligned}\mathbb{E}[Y] &#038;= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1X_2\end{aligned} %]]&gt;

Now we plug in the predictors with added constants c1,c2, multiply out, and rearrange:

% <![CDATA[\begin{aligned}\mathbb{E}[Y] &#038;= \beta_0^* + \beta_1^* (X_1 + c_1) + \beta_2 (X_2 + c_2) + \beta_3^* (X_1 + c_1) (X_2 + c_2) \\&#038; = \beta_0^* + \beta_1^*X_1 + \beta_1^*c_1 + \beta_2^*X_2 + \beta_2^*c_2+ \beta_3^* X_1X_2 + \beta_3^*X_1 c_2 + \beta_3^* c_1X_2 + \beta_3^* c_1c_2 \\&#038;= (\beta_0^* + \beta_1^*c_1 + \beta_2^*c_2 + \beta_3^* c_1c_2) + (\beta_1^* + \beta_3^*c_2)X_1 + (\beta_2^* + \beta_3^*c_1)X_2 + \beta_3^* X_1X_2\end{aligned} %]]&gt;

Now if we equate the respective interecept and slope terms we get:

β0=β0+β1c1+β2c2+β3c1c2 β1=β1+β3c2 β2=β2+β3c1

and

β3=β3

Now we solve for the parameters β0,β1,β2,β3 from the models with constants added to the predictors.

Because we know β3=β3 we can write β2=β2+β3c1 and can solve

β2=β2β3c1

The same goes for β1 so we have

β1=β1β3c2

Finally, to obtain a formula for β0 we plug the just obtained expressions for β1, β2 and β3 into

β0=β0+β1c1+β2c2+β3c1c2

and get

% <![CDATA[\begin{aligned}\beta_0 &#038;= \beta_0^* + (\beta_1 - \beta_3 c_2)c_1 +  (\beta_2 - \beta_3 c_1)c_2 + \beta_3 c_1c_2 \\&#038;= \beta_0^* + \beta_1 c_1 - \beta_3 c_2 c_1 + \beta_2 c_2 - \beta_3 c_2 c_1 + \beta_3 c_1c_2 \\&#038;=  \beta_0^* + \beta_1 c_1 + \beta_2 c_2 - \beta_3 c_1c_2\end{aligned} %]]&gt;

and can solve for β0:

β0=β0β1c1β2c2+β3c1c2

Let’s check whether those fomulas predict the parameter changes as a function of c in the numerical experiment above.

lm_obj <- lm(y ~ x1 * x2) # Reference model (no constant added)
b0 <- lm_obj$coefficients[1]
b1 <- lm_obj$coefficients[2]
b2 <- lm_obj$coefficients[3]
b3 <- lm_obj$coefficients[4]

B <- A # Storage for predicted parameters

for(i in 1:25) {
  
  c <- c_sequence[i]
  
  B$b0[i] <- b0 - b1*c - b2*c + b3*c*c
  B$b1[i] <- b1 - b3*c
  B$b2[i] <- b2 - b3*c
  B$b3[i] <- b3
  
}

We plot the computed parameters by the derived expressions as points on the empirical results from the numerical experiments above

plot.new()
plot.window(xlim=range(c_sequence), ylim=c(-.5, 2.5))
axis(1, round(c_sequence, 2), cex.axis=0.75, las=2)
axis(2, c(-.5, 0, .5, 1, 1.5, 2, 2.5), las=2)
lines(c_sequence, A$b0, col = cols[1])
lines(c_sequence, A$b1, col = cols[2])
lines(c_sequence, A$b2, col = cols[3])
lines(c_sequence, A$b3, col = cols[4])
legend("topright", c("b0", "b1", "b2", "b3"), 
       col = cols[1:4], lty = rep(1,3), bty = "n")

# Plot predictions
points(c_sequence, B$b0, col = cols[1])
points(c_sequence, B$b1, col = cols[2])
points(c_sequence, B$b2, col = cols[3])
points(c_sequence, B$b3, col = cols[4])
title(xlab = "Added constant")
title(ylab = "Parameter value")

center

and they match the numerical results exactly.

We see that the derived expressions explain exactly how parameters change as a function of the parameters of the reference model and the added constants.

If we set β3=0, we get the same derivation for the regression model without interaction term. We find that β1=β1, β2=β2, and β0=β0β1c1β2c2.

To leave a comment for the author, please follow the link and comment on their blog: Jonas Haslbeck - r.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)