Centering in Moderation Analysis: A Guide


This blog post will cover what centering is, what a sum-to-zero contrast is, why and when you should use them, and how you can do them in R. Specifically, we will focus on how they affect the interpretation of coefficients in regression tables and of effects in ANOVA tables.

I will be using the following packages and versions:

library(datawizard) # 0.9.1
library(parameters) # 0.21.3 

We will be using the following toy data set:

Generate Toy Dataset
data <- data.frame(
  Y = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2), 
  G = factor(c("g2", "g2", "g1", "g2", "g3", "g2", "g3", "g1", "g1", "g2")), 
  X = c(160, 160, 108, 258, 360, 225, 360, 146.7, 140.8, 167.6),
  Q = c(3.9, 3.9, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92), 
  A = factor(c("a1", "a1", "a2", "a2", "a1", "a2", "a1", "a2", "a2", "a2"))
)

(This is really a re-labeled version of mtcars[1:10, c(1:3, 5, 8)].)

str(data)
#> 'data.frame':    10 obs. of  5 variables:
#>  $ Y: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2
#>  $ G: Factor w/ 3 levels "g1","g2","g3": 2 2 1 2 3 2 3 1 1 2
#>  $ X: num  160 160 108 258 360 ...
#>  $ Q: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92
#>  $ A: Factor w/ 2 levels "a1","a2": 1 1 2 2 1 2 1 2 2 2

Simple Regression

We will start with not centering anything.

Continuous Predictor

model1 <- lm(Y ~ X, 
             data = data)

Something that is always true: the (Intercept) represents the predicted value when all the predictors are set to 0.

model_parameters(model1)
#> Parameter   | Coefficient |       SE |         95% CI |  t(8) |      p
#> ----------------------------------------------------------------------
#> (Intercept) |       25.56 |     1.62 | [21.82, 29.31] | 15.74 | < .001
#> X           |       -0.02 | 7.20e-03 | [-0.04, -0.01] | -3.46 | 0.009

# Predict for X=0
predict(model1, newdata = data.frame(X = 0))
#>       1 
#> 25.5638

But say we wanted the intercept to represent the predicted value at some other, more meaningful value of X - how would we do that? By centering!

So what is centering?

Don’t just be a zero…

Centering is the process of assigning meaning to the value 0 of some variable. We do this by:

  1. Defining a meaningful value.
  2. Subtracting that value from all values of the variable.

For example, I can center X so that 0 is 258.

rbind(
  X = data$X,
  "X - 258" = data$X - 258
)
#>         [,1] [,2] [,3] [,4] [,5] [,6] [,7]   [,8]   [,9] [,10]
#> X        160  160  108  258  360  225  360  146.7  140.8 167.6
#> X - 258  -98  -98 -150    0  102  -33  102 -111.3 -117.2 -90.4

We can see that by subtracting 258 from X, values that were above 258 are positive, values that were below 258 are negative, and the values that were 258 are now 0. Thus, for this new variable, when we speak of 0, we actually mean “258”.

model2 <- lm(Y ~ I(X - 258),
             data = data)

model_parameters(model2)
#> Parameter   | Coefficient |       SE |         95% CI |  t(8) |      p
#> ----------------------------------------------------------------------
#> (Intercept) |       19.14 |     0.71 | [17.50, 20.78] | 26.86 | < .001
#> X - 258     |       -0.02 | 7.20e-03 | [-0.04, -0.01] | -3.46 | 0.009

# Predict for X=258
predict(model1, newdata = data.frame(X = 258))
#>        1 
#> 19.14033

Typically, centering is done around the mean (mean-centering) - thus giving 0 the meaning of “mean”.

model3 <- lm(Y ~ I(X - mean(X)),
             data = data)

model_parameters(model3)
#> Parameter   | Coefficient |       SE |         95% CI |  t(8) |      p
#> ----------------------------------------------------------------------
#> (Intercept) |       20.37 |     0.62 | [18.95, 21.79] | 32.99 | < .001
#> X - mean(X) |       -0.02 | 7.20e-03 | [-0.04, -0.01] | -3.46 | 0.009

# Predict for X=mean(X)
predict(model1, newdata = data.frame(X = mean(data$X)))
#>     1 
#> 20.37

We can also do this with the scale() function, but since this function also standardizes (re-scales to an SD of 1), we need to set scale = FALSE:

model4 <- lm(Y ~ scale(X, scale = FALSE),
             data = data)

model_parameters(model4)
#> Parameter   | Coefficient |       SE |         95% CI |  t(8) |      p
#> ----------------------------------------------------------------------
#> (Intercept) |       20.37 |     0.62 | [18.95, 21.79] | 32.99 | < .001
#> X           |       -0.02 | 7.20e-03 | [-0.04, -0.01] | -3.46 | 0.009

# Predict for X=mean(X)
predict(model1, newdata = data.frame(X = mean(data$X)))
#>     1 
#> 20.37

Alternatively, we can just use datawizard::center():

model5 <- lm(Y ~ center(X),
             data = data)

model_parameters(model5)
#> Parameter   | Coefficient |       SE |         95% CI |  t(8) |      p
#> ----------------------------------------------------------------------
#> (Intercept) |       20.37 |     0.62 | [18.95, 21.79] | 32.99 | < .001
#> center(X)   |       -0.02 | 7.20e-03 | [-0.04, -0.01] | -3.46 | 0.009

# Predict for X=mean(X)
predict(model1, newdata = data.frame(X = mean(data$X)))
#>     1 
#> 20.37

Note that no matter how we centered X, its coefficient did not change - only the intercept changed!
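We can quickly verify this (a small sketch) by pulling the slope - the second coefficient - out of each of the five models:

# The slope of X is identical across all (non-)centerings; only the intercept moves
c(raw      = coef(model1)[2],
  at_258   = coef(model2)[2],
  at_mean  = coef(model3)[2],
  scaled   = coef(model4)[2],
  centered = coef(model5)[2])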

Categorical Predictor

When adding categorical predictors (in R we call these "factors"), R converts them to a set of k - 1 variables (k being the number of levels in the factor) that represent a set of contrasts between the levels of the variable. For our G factor that has 3 levels, we get two such variables: G[g2] and G[g3].

model6 <- lm(Y ~ G,
             data = data)

model_parameters(model6)
#> Parameter   | Coefficient |   SE |          95% CI |  t(7) |      p
#> -------------------------------------------------------------------
#> (Intercept) |       23.33 | 0.96 | [ 21.05, 25.61] | 24.21 | < .001
#> G [g2]      |       -3.19 | 1.22 | [ -6.08, -0.31] | -2.62 | 0.034 
#> G [g3]      |       -6.83 | 1.52 | [-10.44, -3.23] | -4.49 | 0.003

By default, R uses treatment contrasts (stats::contr.treatment()). These are built up by constructing each variable to be 0 for all but 1 level of the factor, with the 1st level being left out. We can see this by playing with stats::contr.treatment():

contr.treatment(levels(data$G))
#>    g2 g3
#> g1  0  0
#> g2  1  0
#> g3  0  1

The "variable" (called a dummy variable) for g2 is 1 only when G=g2, the dummy variable for g3 is 1 only when G=g3, and g1 is left out - it is the "reference level".
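We can see these dummy variables in the design matrix R actually builds (a quick sketch; head() just limits the printout):

# The design matrix: one dummy column per non-reference level of G
head(model.matrix(~ G, data = data))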

Recall that the (Intercept) always represents the predicted value when all the predictors are set to 0. In this case, setting both g2 and g3 to 0 is the same as setting G=g1. And indeed, that is what the intercept represents:

model_parameters(model6)
#> Parameter   | Coefficient |   SE |          95% CI |  t(7) |      p
#> -------------------------------------------------------------------
#> (Intercept) |       23.33 | 0.96 | [ 21.05, 25.61] | 24.21 | < .001
#> G [g2]      |       -3.19 | 1.22 | [ -6.08, -0.31] | -2.62 | 0.034 
#> G [g3]      |       -6.83 | 1.52 | [-10.44, -3.23] | -4.49 | 0.003

predict(model6, newdata = data.frame(G = "g1"))
#>        1 
#> 23.33333

Interpreting Dummy Variables

The other variables now represent the differences between each level and the reference level. It is hard to see, but we can use some linear algebra to help us here:

(mm1 <- cbind(
  `(Intercept)` = 1,
  contr.treatment(levels(data$G))
))
#>    (Intercept) g2 g3
#> g1           1  0  0
#> g2           1  1  0
#> g3           1  0  1

t(solve(mm1))
#>    (Intercept) g2 g3
#> g1           1 -1 -1
#> g2           0  1  0
#> g3           0  0  1

Each column now holds the linear contrast weights represented by that coefficient. We can confirm this:

model_parameters(model6)
#> Parameter   | Coefficient |   SE |          95% CI |  t(7) |      p
#> -------------------------------------------------------------------
#> (Intercept) |       23.33 | 0.96 | [ 21.05, 25.61] | 24.21 | < .001
#> G [g2]      |       -3.19 | 1.22 | [ -6.08, -0.31] | -2.62 | 0.034 
#> G [g3]      |       -6.83 | 1.52 | [-10.44, -3.23] | -4.49 | 0.003

(predicted_means <- predict(model6, 
                            newdata = data.frame(G = c("g1", "g2", "g3"))))
#>        1        2        3 
#> 23.33333 20.14000 16.50000
predicted_means[2] - predicted_means[1]
#>         2 
#> -3.193333
predicted_means[3] - predicted_means[1]
#>         3 
#> -6.833333

Sum-to-Zero Contrasts

But say we wanted the intercept to represent the grand mean - the predicted value averaged across all levels of G - how would we do that? By using sum-to-zero contrasts!

Sum-to-zero contrasts, like treatment coding (a.k.a. dummy coding), convert a factor to a set of contrasts between the levels of the variable. Unlike dummy coding, the weights of each contrast sum to 0.

There are many coding schemes that achieve this (Helmert contrasts, polynomial contrasts, …). The most popular one is effect coding (in R, confusingly built with stats::contr.sum()).

contr.sum(levels(data$G))
#>    [,1] [,2]
#> g1    1    0
#> g2    0    1
#> g3   -1   -1

We can see that each contrast (column) sums to 0.
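We can check this directly:

# Each contrast column sums to zero
colSums(contr.sum(levels(data$G)))
#> [1] 0 0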

Side note about changing factor coding

There are 3 ways of changing the coding scheme (contrast matrix) of a factor in R:

# 1. assign to the contrasts() function
contrasts(data$G) <- contr.sum

# 2. The C() function (BIG C!)
data$G <- C(data$G, contr = contr.sum)

# 3. In the model
lm(Y ~ G,
   data = data,
   contrasts = list(G = contr.sum))

These should all produce the same results.

I will be using the 3rd option here, to keep the coding explicit in each model call.
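(If you want to check which contrast matrix a factor will use without touching the original data, a minimal sketch is to assign to a temporary copy - here called G_sum - and inspect it with contrasts():)

# Assign sum-to-zero contrasts to a copy, leaving data$G itself unchanged
G_sum <- data$G
contrasts(G_sum) <- contr.sum
contrasts(G_sum)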

Let’s see how this affects our coefficients:

model7 <- lm(Y ~ G,
             data = data,
             contrasts = list(G = contr.sum))

model_parameters(model7)
#> Parameter   | Coefficient |   SE |         95% CI |  t(7) |      p
#> ------------------------------------------------------------------
#> (Intercept) |       19.99 | 0.57 | [18.65, 21.33] | 35.35 | < .001
#> G [1]       |        3.34 | 0.79 | [ 1.47,  5.22] |  4.21 | 0.004 
#> G [2]       |        0.15 | 0.71 | [-1.53,  1.83] |  0.21 | 0.840

(predicted_means <- predict(model6, 
                            newdata = data.frame(G = c("g1", "g2", "g3"))))
#>        1        2        3 
#> 23.33333 20.14000 16.50000
mean(predicted_means)
#> [1] 19.99111

Success - the intercept is equal to the grand mean!

What about the other variables? It is hard to see, but we can again use some linear algebra to help us here:

(mm2 <- cbind(
  `(Intercept)` = 1,
  contr.sum(levels(data$G))
))
#>    (Intercept)      
#> g1           1  1  0
#> g2           1  0  1
#> g3           1 -1 -1

t(solve(mm2))
#>    (Intercept)                      
#> g1   0.3333333  0.6666667 -0.3333333
#> g2   0.3333333 -0.3333333  0.6666667
#> g3   0.3333333 -0.3333333 -0.3333333

Again, each column holds the linear contrast weights represented by that coefficient. It might not be obvious, but these are the deviations of each mean from the grand mean.

Again, we can confirm this:

model_parameters(model7)
#> Parameter   | Coefficient |   SE |         95% CI |  t(7) |      p
#> ------------------------------------------------------------------
#> (Intercept) |       19.99 | 0.57 | [18.65, 21.33] | 35.35 | < .001
#> G [1]       |        3.34 | 0.79 | [ 1.47,  5.22] |  4.21 | 0.004 
#> G [2]       |        0.15 | 0.71 | [-1.53,  1.83] |  0.21 | 0.840

(predicted_means <- predict(model6, 
                            newdata = data.frame(G = c("g1", "g2", "g3"))))
#>        1        2        3 
#> 23.33333 20.14000 16.50000
predicted_means[1] - mean(predicted_means)
#>        1 
#> 3.342222
predicted_means[2] - mean(predicted_means)
#>         2 
#> 0.1488889

I personally like the interpretability of dummy-coding better - as differences from the reference level. Thankfully, we can preserve that interpretability while also having the intercept represent the grand mean, using deviation contrasts - another type of sum-to-zero contrasts, available with datawizard::contr.deviation().

contr.deviation(levels(data$G))
#>            g2         g3
#> g1 -0.3333333 -0.3333333
#> g2  0.6666667 -0.3333333
#> g3 -0.3333333  0.6666667
model8 <- lm(Y ~ G,
             data = data,
             contrasts = list(G = contr.deviation))

model_parameters(model8)
#> Parameter   | Coefficient |   SE |          95% CI |  t(7) |      p
#> -------------------------------------------------------------------
#> (Intercept) |       19.99 | 0.57 | [ 18.65, 21.33] | 35.35 | < .001
#> G [g2]      |       -3.19 | 1.22 | [ -6.08, -0.31] | -2.62 | 0.034 
#> G [g3]      |       -6.83 | 1.52 | [-10.44, -3.23] | -4.49 | 0.003

(predicted_means <- predict(model6, 
                            newdata = data.frame(G = c("g1", "g2", "g3"))))
#>        1        2        3 
#> 23.33333 20.14000 16.50000
mean(predicted_means)
#> [1] 19.99111
predicted_means[2] - predicted_means[1]
#>         2 
#> -3.193333
predicted_means[3] - predicted_means[1]
#>         3 
#> -6.833333

In Moderation Analysis

All this is great, but who cares about the intercept?

Well, this all comes into play much more substantially when moderators are involved!

When we have a moderator, the coefficient of the focal predictor represents a conditional effect. Specifically, the effect of that predictor when the moderator is 0!

For two continuous predictors, X (the focal predictor) and Q (the moderator), we can write out the regression equation as:

Y = b0 + b1*X + b2*Q + b3*(X*Q)

We can take X out as a common factor:

Y = (b0 + b2*Q) + (b1 + b3*Q)*X

In other words, the slope of X is itself a mini regression equation:

slope(X) = b1 + b3*Q

Within this regression equation, b1 is the "intercept" - the predicted slope of X (the conditional slope) when Q = 0!

If we condition on the mean of the moderator, this is often called the main effect, or the average effect.

Let’s see this in action.

Continuous Predictor, Continuous Moderator

model9 <- lm(Y ~ X * Q,
             data = data)

model_parameters(model9)
#> Parameter   | Coefficient |    SE |            95% CI |  t(6) |     p
#> ---------------------------------------------------------------------
#> (Intercept) |        3.13 | 54.24 | [-129.58, 135.85] |  0.06 | 0.956
#> X           |        0.10 |  0.27 | [  -0.57,   0.76] |  0.35 | 0.737
#> Q           |        6.57 | 15.60 | [ -31.59,  44.73] |  0.42 | 0.688
#> X × Q       |       -0.04 |  0.08 | [  -0.24,   0.16] | -0.45 | 0.670

X’s coefficient is the conditional slope (a.k.a., the simple slope) of X when Q is 0:

(predicted_means <- predict(model9, 
                            newdata = data.frame(X = c(0, 1), Q = 0)))
#>        1        2 
#> 3.133346 3.228691
predicted_means[2] - predicted_means[1]
#>          2 
#> 0.09534522

When we center Q, it is the conditional slope (a.k.a., the simple slope) of X at the mean of Q - also called the “main effect”:

model10 <- lm(Y ~ X * center(Q),
             data = data)

model_parameters(model10)
#> Parameter     | Coefficient |    SE |          95% CI |  t(6) |      p
#> ----------------------------------------------------------------------
#> (Intercept)   |       26.39 |  2.90 | [ 19.30, 33.48] |  9.11 | < .001
#> X             |       -0.03 |  0.02 | [ -0.08,  0.02] | -1.64 | 0.151 
#> center(Q)     |        6.57 | 15.60 | [-31.59, 44.73] |  0.42 | 0.688 
#> X × center(Q) |       -0.04 |  0.08 | [ -0.24,  0.16] | -0.45 | 0.670

(predicted_means <- predict(model9, 
                            newdata = data.frame(X = c(0, 1), 
                                                 Q = mean(data$Q))))
#>        1        2 
#> 26.38931 26.35590
predicted_means[2] - predicted_means[1]
#>           2 
#> -0.03341344

Note that the coefficient of Q is not affected - it is the conditional slope of Q when X is 0. If we also wanted the main effect of Q we would need to also center X.
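For example, a minimal sketch that centers both predictors, so that each coefficient is a conditional slope at the other predictor's mean:

# Center both X and Q to get both "main effects" in one table
model_parameters(
  lm(Y ~ center(X) * center(Q),
     data = data)
)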

Continuous Predictor, Categorical Moderator

We can observe this same idea when the moderator is a factor.

# default dummy coding
model11 <- lm(Y ~ Q * A,
              data = data)

model_parameters(model11)
#> Parameter   | Coefficient |    SE |          95% CI |  t(6) |     p
#> -------------------------------------------------------------------
#> (Intercept) |       -2.40 | 11.37 | [-30.23, 25.43] | -0.21 | 0.840
#> Q           |        5.97 |  3.20 | [ -1.85, 13.80] |  1.87 | 0.111
#> A [a2]      |       14.76 | 13.58 | [-18.47, 47.99] |  1.09 | 0.319
#> Q × A [a2]  |       -3.40 |  3.81 | [-12.74,  5.93] | -0.89 | 0.406

Q’s coefficient is the conditional slope (a.k.a., the simple slope) of Q when the dummy variables (in this case just one) of A are set to 0 - when A is a1:

(predicted_means <- predict(model11, 
                            newdata = data.frame(Q = c(0, 1), A = "a1")))
#>         1         2 
#> -2.400173  3.574452
predicted_means[2] - predicted_means[1]
#>        2 
#> 5.974625

We can again use sum-to-zero contrasts to get the grand-average slope of Q - the "main effect":

model12 <- lm(Y ~ Q * A,
              data = data,
              contrasts = list(A = contr.sum))

model_parameters(model12)
#> Parameter   | Coefficient |   SE |          95% CI |  t(6) |     p
#> ------------------------------------------------------------------
#> (Intercept) |        4.98 | 6.79 | [-11.63, 21.59] |  0.73 | 0.491
#> Q           |        4.27 | 1.91 | [ -0.39,  8.94] |  2.24 | 0.066
#> A [1]       |       -7.38 | 6.79 | [-23.99,  9.23] | -1.09 | 0.319
#> Q × A [1]   |        1.70 | 1.91 | [ -2.96,  6.37] |  0.89 | 0.406

(predicted_means <- predict(model11,
                            newdata = expand.grid(Q = c(0, 1), 
                                                  A = c("a1", "a2"))))
#>         1         2         3         4 
#> -2.400173  3.574452 12.358596 14.929210
mean(c(predicted_means[2] - predicted_means[1], 
       predicted_means[4] - predicted_means[3]))
#> [1] 4.272619
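Indeed (a quick check), this grand-average slope is exactly the Q coefficient from the sum-to-zero model:

# Q's coefficient under sum-to-zero coding = the slope averaged across A's levels
coef(model12)[["Q"]]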

Categorical Predictor, Continuous Moderator

This holds true when the focal predictor is a factor as well.

# default dummy coding
model13 <- lm(Y ~ A * Q,
              data = data)

model_parameters(model13)
#> Parameter   | Coefficient |    SE |          95% CI |  t(6) |     p
#> -------------------------------------------------------------------
#> (Intercept) |       -2.40 | 11.37 | [-30.23, 25.43] | -0.21 | 0.840
#> A [a2]      |       14.76 | 13.58 | [-18.47, 47.99] |  1.09 | 0.319
#> Q           |        5.97 |  3.20 | [ -1.85, 13.80] |  1.87 | 0.111
#> A [a2] × Q  |       -3.40 |  3.81 | [-12.74,  5.93] | -0.89 | 0.406

A’s coefficient (A [a2]) is the conditional difference (a.k.a., the simple effect) between the levels of A when Q is 0:

(predicted_means <- predict(model13, 
                            newdata = data.frame(A = c("a1", "a2"), Q = 0)))
#>         1         2 
#> -2.400173 12.358596
predicted_means[2] - predicted_means[1]
#>        2 
#> 14.75877

When we center Q, it is the conditional difference at the mean of Q - also called the “main effect”:

model14 <- lm(Y ~ A * center(Q),
              data = data)

model_parameters(model14)
#> Parameter          | Coefficient |   SE |          95% CI |  t(6) |      p
#> --------------------------------------------------------------------------
#> (Intercept)        |       18.74 | 1.15 | [ 15.92, 21.56] | 16.26 | < .001
#> A [a2]             |        2.72 | 1.49 | [ -0.93,  6.36] |  1.82 | 0.118 
#> center(Q)          |        5.97 | 3.20 | [ -1.85, 13.80] |  1.87 | 0.111 
#> A [a2] × center(Q) |       -3.40 | 3.81 | [-12.74,  5.93] | -0.89 | 0.406

(predicted_means <- predict(model13, 
                            newdata = data.frame(A = c("a1", "a2"), 
                                                 Q = mean(data$Q))))
#>        1        2 
#> 18.73805 21.45343
predicted_means[2] - predicted_means[1]
#>        2 
#> 2.715377

Note that in all cases, regardless of centering, the interaction coefficient was unchanged. This will always be true for the highest-order interaction.
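A quick sketch confirming this for the last pair of models:

# The interaction coefficient is the same whether or not Q is centered
c(uncentered = coef(model13)[["Aa2:Q"]],
  centered   = coef(model14)[["Aa2:center(Q)"]])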

In ANOVA

Finally, we have categorical-by-categorical moderation - more often seen in ANOVA tables. As you might expect, when looking at coefficient tables, centering affects the values and meanings of the coefficients. When looking at ANOVA tables, things get more complicated, as the results depend on the type of sums of squares computed (type 1, 2, or 3).

You can read more about those differences in my blog post Everything You Always Wanted to Know About ANOVA.

The TL;DR is - yes, you should center your predictors / use sum-to-zero contrasts.

Summary

Centering makes coefficient tables (and ANOVA tables) more interpretable (and in Bayesian analysis can make prior elicitation easier) - a very important goal in itself, easing the interaction between the numbers and our squishy human flesh-brains - but that's really all it does: it does not affect the model as a whole (it doesn't change the model's overall fit or its global multicollinearity).

It is quite common to see conditional slopes interpreted as main effects even when moderators have not been properly centered. Now at least you know better!

Session Info
sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22621)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=English_Israel.utf8  LC_CTYPE=English_Israel.utf8   
#> [3] LC_MONETARY=English_Israel.utf8 LC_NUMERIC=C                   
#> [5] LC_TIME=English_Israel.utf8    
#> 
#> time zone: Asia/Jerusalem
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] parameters_0.21.3 datawizard_0.9.1 
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.2           knitr_1.45          TH.data_1.1-2      
#>  [4] rlang_1.1.2         xfun_0.41           estimability_1.4.1 
#>  [7] xtable_1.8-4        jsonlite_1.8.8      zoo_1.8-12         
#> [10] MSBMisc_0.0.1.14    htmltools_0.5.7     rmarkdown_2.25     
#> [13] grid_4.3.1          evaluate_0.23       MASS_7.3-60        
#> [16] fastmap_1.1.1       yaml_2.3.8          mvtnorm_1.2-4      
#> [19] insight_0.19.7.4    compiler_4.3.1      multcomp_1.4-25    
#> [22] codetools_0.2-19    sandwich_3.1-0      emmeans_1.9.0      
#> [25] coda_0.19-4         htmlwidgets_1.6.4   rstudioapi_0.15.0  
#> [28] lattice_0.21-8      digest_0.6.33       splines_4.3.1      
#> [31] Matrix_1.6-4        tools_4.3.1         survival_3.5-5     
#> [34] bayestestR_0.13.1.7
