Custom contrasts in emmeans

[This article was first published on Very statisticious on Very statisticious, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Following up on a previous post, where I demonstrated the basic usage of package emmeans for doing post hoc comparisons, here I’ll demonstrate how to make custom comparisons (aka contrasts). These are comparisons that aren’t encompassed by the built-in functions in the package.

Remember that you can explore the available built-in emmeans functions for doing comparisons via ?"contrast-methods".

Reasons for custom comparisons

There are a variety of reasons you might need custom comparisons instead of some of the standard, built-in ones. One common scenario that I see a lot is when we have a single control group for multiple factors, so the factors aren’t perfectly crossed. This comes up, e.g., when doing experiments that involve applying different substances (like fertilizers) at varying rates. One factor is the different substances applied and the other is different application rates. However, the control is applying nothing or water or something like that. There aren’t different rates of the control to apply, so there is a single control group for both factors.

Rather than trying to fit a model with multiple factors, focusing on main effects and the interaction, such data can be analyzed with a simple effects model. This is where the two (or more) factors of interest have been combined into a single factor for analysis. Such an analysis focuses on the effect of the two factors combined. We can use post hoc comparisons to estimate the overall effects of individual factors.

R packages

I will load magrittr for the pipe in addition to emmeans.

library(emmeans) # v. 1.3.3
library(magrittr) # v. 1.5

The dataset and model

I’ve made a small dataset to use as an example. The response variable is resp and the two factors of interest have been combined into a single factor sub.rate that has 5 levels: A.1, A.2, B.1, B.2, and control.

One factor, which I’m thinking of as the substance factor, is represented by A and B (and the control). The second, the rate factor, is represented by 1 and 2.

dat = structure(list(sub.rate = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L), .Label = c("A.1", "A.2", "B.1", "B.2", "control"), class = "factor"), 
    resp = c(5.5, 4.9, 6.1, 3.6, 6.1, 3.5, 3, 4.1, 5, 4.6, 7.3, 
    5.6, 4.8, 7.2, 6.2, 4.3, 6.6, 6.5, 5.5, 7.1, 5.4, 6.7, 6.8, 
    8.5, 6.1)), row.names = c(NA, -25L), class = "data.frame")

str(dat)
# 'data.frame': 25 obs. of  2 variables:
#  $ sub.rate: Factor w/ 5 levels "A.1","A.2","B.1",..: 1 1 1 1 1 2 2 2 2 2 ...
#  $ resp    : num  5.5 4.9 6.1 3.6 6.1 3.5 3 4.1 5 4.6 ...

I will use a simple, linear model for analysis.

fit1 = lm(resp ~ sub.rate, data = dat)

Treatment vs control comparisons

The simple effects model makes it easy to get comparisons for each factor combination vs the control group with emmeans(). I’ll use trt.vs.ctrlk to do this since the control is the last level of the factor.

emmeans(fit1, specs = trt.vs.ctrlk ~ sub.rate)
# $emmeans
#  sub.rate emmean    SE df lower.CL upper.CL
#  A.1        5.24 0.466 20     4.27     6.21
#  A.2        4.04 0.466 20     3.07     5.01
#  B.1        6.22 0.466 20     5.25     7.19
#  B.2        6.00 0.466 20     5.03     6.97
#  control    6.70 0.466 20     5.73     7.67
# 
# Confidence level used: 0.95 
# 
# $contrasts
#  contrast      estimate   SE df t.ratio p.value
#  A.1 - control    -1.46 0.66 20 -2.214  0.1230 
#  A.2 - control    -2.66 0.66 20 -4.033  0.0024 
#  B.1 - control    -0.48 0.66 20 -0.728  0.8403 
#  B.2 - control    -0.70 0.66 20 -1.061  0.6548 
# 
# P value adjustment: dunnettx method for 4 tests

We may also be interested in some other comparisons, though. In particular, we might want to do some overall comparisons across the two factors. We will need custom contrasts for this.

Building custom contrasts

Custom contrasts are based on the estimated marginal means output from emmeans(). The first step to building custom contrasts is to calculate the estimated marginal means so we have them to work with. I will name this output emm1.

emm1 = emmeans(fit1, specs = ~ sub.rate)
emm1
#  sub.rate emmean    SE df lower.CL upper.CL
#  A.1        5.24 0.466 20     4.27     6.21
#  A.2        4.04 0.466 20     3.07     5.01
#  B.1        6.22 0.466 20     5.25     7.19
#  B.2        6.00 0.466 20     5.03     6.97
#  control    6.70 0.466 20     5.73     7.67
# 
# Confidence level used: 0.95

I’m going to start with a relatively simple example. I will compare mean resp of the A.2 group to the B.2 group via custom contrasts.

Building a custom contrast involves pulling out specific group means of interest from the emmeans() output. We pull out a group mean by making a vector to represent the specific mean of interest. In this vector we assign a 1 to the mean of the group of interest and a 0 to the other groups.

For example, to pull out the mean of A.2 from emm1 we will make a vector with 5 values in it, one for each row of the output. The second value will be a 1, since the mean of A.2 is on the second row of emm1. All the other values in the vector will be 0.

Below is the vector that represents the A.2 mean.

A2 = c(0, 1, 0, 0, 0)

Similarly, to pull out the mean of B.2 we’ll have a vector of 5 values with a 1 as the fourth value. The B.2 group is on the fourth row in emm1.

B2 = c(0, 0, 0, 1, 0)

When building custom contrasts via vectors like this, the vectors will always be the same length as the number of rows in the emmeans() output. I always calculate and print the estimated marginal means prior to building the vectors so I am certain of the number of rows and the order of the groups in the output.

The contrast() function for custom comparisons

Once we have the vectors that represent the means we are interested in comparing, we actually do the comparisons via the contrast() function. Since we are interested in a difference in mean response, we take the difference between the vectors that represent the means.

Taking the difference between vectors can be done inside or outside contrast(). In this example I’m doing it inside.

The contrast() function takes an emmGrid object (i.e., output from emmeans()) as the first argument. We give the comparison we want to do via a list passed to the method argument.

Here I want to calculate the difference in mean resp of A.2 and B.2. I subtract the B2 vector from the A2 vector. The output is the difference in mean resp between these groups.

contrast(emm1, method = list(A2 - B2) )
#  contrast          estimate   SE df t.ratio p.value
#  c(0, 1, 0, -1, 0)    -1.96 0.66 20 -2.972  0.0075

Using named lists for better output

Unfortunately you can’t tell what comparisons was done in the output above ????. We can use a named list in method to make the output more understandable.

contrast(emm1, method = list("A2 - B2" = A2 - B2) )
#  contrast estimate   SE df t.ratio p.value
#  A2 - B2     -1.96 0.66 20 -2.972  0.0075

Using “at” for simple comparisons

Note that I didn’t need to do a custom contrast to do this particular comparison. I could have gotten the comparison I wanted by using the at argument with pairwise in emmeans() and choosing just the two groups I was interested in.

emmeans(fit1, specs = pairwise ~ sub.rate, 
         at = list(sub.rate = c("A.2", "B.2") ) )
# $emmeans
#  sub.rate emmean    SE df lower.CL upper.CL
#  A.2        4.04 0.466 20     3.07     5.01
#  B.2        6.00 0.466 20     5.03     6.97
# 
# Confidence level used: 0.95 
# 
# $contrasts
#  contrast  estimate   SE df t.ratio p.value
#  A.2 - B.2    -1.96 0.66 20 -2.972  0.0075

Multiple custom contrasts at once

Multiple custom contrasts can be done simultaneously in contrast() by adding more comparisons to the method list. I’ll demonstrate this by doing the simple example comparison twice, changing only which group mean is subtracted from the other.

I name both elements of the list for ease of interpretation. I find naming the list of comparisons to be a key part of doing these custom contrasts.

contrast(emm1, method = list("A2 - B2" = A2 - B2,
                             "B2 - A2" = B2 - A2) )
#  contrast estimate   SE df t.ratio p.value
#  A2 - B2     -1.96 0.66 20 -2.972  0.0075 
#  B2 - A2      1.96 0.66 20  2.972  0.0075

Doing multiple comparisons at once means a multiple comparisons adjustment can be done as needed. In addition, we can use the confint() function do get confidence intervals for the comparisons.

I’ll add a multivariate-\(t\) adjustment via adjust = "mvt" and then get confidence intervals for the comparisons. Remember we can get both confidence intervals and tests for comparisons via summary() with infer = TRUE.

twocomp = contrast(emm1, method = list("A2 minus B2" = A2 - B2,
                             "B2 minus A2" = B2 - A2),
         adjust = "mvt") %>%
     confint()
twocomp
#  contrast    estimate   SE df lower.CL upper.CL
#  A2 minus B2    -1.96 0.66 20   -3.336   -0.584
#  B2 minus A2     1.96 0.66 20    0.584    3.336
# 
# Confidence level used: 0.95 
# Conf-level adjustment: mvt method for 2 estimates

More complicated custom contrasts

Now that we’ve seen a simple case, let’s do something slightly more complicated (and realistic). What if we want to compare the A group to the B group overall, regardless of the application rate?

This is a main effect comparison, so I need to average over the effect of the rate factor in order to estimate the overall effect of the levels of the substance factor.

To do this comparison I need the means for all four non-control factor levels. I’ll print emm1 again here so I remember the order of the output before starting to write out the vectors that represent the group means.

emm1
#  sub.rate emmean    SE df lower.CL upper.CL
#  A.1        5.24 0.466 20     4.27     6.21
#  A.2        4.04 0.466 20     3.07     5.01
#  B.1        6.22 0.466 20     5.25     7.19
#  B.2        6.00 0.466 20     5.03     6.97
#  control    6.70 0.466 20     5.73     7.67
# 
# Confidence level used: 0.95

I’ll need all means that involve A or B, which is the first four group means in emm1. I’ll make a vector to represent each of these group means.

While typing these vectors out isn’t too hard, since they only contain 5 values, when I have many groups and so really long vectors I sometimes use rep() to repeat all the 0 values.

A1 = c(1, 0, 0, 0, 0)
A2 = c(0, 1, 0, 0, 0)
B1 = c(0, 0, 1, 0, 0)
B2 = c(0, 0, 0, 1, 0)

The vectors I made represent means for combinations of substance and rate. I want to compare the overall substance group means, though. This can be done by averaging over the two rates. This involves literally taking the average of, e.g., A1 and A2 vectors to get a vector that represents the overall A mean.

Aoverall = (A1 + A2)/2
Boverall = (B1 + B2)/2

Now that we have vectors to represent the overall means we can do comparison of mean resp of the A group vs B group overall in contrast().

contrast(emm1, method = list("A - B" = Aoverall - Boverall) ) 
#  contrast estimate    SE df t.ratio p.value
#  A - B       -1.47 0.466 20 -3.152  0.0050

Custom contrasts are all built in this same basic way. You can also build your own contrast function if there is some contrast you do all the time that is not part of emmeans. See the custom contrasts section of the emmeans vignette for more info.

Just the code, please

Here’s the code without all the discussion.

library(emmeans) # v. 1.3.3
library(magrittr) # v. 1.5

dat = structure(list(sub.rate = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L), .Label = c("A.1", "A.2", "B.1", "B.2", "control"), class = "factor"), 
    resp = c(5.5, 4.9, 6.1, 3.6, 6.1, 3.5, 3, 4.1, 5, 4.6, 7.3, 
    5.6, 4.8, 7.2, 6.2, 4.3, 6.6, 6.5, 5.5, 7.1, 5.4, 6.7, 6.8, 
    8.5, 6.1)), row.names = c(NA, -25L), class = "data.frame")

str(dat)

fit1 = lm(resp ~ sub.rate, data = dat)

emmeans(fit1, specs = trt.vs.ctrlk ~ sub.rate)

emm1 = emmeans(fit1, specs = ~ sub.rate)
emm1

A2 = c(0, 1, 0, 0, 0)
B2 = c(0, 0, 0, 1, 0)
contrast(emm1, method = list(A2 - B2) )

contrast(emm1, method = list("A2 - B2" = A2 - B2) )

emmeans(fit1, specs = pairwise ~ sub.rate, 
         at = list(sub.rate = c("A.2", "B.2") ) )

contrast(emm1, method = list("A2 - B2" = A2 - B2,
                             "B2 - A2" = B2 - A2) )

twocomp = contrast(emm1, method = list("A2 minus B2" = A2 - B2,
                             "B2 minus A2" = B2 - A2),
         adjust = "mvt") %>%
     confint()
twocomp

emm1
A1 = c(1, 0, 0, 0, 0)
A2 = c(0, 1, 0, 0, 0)
B1 = c(0, 0, 1, 0, 0)
B2 = c(0, 0, 0, 1, 0)

Aoverall = (A1 + A2)/2
Boverall = (B1 + B2)/2

contrast(emm1, method = list("A - B" = Aoverall - Boverall) ) 

To leave a comment for the author, please follow the link and comment on their blog: Very statisticious on Very statisticious.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)