Custom contrasts in emmeans
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Following up on a previous post, where I demonstrated the basic usage of package emmeans for doing post hoc comparisons, here I’ll demonstrate how to make custom comparisons (aka contrasts). These are comparisons that aren’t encompassed by the built-in functions in the package.
Remember that you can explore the available built-in emmeans functions for doing comparisons via ?"contrast-methods"
.
Table of Contents
- Reasons for custom comparisons
- R packages
- The dataset and model
- Treatment vs control comparisons
- Building custom contrasts
- The contrast() function for custom comparisons
- Using named lists for better output
- Using “at” for simple comparisons
- Multiple custom contrasts at once
- More complicated custom contrasts
- Just the code, please
Reasons for custom comparisons
There are a variety of reasons you might need custom comparisons instead of some of the standard, built-in ones. One common scenario that I see a lot is when we have a single control group for multiple factors, so the factors aren’t perfectly crossed. This comes up, e.g., when doing experiments that involve applying different substances (like fertilizers) at varying rates. One factor is the different substances applied and the other is different application rates. However, the control is applying nothing or water or something like that. There aren’t different rates of the control to apply, so there is a single control group for both factors.
Rather than trying to fit a model with multiple factors, focusing on main effects and the interaction, such data can be analyzed with a simple effects model. This is where the two (or more) factors of interest have been combined into a single factor for analysis. Such an analysis focuses on the effect of the two factors combined. We can use post hoc comparisons to estimate the overall effects of individual factors.
R packages
I will load magrittr for the pipe in addition to emmeans.
library(emmeans) # v. 1.3.3 library(magrittr) # v. 1.5
The dataset and model
I’ve made a small dataset to use as an example. The response variable is resp
and the two factors of interest have been combined into a single factor sub.rate
that has 5 levels: A.1
, A.2
, B.1
, B.2
, and control
.
One factor, which I’m thinking of as the substance factor, is represented by A
and B
(and the control). The second, the rate factor, is represented by 1
and 2
.
dat = structure(list(sub.rate = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("A.1", "A.2", "B.1", "B.2", "control"), class = "factor"), resp = c(5.5, 4.9, 6.1, 3.6, 6.1, 3.5, 3, 4.1, 5, 4.6, 7.3, 5.6, 4.8, 7.2, 6.2, 4.3, 6.6, 6.5, 5.5, 7.1, 5.4, 6.7, 6.8, 8.5, 6.1)), row.names = c(NA, -25L), class = "data.frame") str(dat) # 'data.frame': 25 obs. of 2 variables: # $ sub.rate: Factor w/ 5 levels "A.1","A.2","B.1",..: 1 1 1 1 1 2 2 2 2 2 ... # $ resp : num 5.5 4.9 6.1 3.6 6.1 3.5 3 4.1 5 4.6 ...
I will use a simple, linear model for analysis.
fit1 = lm(resp ~ sub.rate, data = dat)
Treatment vs control comparisons
The simple effects model makes it easy to get comparisons for each factor combination vs the control group with emmeans()
. I’ll use trt.vs.ctrlk
to do this since the control is the last level of the factor.
emmeans(fit1, specs = trt.vs.ctrlk ~ sub.rate) # $emmeans # sub.rate emmean SE df lower.CL upper.CL # A.1 5.24 0.466 20 4.27 6.21 # A.2 4.04 0.466 20 3.07 5.01 # B.1 6.22 0.466 20 5.25 7.19 # B.2 6.00 0.466 20 5.03 6.97 # control 6.70 0.466 20 5.73 7.67 # # Confidence level used: 0.95 # # $contrasts # contrast estimate SE df t.ratio p.value # A.1 - control -1.46 0.66 20 -2.214 0.1230 # A.2 - control -2.66 0.66 20 -4.033 0.0024 # B.1 - control -0.48 0.66 20 -0.728 0.8403 # B.2 - control -0.70 0.66 20 -1.061 0.6548 # # P value adjustment: dunnettx method for 4 tests
We may also be interested in some other comparisons, though. In particular, we might want to do some overall comparisons across the two factors. We will need custom contrasts for this.
Building custom contrasts
Custom contrasts are based on the estimated marginal means output from emmeans()
. The first step to building custom contrasts is to calculate the estimated marginal means so we have them to work with. I will name this output emm1
.
emm1 = emmeans(fit1, specs = ~ sub.rate) emm1 # sub.rate emmean SE df lower.CL upper.CL # A.1 5.24 0.466 20 4.27 6.21 # A.2 4.04 0.466 20 3.07 5.01 # B.1 6.22 0.466 20 5.25 7.19 # B.2 6.00 0.466 20 5.03 6.97 # control 6.70 0.466 20 5.73 7.67 # # Confidence level used: 0.95
I’m going to start with a relatively simple example. I will compare mean resp
of the A.2
group to the B.2
group via custom contrasts.
Building a custom contrast involves pulling out specific group means of interest from the emmeans()
output. We pull out a group mean by making a vector to represent the specific mean of interest. In this vector we assign a 1
to the mean of the group of interest and a 0
to the other groups.
For example, to pull out the mean of A.2
from emm1
we will make a vector with 5 values in it, one for each row of the output. The second value will be a 1
, since the mean of A.2
is on the second row of emm1
. All the other values in the vector will be 0
.
Below is the vector that represents the A.2
mean.
A2 = c(0, 1, 0, 0, 0)
Similarly, to pull out the mean of B.2
we’ll have a vector of 5 values with a 1
as the fourth value. The B.2
group is on the fourth row in emm1
.
B2 = c(0, 0, 0, 1, 0)
When building custom contrasts via vectors like this, the vectors will always be the same length as the number of rows in the emmeans()
output. I always calculate and print the estimated marginal means prior to building the vectors so I am certain of the number of rows and the order of the groups in the output.
The contrast() function for custom comparisons
Once we have the vectors that represent the means we are interested in comparing, we actually do the comparisons via the contrast()
function. Since we are interested in a difference in mean response, we take the difference between the vectors that represent the means.
Taking the difference between vectors can be done inside or outside contrast()
. In this example I’m doing it inside.
The contrast()
function takes an emmGrid
object (i.e., output from emmeans()
) as the first argument. We give the comparison we want to do via a list passed to the method
argument.
Here I want to calculate the difference in mean resp
of A.2
and B.2
. I subtract the B2
vector from the A2
vector. The output is the difference in mean resp
between these groups.
contrast(emm1, method = list(A2 - B2) ) # contrast estimate SE df t.ratio p.value # c(0, 1, 0, -1, 0) -1.96 0.66 20 -2.972 0.0075
Using named lists for better output
Unfortunately you can’t tell what comparisons was done in the output above ????. We can use a named list in method
to make the output more understandable.
contrast(emm1, method = list("A2 - B2" = A2 - B2) ) # contrast estimate SE df t.ratio p.value # A2 - B2 -1.96 0.66 20 -2.972 0.0075
Using “at” for simple comparisons
Note that I didn’t need to do a custom contrast to do this particular comparison. I could have gotten the comparison I wanted by using the at
argument with pairwise
in emmeans()
and choosing just the two groups I was interested in.
emmeans(fit1, specs = pairwise ~ sub.rate, at = list(sub.rate = c("A.2", "B.2") ) ) # $emmeans # sub.rate emmean SE df lower.CL upper.CL # A.2 4.04 0.466 20 3.07 5.01 # B.2 6.00 0.466 20 5.03 6.97 # # Confidence level used: 0.95 # # $contrasts # contrast estimate SE df t.ratio p.value # A.2 - B.2 -1.96 0.66 20 -2.972 0.0075
Multiple custom contrasts at once
Multiple custom contrasts can be done simultaneously in contrast()
by adding more comparisons to the method
list. I’ll demonstrate this by doing the simple example comparison twice, changing only which group mean is subtracted from the other.
I name both elements of the list for ease of interpretation. I find naming the list of comparisons to be a key part of doing these custom contrasts.
contrast(emm1, method = list("A2 - B2" = A2 - B2, "B2 - A2" = B2 - A2) ) # contrast estimate SE df t.ratio p.value # A2 - B2 -1.96 0.66 20 -2.972 0.0075 # B2 - A2 1.96 0.66 20 2.972 0.0075
Doing multiple comparisons at once means a multiple comparisons adjustment can be done as needed. In addition, we can use the confint()
function do get confidence intervals for the comparisons.
I’ll add a multivariate-\(t\) adjustment via adjust = "mvt"
and then get confidence intervals for the comparisons. Remember we can get both confidence intervals and tests for comparisons via summary()
with infer = TRUE
.
twocomp = contrast(emm1, method = list("A2 minus B2" = A2 - B2, "B2 minus A2" = B2 - A2), adjust = "mvt") %>% confint() twocomp # contrast estimate SE df lower.CL upper.CL # A2 minus B2 -1.96 0.66 20 -3.336 -0.584 # B2 minus A2 1.96 0.66 20 0.584 3.336 # # Confidence level used: 0.95 # Conf-level adjustment: mvt method for 2 estimates
More complicated custom contrasts
Now that we’ve seen a simple case, let’s do something slightly more complicated (and realistic). What if we want to compare the A
group to the B
group overall, regardless of the application rate?
This is a main effect comparison, so I need to average over the effect of the rate factor in order to estimate the overall effect of the levels of the substance factor.
To do this comparison I need the means for all four non-control factor levels. I’ll print emm1
again here so I remember the order of the output before starting to write out the vectors that represent the group means.
emm1 # sub.rate emmean SE df lower.CL upper.CL # A.1 5.24 0.466 20 4.27 6.21 # A.2 4.04 0.466 20 3.07 5.01 # B.1 6.22 0.466 20 5.25 7.19 # B.2 6.00 0.466 20 5.03 6.97 # control 6.70 0.466 20 5.73 7.67 # # Confidence level used: 0.95
I’ll need all means that involve A
or B
, which is the first four group means in emm1
. I’ll make a vector to represent each of these group means.
While typing these vectors out isn’t too hard, since they only contain 5 values, when I have many groups and so really long vectors I sometimes use rep()
to repeat all the 0 values.
A1 = c(1, 0, 0, 0, 0) A2 = c(0, 1, 0, 0, 0) B1 = c(0, 0, 1, 0, 0) B2 = c(0, 0, 0, 1, 0)
The vectors I made represent means for combinations of substance and rate. I want to compare the overall substance group means, though. This can be done by averaging over the two rates. This involves literally taking the average of, e.g., A1
and A2
vectors to get a vector that represents the overall A
mean.
Aoverall = (A1 + A2)/2 Boverall = (B1 + B2)/2
Now that we have vectors to represent the overall means we can do comparison of mean resp
of the A
group vs B
group overall in contrast()
.
contrast(emm1, method = list("A - B" = Aoverall - Boverall) ) # contrast estimate SE df t.ratio p.value # A - B -1.47 0.466 20 -3.152 0.0050
Custom contrasts are all built in this same basic way. You can also build your own contrast function if there is some contrast you do all the time that is not part of emmeans. See the custom contrasts section of the emmeans vignette for more info.
Just the code, please
Here’s the code without all the discussion.
library(emmeans) # v. 1.3.3 library(magrittr) # v. 1.5 dat = structure(list(sub.rate = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), .Label = c("A.1", "A.2", "B.1", "B.2", "control"), class = "factor"), resp = c(5.5, 4.9, 6.1, 3.6, 6.1, 3.5, 3, 4.1, 5, 4.6, 7.3, 5.6, 4.8, 7.2, 6.2, 4.3, 6.6, 6.5, 5.5, 7.1, 5.4, 6.7, 6.8, 8.5, 6.1)), row.names = c(NA, -25L), class = "data.frame") str(dat) fit1 = lm(resp ~ sub.rate, data = dat) emmeans(fit1, specs = trt.vs.ctrlk ~ sub.rate) emm1 = emmeans(fit1, specs = ~ sub.rate) emm1 A2 = c(0, 1, 0, 0, 0) B2 = c(0, 0, 0, 1, 0) contrast(emm1, method = list(A2 - B2) ) contrast(emm1, method = list("A2 - B2" = A2 - B2) ) emmeans(fit1, specs = pairwise ~ sub.rate, at = list(sub.rate = c("A.2", "B.2") ) ) contrast(emm1, method = list("A2 - B2" = A2 - B2, "B2 - A2" = B2 - A2) ) twocomp = contrast(emm1, method = list("A2 minus B2" = A2 - B2, "B2 minus A2" = B2 - A2), adjust = "mvt") %>% confint() twocomp emm1 A1 = c(1, 0, 0, 0, 0) A2 = c(0, 1, 0, 0, 0) B1 = c(0, 0, 1, 0, 0) B2 = c(0, 0, 0, 1, 0) Aoverall = (A1 + A2)/2 Boverall = (B1 + B2)/2 contrast(emm1, method = list("A - B" = Aoverall - Boverall) )
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.