[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In this article we will show how to run a three-way analysis of variance when both the third-order interaction effect and the second-order interaction effects are statistically significant. This type of analysis can become pretty tedious, especially when our factors have many levels, so we will try to explain it here as clearly as possible. (If you want to watch me doing these analyses live, get my free course on statistical analysis with R here.)
First of all, let’s present the fictitious data we are going to work with. Suppose that a pharmaceutical company is planning to launch a new vitamin that allegedly improves employees’ resistance to effort. The vitamin is tested on a sample of 720 employees, divided into three groups: employees who take a placebo (the control group), employees who take the vitamin in a low dose and employees who take the vitamin in a high dose. Half of the employees are male, and half are female. Moreover, we have both blue collar and white collar employees in our sample. Resistance to effort is measured on an arbitrary scale from 1 to 30 (30 being the highest resistance).

Our goal is to determine whether effort resistance is influenced by three factors: dose of vitamin (placebo, low dose and high dose), gender (male, female) and type of employee (blue collar, white collar). You can find the experiment data in CSV format here.

Third-order interaction effect

First of all, let’s check whether the third-order interaction effect is significant. We are going to run the analysis using the aov function in the stats package (our data frame is called vitamin).
    aov1 <- aov(effort~dose*gender*type, data=vitamin)
    summary(aov1)

In the formula above, dose*gender*type expands to all main effects and all interactions of the three factors; the third-order interaction effect is the term dose:gender:type. The ANOVA results can be seen below (we have only kept the line presenting the third-order interaction effect).

                      Df Sum Sq Mean Sq F value   Pr(>F)
    dose:gender:type   2    187    93.4  22.367 3.81e-10

The interaction effect is statistically significant: F = 22.367 with 2 numerator degrees of freedom, p<0.01 (a full report would also give the residual degrees of freedom, which appear in the Residuals line of the summary). In other words, we do have a third-order interaction effect. In this situation, it is not advisable to report and interpret the second-order interaction effects (they could be misleading). Therefore, we are going to compute the simple second-order interaction effects.

Simple second-order interaction effects

The simple second-order interaction effects are the effects of each pair of factors at each level of the third factor. Specifically, we have to compute the following effects:
- the interaction effect of dose and type of employee, for each gender category (male and female)
- the interaction effect of gender and type of employee, at each dose level (placebo, low and high)
- the interaction effect of dose and gender, for each type of employee (blue collar and white collar).
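The three families of analyses listed above all follow the same pattern: split the data by the levels of one factor, then fit a two-way ANOVA of the other two factors on each piece. The following is a minimal sketch of that pattern in base R, not the article's own code. The data frame is simulated (it is not the experiment data), and the helper name simple_interaction is hypothetical.

```r
set.seed(42)
# Simulated stand-in for the experiment data (an assumption, NOT the real data):
# 3 doses x 2 genders x 2 employee types, 60 employees per cell = 720 rows.
vitamin <- expand.grid(
  id     = 1:60,
  dose   = c("placebo", "low dose", "high dose"),
  gender = c("male", "female"),
  type   = c("blue collar", "white collar"),
  stringsAsFactors = TRUE
)
vitamin$effort <- 15 +
  3 * (vitamin$dose == "high dose") * (vitamin$gender == "male") +
  rnorm(nrow(vitamin), sd = 2)

# Hypothetical helper: fit the same two-way ANOVA separately
# at each level of the factor named in `by`.
simple_interaction <- function(data, formula, by) {
  subsets <- split(data, data[[by]])   # one data frame per level of `by`
  lapply(subsets, function(d) summary(aov(formula, data = d)))
}

by_gender <- simple_interaction(vitamin, effort ~ dose * type,   by = "gender")
by_dose   <- simple_interaction(vitamin, effort ~ gender * type, by = "dose")
by_type   <- simple_interaction(vitamin, effort ~ dose * gender, by = "type")

by_gender$male   # two-way ANOVA table for the male subset
```

Each element of the returned lists is an ordinary summary(aov(...)) table, so it can be read exactly like the tables in the rest of this article.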
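Let’s compute the first of these effects, the interaction of dose and type of employee for each gender. We start by splitting the data into two data frames, one per gender. The call syntax used here matches the filter function of the dplyr package, which we assume has been loaded with library(dplyr).

    vitamin_male <- filter(vitamin, gender=="male")
    vitamin_female <- filter(vitamin, gender=="female")

Now we perform a two-way analysis of variance on each data frame (the factors being dose and type, of course).

    aov1 <- aov(effort~dose*type, data=vitamin_male)
    summary(aov1)
    aov2 <- aov(effort~dose*type, data=vitamin_female)
    summary(aov2)

The results of the analyses are shown below (we have only retained the lines with the interaction effects).

Male employees:

                Df Sum Sq Mean Sq F value   Pr(>F)
    dose:type    2    249   124.7   28.42 3.57e-12

Female employees:

                Df Sum Sq Mean Sq F value   Pr(>F)
    dose:type    2  137.2    68.6   17.31 6.74e-08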
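Rather than copying the interaction line by hand, it can be pulled out of the summary object together with the residual degrees of freedom needed to report the F test in full. The sketch below assumes simulated data (not the experiment data) and uses base R's subset in place of filter so that it runs without extra packages.

```r
set.seed(42)
# Simulated stand-in for the experiment data (an assumption, NOT the real data).
vitamin <- expand.grid(
  id     = 1:60,
  dose   = c("placebo", "low dose", "high dose"),
  gender = c("male", "female"),
  type   = c("blue collar", "white collar"),
  stringsAsFactors = TRUE
)
vitamin$effort <- 15 +
  3 * (vitamin$dose == "high dose") * (vitamin$type == "blue collar") +
  rnorm(nrow(vitamin), sd = 2)

fit <- aov(effort ~ dose * type, data = subset(vitamin, gender == "male"))

tab <- summary(fit)[[1]]                 # the ANOVA table as a data frame
rownames(tab) <- trimws(rownames(tab))   # summary() pads row names with spaces

interaction_row <- tab["dose:type", ]
residual_df     <- tab["Residuals", "Df"]

# Report the test with both degrees of freedom.
report <- sprintf("F(%.0f, %.0f) = %.2f",
                  interaction_row[["Df"]], residual_df,
                  interaction_row[["F value"]])
print(interaction_row)
print(report)
```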
We can notice that both simple second-order interaction effects are significant (p<0.01). So we are dealing with a combined influence of the factors dose and type of employee in both male and female groups. In this situation, we have to examine the simple main effects for each factor. This is what we are going to do in the next section.

Simple main effects

Let’s compute the main effect for the factor dose of vitamin, which is the most important (after all, the company wants to demonstrate that the vitamin does affect the resistance to effort). You will be able to compute the other simple main effects yourself, using this as an example.

Now we must create four separate data frames, for each combination of the factors gender and type of employee: male – blue collar, male – white collar, female – blue collar, female – white collar.

    vitamin_male_blue <- filter(vitamin, gender=="male", type=="blue collar")
    vitamin_male_white <- filter(vitamin, gender=="male", type=="white collar")
    vitamin_female_blue <- filter(vitamin, gender=="female", type=="blue collar")
    vitamin_female_white <- filter(vitamin, gender=="female", type=="white collar")

Next we perform a one-way ANOVA for each data frame. Let’s do it for the first group, male – blue collar.

    aov1 <- aov(effort~dose, data=vitamin_male_blue)
    summary(aov1)

            Df Sum Sq Mean Sq F value Pr(>F)
    dose     2 2943.5  1471.8   349.9 <2e-16
The simple main effect for the factor dose on this group is statistically significant (p<0.01). In other words, there is a significant difference between placebo, low dose and high dose levels within the male – blue collar employees category, regarding the resistance to effort. To find out how big the differences are, we use the TukeyHSD function, which computes Tukey's honest significant differences between each pair of levels.
    TukeyHSD(aov1)

                             diff        lwr       upr p adj
    low dose-high dose  -2.528333  -3.413363 -1.643303     0
    placebo-high dose   -9.558333 -10.443363 -8.673303     0
    placebo-low dose    -7.030000  -7.915030 -6.144970     0
By inspection of the table we conclude that all the differences in effort resistance between the dose groups are significant (p<0.01). The largest difference, in absolute value, is that between the high dose and placebo levels: about 9.6 points. So the employees who took a high dose show a higher resistance to effort than those who took only a placebo.

One more example: the simple main effect of the variable dose of vitamin on the female – blue collar group.

    aov1 <- aov(effort~dose, data=vitamin_female_blue)
    summary(aov1)

            Df Sum Sq Mean Sq F value Pr(>F)
    dose     2  399.6  199.81   45.57 <2e-16
    TukeyHSD(aov1)

                            diff        lwr       upr     p adj
    low dose-high dose  1.083333  0.1797508  1.986916 0.0141485
    placebo-high dose  -2.476667 -3.3802492 -1.573084 0.0000000
    placebo-low dose   -3.560000 -4.4635826 -2.656417 0.0000000

The simple main effect is statistically significant, as the first table shows. Furthermore, all the differences between dose levels are significant at the 0.05 level, although the difference between low dose and high dose (p = 0.014) does not reach the 0.01 level used above. The largest difference is the one between low dose and placebo (about 3.6 points).

To learn more on data analysis in R, check the free “Statistics with R” video course here.
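The remaining simple main effects of dose (male – white collar and female – white collar) follow the same steps. As a hedged sketch rather than the author's code, the loop below runs the one-way ANOVA and the Tukey test in all four gender and type cells at once, and also picks out the pair of dose levels with the largest absolute difference, which is easy to misread from the printed table. The data are simulated and do not reproduce the figures in this article.

```r
set.seed(42)
# Simulated stand-in for the experiment data (an assumption, NOT the real data).
vitamin <- expand.grid(
  id     = 1:60,
  dose   = c("placebo", "low dose", "high dose"),
  gender = c("male", "female"),
  type   = c("blue collar", "white collar"),
  stringsAsFactors = TRUE
)
vitamin$effort <- 15 +
  3 * (vitamin$dose == "high dose") * (vitamin$gender == "male") +
  rnorm(nrow(vitamin), sd = 2)

# One data frame per gender x type combination (four cells).
cells <- split(vitamin, interaction(vitamin$gender, vitamin$type, sep = " - "))

results <- lapply(cells, function(d) {
  fit   <- aov(effort ~ dose, data = d)
  tukey <- TukeyHSD(fit)$dose            # matrix with columns diff, lwr, upr, p adj
  list(
    anova   = summary(fit),
    tukey   = tukey,
    # name of the comparison with the largest absolute difference
    largest = rownames(tukey)[which.max(abs(tukey[, "diff"]))]
  )
})

names(results)                            # the four gender - type cells
results[["male - blue collar"]]$tukey     # pairwise comparisons for one cell
results[["male - blue collar"]]$largest
```

Because every cell is tested separately, the number of comparisons grows quickly with the number of factor levels, which is the tedium mentioned at the start of this article.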
To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.