Repeated Measures of ANOVA in R Complete Tutorial

finnstats

11 months ago

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Repeated Measures of ANOVA in R, in this tutorial we are going to discuss one-way and two-way repeated measures of ANOVA.

In this case, the same individuals are measured the same outcome variable under different time points or conditions.

This test is also known as a within-subjects ANOVA or ANOVA with repeated measures.

Data Science Jobs

Assumptions

1. No significant outlier

To identify the outlier, you can use the box plot method or the three sigma limit method.

2. Normality Assumption

The outcome of the dependent variable should be normally distributed in each cell of the design.

Based on the Shapiro Wilks test can use it for the same.

3. Assumption of Sphericity

These can be checked based on the anova_test function. The variance of the differences between groups should be equal.

Sphericity can be checked using Mauchly’s test of sphericity, which is automatically reported when using the R function anova_test()

The above assumptions are not met then you can use an alternative of Repeated Measures ANOVA in the nonparametric method (Friedman Test).

One-way Repeated Measures of ANOVA in R

Prerequisites

library(xlsx)
library(rstatix)
library(reshape)
library(tidyverse)
library(dplyr)
library(ggpubr)
library(plyr)
library(datarium)

Getting Data

data<-read.xlsx("D:/RStudio/data.xlsx",sheetName="Sheet1")
data <- data %>%
gather(key = "time", value = "score", T0, T1, T2) %>%
convert_as_factor(id, time)
data.frame(head(data, 3))
  id time       score
1 S1   T0 7.454333333
2 S2   T0 8.030000000
3 S3   T0 9.911666667

Objective

The objective is to identify any significant difference between time points exit or not.

Summary statistics

summary<-data %>%
group_by(time) %>%
get_summary_stats(score, type = "mean_sd")
data.frame(summary)
    time variable  n  mean sd
1   T0    score 24 7.853 3.082
2   T1    score 24 9.298 2.090
3   T2    score 24 5.586 0.396

Visualization

Create a box plot and add points corresponding to individual values:

You can use a box plot also for outlier identification.

Based on boxplot no outlier detected in the dataset.

Outlier Detection

outlier<-data %>%
group_by(time) %>%
identify_outliers(score)
data.frame(outlier)
[1] time       id         score      is.outlier is.extreme
<0 rows> (or 0-length row.names)

These indicates no outlier in the data set.

How to clean data sets?

Normality Checking

You can use Shapiro-Wilk’s test for normality checking, Please note if the data set is small then Shapiro-Wilk’s test is ideal otherwise go for the QQ plot.

normality<-data %>%
group_by(time) %>%
shapiro_test(score)
data.frame(normality)
    time variable    statistic      p
1   T0    score 0.9540460218 0.33073045010
2   T1    score 0.9868670126 0.98282491871
3   T2    score 0.9261641361 0.08004275234

The normality assumption can be checked by computing the Shapiro-Wilk test for each time point. If the data is normally distributed, the p-value should be greater than 0.05.

Tested data was normally distributed at each time point, as assessed by Shapiro-Wilk’s test (p > 0.05).

If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

Sphericity Assumption

res<-anova_test(data=data,dv=score,wid=id,within=time) 
get_anova_table(res) 
ANOVA Table (type III tests)
Effect  DFn   DFd      F          p p<.05   ges
1   time 1.42 32.73 24.436 0.00000297     * 0.343

The data set score was significantly different at the different time points during the diet, F(1.42, 33) = 24.4, p < 0.0001, eta2[g] = 0.34

How to do data reshaping in R?

Based on the anova_test function sphericity correction automatically applied to the output. The default is to apply automatically the Greenhouse-Geisser sphericity correction to only within-subject factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05).

Pair Wise comparison

pair<-data %>% 
pairwise_t_test( value~time,paired=TRUE, p.adjust.method = "bonferroni" ) 
data.frame(pair) 
  . .y. group1 group2 n1 n2     statistic df       p    p.adj     p.adj.signif
1 value     t1     t2 10 10  -4.967617803  9 0.000772000 0.002000           **
2 value     t1     t3 10 10 -13.228148424  9 0.000000334 0.000001         ****
3 value     t2     t3 10 10  -4.867815675  9 0.000886000 0.003000           **

Summary

Based on the repeated measures of ANOVA, all the tested pairwise differences between time points are statistically significant.

Two-way Repeated Measures of ANOVA in R

The two-way repeated measures ANOVA can be performed in order to determine whether there is a significant interaction between treatment and time on the score.

Getting data

data<-read.xlsx("D:/RStudio/data.xlsx",sheetName="Sheet1") data <- data %>%
gather(key = "time", value = "score", T0, T1, T2) %>%
convert_as_factor(id, Treatment,time)
head(data, 3)

id Treatment time score
S1    A       T0    7.5
S2    A       T0    8.0
S3    A       T0    9.9

Summary statistics

summary<-data %>%
group_by(Treatment,time) %>%
get_summary_stats(score, type = "mean_sd")
data.frame(summary)
Treatment time variable n mean sd
   A      T0     score 10 7.2 2.49
   A      T1     score 10 8.7 2.00
   A      T2     score 10 5.7 0.27
   B      T0     score 10 7.8 3.01
   B      T1     score 10 9.1 2.05
   B      T2     score 10 5.7 0.50

Outlier Detection

outlier<-data %>%
group_by(Treatment,time) %>%
identify_outliers(score)
data.frame(outlier)
Treatment time id score is.outlier is.extreme
A          T1   S7 6.0   TRUE        FALSE
A          T1   S9 4.7   TRUE        TRUE

Outlier got detected in the data frame for the subject S7 and S9. Need to remove or revisit the data points in the dataset for further analysis.

Data points are revisited and re analyzed.

tidyverse in R complete tutorial

Summary statistics need to reanalyze again

summary<-data %>%
group_by(Treatment,time) %>%
get_summary_stats(score, type = "mean_sd")
data.frame(summary)
Treatment time variable n mean sd
A          T0   score 10 7.2 2.49
A          T1   score 10 8.7 1.06
A          T2   score 10 5.7 0.27
B          T0   score 10 7.8 3.01
B          T1   score 10 9.1 2.05
B          T2   score 10 5.7 0.50

From the corrected data need check outlier detection again.

outlier<-data %>%
group_by(Treatment,time) %>%
identify_outliers(score)
data.frame(outlier)
[1] Treatment time id score is.outlier is.extreme 
<0 rows> (or 0-length row.names)

You can see now no outlier observed in the current dataset.

Normality Checking

normality<-data %>%
group_by(Treatment,time) %>%
shapiro_test(score)
data.frame(normality)
Treatment time variable statistic p
   A       T0   score   0.89  0.169
   A       T1   score   0.92  0.347
   A       T2   score   0.85  0.065
   B       T0   score   0.96  0.739
   B       T1   score   0.93  0.411
   B       T2   score   0.81  0.020

The data set score was normally distributed at each time point (p > 0.05), except for treatment B at time point T2, as assessed by Shapiro-Wilk’s test.

QQ Plot

Based on Shapiro-Wilk’s analysis at time point T2 non-normality observed, so further you need to check the QQ plot for the verification.

From the plot above, as all the points fall approximately along the reference line, we can assume normality.

Sphericity Assumption

res.aov <- anova_test(
data = data, dv = score, wid = id,
within = c(Treatment, time)
)
get_anova_table(res.aov)
ANOVA Table (type III tests)
Effect        DFn   DFd  F     p     p<.05  ges
Treatment      1.0   9  0.25 6.3e-01       0.009
time           2.0  18 26.30 4.5e-06     * 0.354
Treatment:time 1.2  11  0.12 7.9e-01       0.004

Observed statistically significant two-way interactions between treatment and time p < 0.005. If the interaction is not significant then simply execute pairwise t test comparison (Code mentioned in the end of the tutorial).

Effect of treatment at each time point

one.way <- data %>%
group_by(time) %>%
anova_test(dv = score, wid = id, within = Treatment) %>%
get_anova_table() %>%
adjust_pvalue(method = "bonferroni")
one.way
time Effect DFn DFd F p p..05 ges p.adj
T0   Treatment 1 9 0.169 0.69 0.01200 1
T1   Treatment 1 9 0.244 0.63 0.01600 1
T2   Treatment 1 9 0.003 0.96 0.00021 1

Pairwise comparisons between treatment groups at each time point

pwc <- data %>%
group_by(time) %>%
pairwise_t_test(
score ~ Treatment, paired = TRUE,
p.adjust.method = "bonferroni"
)
data.frame(pwc)
time .y. group1 group2 n1 n2 statistic df p  p.adj p.adj.signif
T0   score A     B    10  10 -0.411    9 0.69 0.69     ns
T1   score A     B    10  10 -0.494    9 0.63 0.63     ns
T2   score A     B    10  10 -0.051    9 0.96 0.96     ns

No significant difference was observed between samples for any of the tested time points.

Rank order analysis in R

Effect of time at each level of treatment

one.way2 <- data %>%
group_by(Treatment) %>%
anova_test(dv = score, wid = id, within = time) %>%
get_anova_table() %>%
adjust_pvalue(method = "bonferroni")
data.frame(one.way2)
Treatment Effect DFn  DFd F p p..05  ges p.adj
   A       time   1.3 11 11.0 0.005 * 0.40 0.01
   B       time   1.3 11 8.5  0.010 * 0.33 0.02

Pairwise comparisons between time points

pwc2 <- data %>%
group_by(Treatment) %>%
pairwise_t_test(
score ~ time, paired = TRUE,
p.adjust.method = "bonferroni"
)
data.frame(pwc2)
Treatment .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
   A     score  T0    T1    10  10 -2.2     9  5.2e-02 1.6e-01 ns
   A     score  T0    T2    10  10 1.9      9  9.0e-02 2.7e-01 ns
   A     score  T1    T2    10  10 8.2      9  1.8e-05 5.5e-05 ****
   B     score  T0    T1    10  10 -2.4     9  3.7e-02 1.1e-01 ns
   B     score  T0    T2    10  10 2.0      9  8.1e-02 2.4e-01 ns
   B     score  T1    T2    10  10 4.3      9  2.0e-03 6.0e-03 **

A significant difference was observed between time points T1 and T2 for treatments A & B (p<0.05).

If the interaction effect from ANOVA is not significant then you can simply execute a pairwise t-test based on the below command.

Comparisons for treatment variable

data %>%
pairwise_t_test(
score ~ Treatment, paired = TRUE,
p.adjust.method = "bonferroni"
)

Comparisons for the time variable

data %>%
pairwise_t_test(
score ~ time, paired = TRUE,
p.adjust.method = "bonferroni"
)

Conclusion

In this tutorial, we discussed repeated measures of ANOVA in R with codes. In the same way, you can execute Three-way repeated measures of ANOVA.

The post Repeated Measures of ANOVA in R Complete Tutorial appeared first on finnstats.

To leave a comment for the author, please follow the link and comment on their blog: Methods – finnstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.