Site icon R-bloggers

MANOVA Test Statistics with R

[This article was first published on R – Aaron Schlegel, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Multiple tests of significance can be employed when performing MANOVA. The most well known and widely used MANOVA test statistics are Wilk’s \(\Lambda\), Pillai, Lawley-Hotelling, and Roy’s test. Unlike ANOVA in which only one dependent variable is examined, several tests are often utilized in MANOVA due to its multidimensional nature. Each MANOVA test statistic is different and can lead to different conclusions depending on how the data and mean vectors lie, thus understanding how the test statistics are performed and how they differ from one another can be valuable when determining the results of MANOVA.

This post will introduce and explore the three MANOVA test statistics using the rootstock data from the previous MANOVA post. The rootstock data were obtained from the companion FTP site of the book Methods of Multivariate Analysis by Alvin Rencher.

The data is loaded, and the columns are named as before.

root <- read.table('ROOT.DAT', col.names = c('Tree.Number', 'Trunk.Girth.4.Years', 'Ext.Growth.4.Years', 'Trunk.Girth.15.Years', 'Weight.Above.Ground.15.Years'))
root$Tree.Number <- as.factor(root$Tree.Number)

Perform MANOVA on the rootstock data.

root.manova <- manova(cbind(root$Trunk.Girth.4.Years, root$Ext.Growth.4.Years, root$Trunk.Girth.15.Years, root$Weight.Above.Ground.15.Years) ~ Tree.Number, 
                      data = root)
root.summary <- summary(root.manova)
root.summary
##             Df Pillai approx F num Df den Df    Pr(>F)    
## Tree.Number  5 1.3055   4.0697     20    168 1.983e-07 ***
## Residuals   42                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As before, \(H_0\) is rejected, and it is concluded there are significant differences in the group mean vectors. The default test statistic used in the summary() call is the Pillai statistic. The same call can be made with summary(root.manova, test='Pillai'). The manova() function provides four multivariate tests by setting the test argument to either Pillai, Wilks, Hotelling-Lawley, or Roy.

The next section examines calculating the Pillai and other MANOVA test statistics and any departures from the above conclusion that may result.

Calculating MANOVA Test Statistics

The test statistics rely on the error \(E\) and hypothesis \(H\) matrices as explored in the previous post, which can be calculated with the following code.

n <- dim(root)[1] / length(unique(root$Tree.Number))
total.means <- colMeans(root[,2:5])

root.group <- split(root[,2:5], root$Tree.Number)
root.means <- sapply(root.group, function(x) {
  apply(x, 2, mean)
}, simplify = 'data.frame')

H = matrix(data = 0, nrow = 4, ncol = 4)
for (i in 1:dim(H)[1]) {
  for (j in 1:i) {
    H[i,j] <- n * sum((root.means[i,] - total.means[i]) * (root.means[j,] - total.means[j]))
    H[j,i] <- n * sum((root.means[j,] - total.means[j]) * (root.means[i,] - total.means[i]))
  }
}

E = matrix(data = 0, nrow = 4, ncol = 4)
for (i in 1:dim(E)[1]) {
  for (j in 1:i) {
    b <- c() 
    for (k in root.group) {
      a <- sum((k[,i] - mean(k[,i])) * (k[,j] - mean(k[,j])))
      b <- append(b, a)
    }
    E[i,j] <- sum(b)
    E[j,i] <- sum(b)
  }
}


H # Hypothesis Matrix
##            [,1]      [,2]      [,3]     [,4]
## [1,] 0.07356042 0.5373852 0.3322646 0.208470
## [2,] 0.53738521 4.1996619 2.3553885 1.637108
## [3,] 0.33226458 2.3553885 6.1139354 3.781044
## [4,] 0.20847000 1.6371084 3.7810438 2.493091


E # Error matrix
##           [,1]      [,2]      [,3]     [,4]
## [1,] 0.3199875  1.696564 0.5540875 0.217140
## [2,] 1.6965637 12.142790 4.3636125 2.110214
## [3,] 0.5540875  4.363613 4.2908125 2.481656
## [4,] 0.2171400  2.110214 2.4816563 1.722525

The same matrices can also be extracted from the previous summary call.

root.summary$SS
## $Tree.Number
##            [,1]      [,2]      [,3]     [,4]
## [1,] 0.07356042 0.5373852 0.3322646 0.208470
## [2,] 0.53738521 4.1996619 2.3553885 1.637108
## [3,] 0.33226458 2.3553885 6.1139354 3.781044
## [4,] 0.20847000 1.6371084 3.7810437 2.493091
## 
## $Residuals
##           [,1]      [,2]      [,3]     [,4]
## [1,] 0.3199875  1.696564 0.5540875 0.217140
## [2,] 1.6965637 12.142790 4.3636125 2.110214
## [3,] 0.5540875  4.363612 4.2908125 2.481656
## [4,] 0.2171400  2.110214 2.4816562 1.722525
Pillai Test Statistic

The Pillai test statistic is denoted as \(V^{(s)}\) and defined as:

\( V^{(s)} = tr[(E + H0)^{-1} H] = \sum^s_{i=1} \frac{\lambda_i}{1 + \lambda_i} \)

Where \(\lambda_i\) represents the \(i\)th nonzero eigenvalue of \(E^{-1}H\). The Pillai statistic can be computed with either of the below.

e1h.eigen <- eigen(solve(E) %*% H)

sum(diag(solve(E+H) %*% H))
## [1] 1.305472


sum(e1h.eigen$values / (1 + e1h.eigen$values))
## [1] 1.305472


summary(root.manova, 'Pillai')$stats[,2][1]
## Tree.Number 
##    1.305472
Wilk’s \(\Lambda\)

Wilk’s \(\Lambda\), one of the more commonly used MANOVA test statistics, compares the \(E\) matrix to the total \(E + H\) matrix. Wilk’s \(\Lambda\) is calculated using the determinants of those two matrices.

\( \Lambda = \frac{\left| E \right|}{\left| E + H \right|} \)

Wilk’s \(\Lambda\) can also be written in terms of the \(E^{-1}H\) eigenvalues \((\lambda_1, \lambda_2, \cdots, \lambda_s)\).

\( \Lambda = \prod^s_{i=1} \frac{1}{1 + \lambda_i} \)
det(E) / det(E + H)
## [1] 0.1540077


prod(1 / (1 + e1h.eigen$values))
## [1] 0.1540077


summary(root.manova, 'Wilks')$stats[,2][1]
## Tree.Number 
##   0.1540077

An approximate F-statistic can be calculated.

\( F = \frac{1 – \Lambda^{(1/t)}}{\Lambda^{(1/t)}} \frac{df_2}{df_1} \)

Where \(df_1\) and \(df_2\) degrees of freedom and \(t\) is equal to:

\( df_1 = pv_H, \qquad df_2 = wt – \frac{1}{2}(pv_H – 2), \qquad w = v_E + v_H – \frac{1}{2}(p + v_H + 1) \)
\( t = \sqrt{\frac{p^2v^2_H – 4}{p^2 + v^2_H – 5}} \)

k <- length(unique(root$Tree.Number))
p <- length(root[,2:5])
vh <- k - 1
ve <- dim(root)[1] - k

t <- sqrt((p^2 * vh^2 - 4) / (p^2 + vh^2 -5))

df1 <- p * vh
df2 <- (ve + vh - .5 * (p + vh + 1)) * t - .5 * (p * vh - 2)

f <- (1 - (det(E) / det(E + H))^(1/t)) / (det(E) / det(E + H))^(1/t) * df2 / df1

f
## [1] 4.936888


summary(root.manova, 'Wilks')$stats[,3][1]
## Tree.Number 
##    4.936888
Lawley-Hotelling Statistic

The Lawley-Hotelling statistic, also known as Hotelling’s generalized \(T^2\)-statistic, is denoted \(U^{(s)}\) and is defined as:

\( U^{(s)} = tr(E^{-1}H) = \sum^s_{i=1} \lambda_i \)
sum(diag(solve(E) %*% H))
## [1] 2.921368


sum(e1h.eigen$values)
## [1] 2.921368


summary(root.manova, 'Hotelling-Lawley')$stats[,2][1]
## Tree.Number 
##    2.921368

The approximate F-statistic in the Lawley-Hotelling setting is defined as:

\( F = \frac{2 (sN + 1)U^{(s)}}{s^2(2m + s + 1)} \)

Where \(s\), \(m\), and \(N\) are defined as:

\( s = min(p, V_h) \qquad m = \frac{1}{2} (\left| V_h – p \right| – 1) \qquad N = \frac{1}{2} (V_E – p – 1) \)
s <- min(vh, p)
m <- .5 * (abs(vh - p) - 1)
N <- .5 * (ve - p - 1)

lawley.hotelling.f <- (2 * (s * N + 1) * sum(diag(solve(E) %*% H))) / (s^2 * (2 * m + s + 1))
lawley.hotelling.f
## [1] 5.477566


summary(root.manova, 'Hotelling-Lawley')$stats[,3][1]
## Tree.Number 
##    5.477566
Roy’s Test

Roy’s test statistic is the largest eigenvalue of the matrix \(E^{-1}H\).

roy.stat <- e1h.eigen$values[1]
roy.stat
## [1] 1.875671


summary(root.manova, 'Roy')$stats[,2][1]
## Tree.Number 
##    1.875671

Roy’s test differs from the previous three in that the F-statistic is found by maximizing the spread of the transformed means. The maximum occurs at the eigenvector corresponding to the largest (first) eigenvalue of \(E^{-1}H\). Thus the F-statistic in Roy’s setting can be computed as:

\( \frac{k(n – 1)}{k – 1} \lambda_1 \)
roy.f <- k * (n - 1) / (k - 1) * e1h.eigen$values[1]
roy.f
## [1] 15.75564


summary(root.manova, 'Roy')$stats[,3][1]
## Tree.Number 
##    15.75564


?summary.manova
Comparison of the MANOVA Test Statistics

According to the documentation in ?summary.manova, “Wilks’ statistic is most popular in the literature, but the default Pillai-Bartlett statistic is recommended by Hand and Taylor (1987).” Although in this example each test rejected \(H_0\), there are instances in which one test may accept \(H_0\) while the others determine rejection. Rencher recommends not using Roy’s test in any situation unless there is collinearity amongst the dependent variables (Rencher, n.d., pp. 177–177).

References

Andrews, D. F., and Herzberg, A. M. (1985), Data, New York: Springer-Verlag.

Rencher, A. (n.d.). Methods of Multivariate Analysis (2nd ed.). Brigham Young University: John Wiley & Sons, Inc.

The post MANOVA Test Statistics with R appeared first on Aaron Schlegel.

To leave a comment for the author, please follow the link and comment on their blog: R – Aaron Schlegel.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.