Site icon R-bloggers

lmDiallel: a new R package to fit diallel models. The Gardner-Eberhart models

[This article was first published on R on The broken bridge between biologists and statisticians, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Another post for this series about diallel mating experiments. So far, we have published a paper in Plant Breeding (Onofri et al., 2020), where we presented lmDiallel, a new R package to fit diallel models. We followed up this paper with a series of four blog posts, giving more detail about the package (see here), about the Hayman’s models type 1 (see here) and type 2 (see here) and about the Griffing’s family of models (see here).

In this post we are going to talk about the Gardner-Eberarth models, which are particularly suitable to describe heterotic effects. The peculiar trait of these models is that they consider different means for crosses and selfed parents and, therefore, they are reserved for the mating designs 2 (selfed parents and crosses, but no reciprocals) or 1 (selfed parents, crosses and reciprocals). The first model is know as GE2 model and it is specified as:

\[y_{ijk} = \mu_{\nu} + \gamma_k + 0.5 \, \left( v_i + v_j \right) + \bar{h} + h_i + h_j + s_{ij} + \varepsilon_{ijk} \quad\quad\quad (1)\]

where \(\gamma_k\) is the effect of block \(k\) and \(\mu_{\nu}\) is the overall mean for all selfed parents (not the overall mean, as in other diallel models). The parameters \(v\) (\(v_i\) and \(v_j\)) represent the differences between the expected value for the selfed parents \(i\) and \(j\) and the mean for all selfed parents (\(\mu_{\nu}\)). According to the authors, this would be the Variety Effect (VE); as a consequence, the expected value for the \(i^{th}\) selfed parent is \(\mu_{\nu} + v_i\), while the expected value for the cross \(ij\), in absence of any dominance/heterosis effects, would be \(\mu_{\nu} + 0.5 \, \left( v_i + v_j \right)\), that is the mean value of its parents. There is a close relationship between \(g_i\) and \(g_j\) in Griffing’s equations (see here) and \(v_i\) and \(v_j\) in the GE2 equation (Eq. 1), that is: \(v_i = 2\, g_i + (n – 2) d_i\); therefore, the sum of squares for the GCA and VE effects are the same, although the estimates are different.

Since a cross does not necessarily respond according to the mean value of its parents, the parameter \(\bar{h}\) represents the average heterosis (H.BAR) contributed by the the whole set of genotypes used in crosses. In the balanced case, \(\bar{h}\) represents the difference between the overall mean for selfed parents and the overall mean for crosses, under the constraint that \(\bar{h} = 0\) for all selfed parents. Besides, the parameters \(h_i\) represent the average heterosis contributed by the \(i^{th}\) parent in its crosses (Hi), while \(s_{ij}\) is the Specific Combining Ability (SCA) for the cross between the \(i^{th}\) and \(j^{th}\) parents, that is totally equivalent to the corresponding parameter in Griffing’s models.

It is clear that both the Hayman’s model type 2 and the GE2 model account for the heterosis effect, although they do it in a different way: in Hayman’s model type 2 the specific effect of heterosis is assessed with reference to the overall mean, while in GE2 it is assessed by comparing the mean of a cross with the means of its parents. Indeed, the sum of squares for the ‘MDD’ effect in Hayman’s model and ‘Hi’ effect in GE2 model are perfectly the same, although the parameters are different.

Gardner and Eberhart proposed another model (GE3), which we have slightly modified to maintain a consistent notation in the frame of GLMs:

\[\left\{ {\begin{array}{ll} y_{ijk} = \mu_{\nu} + \gamma_k + \bar{h} + \textrm{gc}_i + \textrm{gc}_j + s_{ij} & \textrm{if} \quad i \neq j\\ y_{ijk} = \mu_{\nu} + \gamma_k + \textrm{sp}_i & \textrm{if} \quad i = j \end{array}} \right. \quad\quad\quad (2)\]

Equation 2 is an array of two separate elements for crosses and selfed parents. For the crosses (equation above), the parameters \(\textrm{gc}_i\) and \(\textrm{gc}_j\) represent the GCA for the \(i\) and \(j\) parents in all their crosses (GCAC); it should be noted that GCA \(\neq\) GCAC, as this latter effect is estimated without considering the selfed parents. The parameters \(s{ij}\) are the same as in the previous models (Hayman’s and Griffing’s models: SCA effect), while \(\textrm{sp}_i\) represent the effects of selfed parents (SP): they are numerically equivalent to the corresponding effects in Equation 1, but the sum of squares are different (see Murray et al., 2003). Therefore, we use different names for these two effects (SP and Hi).

Example 1: a half-diallel (no reciprocals)

As an example of a diallel experiments with no reciprocals, we consider the data reported in Lonnquist and Gardner (1961) relating to the yield of 21 maize genotypes, obtained from six male and six female parentals. The dataset is available as lonnquist61 in the lmDiallel package; in the box below we load the data, after installing (if necessary) and loading the ‘lmDiallel’ package.

# library(devtools) # Install if necessary
# install_github("OnofriAndreaPG/lmDiallel")
library(lmDiallel)
library(multcomp)
data("lonnquist61")

For this complete diallel experiment we can fit equation 1 (model GE2), by including the functions H.BAR(), VEi(), Hi() and SCA(); we can use either the lm() or the lm.diallel() functions, as shown in the box below.

dMod <- lm(Yield ~ H.BAR(Par1, Par2) + VEi(Par1, Par2) + 
             Hi(Par1, Par2) + SCA(Par1, Par2),
           data = lonnquist61)
dMod2 <- lm.diallel(Yield ~ Par1 + Par2,
                    data = lonnquist61, fct = "GE2")

In this case the dataset has no replicates and, therefore, we need to provide an estimate of the residual mean square and degrees of freedom. If we have fitted the model by using the lm() function, the resulting ‘lm’ object can be explored by using the summary.diallel() and anova.diallel() functions. Otherwise, if we have fitted the model with the lm.diallel() function, the resulting ‘diallel’ object can be explored by using the summary() and anova() methods. See the box below for an example: the results are, obviously, the same.

# summary.diallel(dMod, MSE = 7.1, dfr = 60)
anova.diallel(dMod, MSE = 7.1, dfr = 60)
## Analysis of Variance Table
## 
## Response: Yield
##                   Df  Sum Sq Mean Sq F value    Pr(>F)    
## H.BAR(Par1, Par2)  1 115.440 115.440 16.2592 0.0001583 ***
## VEi(Par1, Par2)    5 234.230  46.846  6.5980 5.923e-05 ***
## Hi(Par1, Par2)     5  59.720  11.944  1.6823 0.1527246    
## SCA(Par1, Par2)    9  63.781   7.087  0.9981 0.4515416    
## Residuals         60           7.100                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# summary(dMod2, MSE = 7.1, dfr = 60)
anova(dMod2, MSE = 7.1, dfr = 60)
## Analysis of Variance Table
## 
## Response: Yield
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## h.bar      1 115.440 115.440 16.2592 0.0001583 ***
## Variety    5 234.230  46.846  6.5980 5.923e-05 ***
## h.i        5  59.720  11.944  1.6823 0.1527246    
## SCA        9  63.781   7.087  0.9981 0.4515416    
## Residuals 60           7.100                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Also for the diallel object, we can retrieve the full list of genetical parameters with the glht() function, by using the same syntax as shown above.

gh <- glht(linfct = diallel.eff(dMod2, MSE = 7.1, dfr = 60))
summary(gh, test = adjusted(type = "none"))
#    Simultaneous Tests for General Linear Hypotheses
# 
# Linear Hypotheses:
#                Estimate Std. Error t value Pr(>|t|)    
# Intercept == 0   92.450      1.088  84.987  < 2e-16 ***
# h.bar == 0        5.190      1.287   4.032  0.00043 ***
# v_B == 0          4.150      2.432   1.706  0.09991 .  
# v_G == 0         -4.550      2.432  -1.871  0.07270 .  
# v_H == 0         -0.750      2.432  -0.308  0.76028    
# v_K == 0         -1.150      2.432  -0.473  0.64031    
# v_K2 == 0         3.750      2.432   1.542  0.13524    
# v_M == 0         -1.450      2.432  -0.596  0.55625    
#...
#...
# s_K2:K == 0       0.585      2.064   0.283  0.77909    
# s_K2:M == 0      -1.115      2.064  -0.540  0.59364    
# s_M:B == 0       -1.040      2.064  -0.504  0.61859    
# s_M:G == 0       -2.290      2.064  -1.110  0.27737    
# s_M:H == 0        3.385      2.064   1.640  0.11304    
# s_M:K == 0        1.060      2.064   0.514  0.61189    
# s_M:K2 == 0      -1.115      2.064  -0.540  0.59364 

If we want to fit Equation 2 (model GE3), we can follow a very similar approach, by using the functions H.BAR(), SP(), GCAC() and SCA(). The box below shows an example either with the lm() or the with the lm.diallel() functions.

dMod <- lm(Yield ~ H.BAR(Par1, Par2) + SP(Par1, Par2)
           + GCAC(Par1, Par2) + SCA(Par1, Par2),
           data = lonnquist61)
dMod2 <- lm.diallel(Yield ~ Par1 + Par2,
                    data = lonnquist61, fct = "GE3")

# summary.diallel(dMod, MSE = 7.1, dfr = 60)
anova.diallel(dMod, MSE = 7.1, dfr = 60)
## Analysis of Variance Table
## 
## Response: Yield
##                   Df  Sum Sq Mean Sq F value    Pr(>F)    
## H.BAR(Par1, Par2)  1 115.440 115.440 16.2592 0.0001583 ***
## SP(Par1, Par2)     5  55.975  11.195  1.5768 0.1804080    
## GCAC(Par1, Par2)   5 237.975  47.595  6.7035 5.069e-05 ***
## SCA(Par1, Par2)    9  63.781   7.087  0.9981 0.4515416    
## Residuals         60           7.100                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# summary(dMod2, MSE = 7.1, dfr = 60)
anova(dMod2, MSE = 7.1, dfr = 60)
## Analysis of Variance Table
## 
## Response: Yield
##             Df  Sum Sq Mean Sq F value    Pr(>F)    
## h.bar        1 115.440 115.440 16.2592 0.0001583 ***
## Selfed par.  5  55.975  11.195  1.5768 0.1804080    
## Varieties    5 237.975  47.595  6.7035 5.069e-05 ***
## SCA          9  63.781   7.087  0.9981 0.4515416    
## Residuals   60           7.100                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Also for the diallel object, we can retrieve the full list of genetical parameters with the glht() function, by using the same syntax as shown above.

gh <- glht(linfct = diallel.eff(dMod2, MSE = 7.1, dfr = 60))
summary(gh, test = adjusted(type = "none"))
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Linear Hypotheses:
##                Estimate Std. Error t value Pr(>|t|)    
## Intercept == 0   92.450      1.088  84.987  < 2e-16 ***
## h.bar == 0        5.190      1.287   4.032  0.00043 ***
## sp_B == 0         4.150      2.432   1.706  0.09991 .  
## sp_G == 0        -4.550      2.432  -1.871  0.07270 .  
## sp_H == 0        -0.750      2.432  -0.308  0.76028    
## sp_K == 0        -1.150      2.432  -0.473  0.64031    
## sp_K2 == 0        3.750      2.432   1.542  0.13524    
## sp_M == 0        -1.450      2.432  -0.596  0.55625    
## gc_B == 0         0.900      1.216   0.740  0.46593    
## gc_G == 0        -2.050      1.216  -1.686  0.10385    
## gc_H == 0        -0.025      1.216  -0.021  0.98376    
## gc_K == 0        -3.000      1.216  -2.467  0.02055 *  
## gc_K2 == 0        6.375      1.216   5.242 1.78e-05 ***
## gc_M == 0        -2.200      1.216  -1.809  0.08205 .  
## s_B:G == 0        4.810      2.064   2.330  0.02781 *  
## s_B:H == 0       -1.415      2.064  -0.686  0.49905    
## s_B:K == 0       -0.140      2.064  -0.068  0.94644    
## s_B:K2 == 0      -2.215      2.064  -1.073  0.29305    
## s_B:M == 0       -1.040      2.064  -0.504  0.61859    
## s_G:B == 0        4.810      2.064   2.330  0.02781 *  
## s_G:H == 0       -2.865      2.064  -1.388  0.17689    
## s_G:K == 0       -0.990      2.064  -0.480  0.63548    
## s_G:K2 == 0       1.335      2.064   0.647  0.52342    
## s_G:M == 0       -2.290      2.064  -1.110  0.27737    
## s_H:B == 0       -1.415      2.064  -0.686  0.49905    
## s_H:G == 0       -2.865      2.064  -1.388  0.17689    
## s_H:K == 0       -0.515      2.064  -0.250  0.80492    
## s_H:K2 == 0       1.410      2.064   0.683  0.50056    
## s_H:M == 0        3.385      2.064   1.640  0.11304    
## s_K:B == 0       -0.140      2.064  -0.068  0.94644    
## s_K:G == 0       -0.990      2.064  -0.480  0.63548    
## s_K:H == 0       -0.515      2.064  -0.250  0.80492    
## s_K:K2 == 0       0.585      2.064   0.283  0.77909    
## s_K:M == 0        1.060      2.064   0.514  0.61189    
## s_K2:B == 0      -2.215      2.064  -1.073  0.29305    
## s_K2:G == 0       1.335      2.064   0.647  0.52342    
## s_K2:H == 0       1.410      2.064   0.683  0.50056    
## s_K2:K == 0       0.585      2.064   0.283  0.77909    
## s_K2:M == 0      -1.115      2.064  -0.540  0.59364    
## s_M:B == 0       -1.040      2.064  -0.504  0.61859    
## s_M:G == 0       -2.290      2.064  -1.110  0.27737    
## s_M:H == 0        3.385      2.064   1.640  0.11304    
## s_M:K == 0        1.060      2.064   0.514  0.61189    
## s_M:K2 == 0      -1.115      2.064  -0.540  0.59364    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- none method)
#    Simultaneous Tests for General Linear Hypotheses
# 
# Linear Hypotheses:
#                Estimate Std. Error t value Pr(>|t|)    
# Intercept == 0   92.450      1.088  84.987  < 2e-16 ***
# h.bar == 0        5.190      1.287   4.032  0.00043 ***
# sp_B == 0         4.150      2.432   1.706  0.09991 .  
# sp_G == 0        -4.550      2.432  -1.871  0.07270 .  
# sp_H == 0        -0.750      2.432  -0.308  0.76028    
# sp_K == 0        -1.150      2.432  -0.473  0.64031    
# sp_K2 == 0        3.750      2.432   1.542  0.13524    
# ...
# ...
# s_K2:H == 0       1.410      2.064   0.683  0.50056    
# s_K2:K == 0       0.585      2.064   0.283  0.77909    
# s_K2:M == 0      -1.115      2.064  -0.540  0.59364    
# s_M:B == 0       -1.040      2.064  -0.504  0.61859    
# s_M:G == 0       -2.290      2.064  -1.110  0.27737    
# s_M:H == 0        3.385      2.064   1.640  0.11304    
# s_M:K == 0        1.060      2.064   0.514  0.61189    
# s_M:K2 == 0      -1.115      2.064  -0.540  0.59364   

Example 2: a full diallel experiment

If we have a full diallel experiment (with reciprocals), we can fit Equations 1 and 2, but we should also include the reciprocal effects, in order to avoid that the residual term is inflated and no longer provides a reliable estimate of the experimental error. We provide an example with the data in Hayman (1954), relating to a complete diallel experiment with eight parental lines, producing 64 combinations (8 selfs + 28 crosses with 2 reciprocals each). The R dataset is included in the ‘lmDiallel’ package and the models are fitted by using the same coding as above, apart from the fact that the function REC() is included in the lm() call and the arguments “GE2r” and “GE3r” are used instead of “GE2” and “GE3” in the lm.diallel() call.

data("hayman54")
contrasts(hayman54$Block) <- "contr.sum"
dMod <- lm(Ftime ~ Block + H.BAR(Par1, Par2) + VEi(Par1, Par2) + 
             Hi(Par1, Par2) + SCA(Par1, Par2) + REC(Par1, Par2),
           data = hayman54)
dMod2 <- lm.diallel(Ftime ~ Par1 + Par2, Block = Block,
                    data = hayman54, fct = "GE2r")
# summary(dMod2)
anova(dMod2)
## Analysis of Variance Table
## 
## Response: Ftime
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## Block        1    142     142  0.3416   0.56100    
## h.bar        1  30797   30797 73.8840 3.259e-12 ***
## Variety      7 277717   39674 95.1805 < 2.2e-16 ***
## h.i          7  34153    4879 11.7050 1.957e-09 ***
## SCA         20  37289    1864  4.4729 2.560e-06 ***
## Reciprocals 28  19112     683  1.6375   0.05369 .  
## Residuals   63  26260                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gh <- glht(linfct = diallel.eff(dMod2))
summary(gh, test = adjusted(type = "none"))
#    Simultaneous Tests for General Linear Hypotheses
# 
# Linear Hypotheses:
#                  Estimate Std. Error t value Pr(>|t|)    
# Intercept == 0  2.039e+02  5.104e+00  39.956  < 2e-16 ***
# h.bar == 0     -4.690e+01  5.456e+00  -8.596 4.48e-09 ***
# v_A == 0        8.506e+01  1.350e+01   6.299 1.14e-06 ***
# v_B == 0       -3.344e+01  1.350e+01  -2.476 0.020115 *  
# v_C == 0        1.841e+02  1.350e+01  13.630 2.37e-13 ***
# v_D == 0        3.706e+01  1.350e+01   2.745 0.010839 *  
# v_E == 0       -3.794e+01  1.350e+01  -2.809 0.009301 ** 
# v_F == 0       -3.394e+01  1.350e+01  -2.513 0.018499 *  
# v_G == 0       -1.509e+02  1.350e+01 -11.177 1.99e-11 ***
# v_H == 0       -4.994e+01  1.350e+01  -3.698 0.001023 ** 
# h_A == 0        4.885e+00  7.797e+00   0.627 0.536380    
# ...
# ...
# r_H:C == 0     -5.500e+00  1.021e+01  -0.539 0.594620    
# r_H:D == 0     -5.000e+00  1.021e+01  -0.490 0.628380    
# r_H:E == 0     -8.500e+00  1.021e+01  -0.833 0.412617    
# r_H:F == 0     -1.750e+01  1.021e+01  -1.714 0.098370 .  
# r_H:G == 0     -1.400e+01  1.021e+01  -1.371 0.181956    

The code for the GE3 model with reciprocal effects is shown in the box below.

dMod <- lm(Ftime ~ Block + H.BAR(Par1, Par2) + SP(Par1, Par2)
           + GCAC(Par1, Par2) + SCA(Par1, Par2) + REC(Par1, Par2),
           data = hayman54)
dMod2 <- lm.diallel(Ftime ~ Par1 + Par2, Block = Block,
                    data = hayman54, fct = "GE3r")
# summary(dMod2)
anova(dMod2)
## Analysis of Variance Table
## 
## Response: Ftime
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## Block        1    142   142.4  0.3416   0.56100    
## h.bar        1  30797 30796.9 73.8840 3.259e-12 ***
## gcac         7 168923 24131.9 57.8941 < 2.2e-16 ***
## Selfed par.  7 142946 20420.9 48.9913 < 2.2e-16 ***
## SCA         20  37289  1864.4  4.4729 2.560e-06 ***
## Reciprocals 28  19112   682.6  1.6375   0.05369 .  
## Residuals   63  26260                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# anova(dMod)
gh <- glht(linfct = diallel.eff(dMod2))
# summary(gh, test = adjusted(type = "none"))

Estimation of variance components (random genetic effects)

If we intend to regard the genetic effects as random and to estimate variance components, we can use the mmer() function in the ‘sommer’ package (Covarrubias-Pazaran, 2016), although we need to code a bunch of dummy variables. In order to make things simpler for routine experiments, we have coded the mmer.diallel() wrapper using the same syntax as the lm.diallel() function. The exemplary code is given in the box below.

# Random genetic effects
mod1m <- mmer.diallel(Yield ~ Par1 + Par2,
                      data = lonnquist61,
                      fct = "GE2")
mod1m
##          VarComp VarCompSE
## Variety 2.344044  2.333869
## h.i     5.172099  4.905691
## SCA     6.142047  2.789668
mod2m <- mmer.diallel(Yield ~ Par1 + Par2,
                      data = lonnquist61,
                      fct = "GE3")
mod2m
##               VarComp VarCompSE
## GCAC        10.125567  7.563026
## Selfed par.  4.107823  7.830039
## SCA          7.087220  3.342822
mod3m <- mmer.diallel(Ftime ~ Par1 + Par2,
                      data = hayman54,
                      fct = "GE2r")
mod3m
##                VarComp  VarCompSE
## Variety     2347.35935 1279.94018
## h.i          634.70067  408.56286
## SCA          362.24772  148.53288
## Reciprocals   66.78085   49.16288
## Residuals    415.44775   73.43871
mod4m <- mmer.diallel(Ftime ~ Par1 + Par2,
                      data = hayman54,
                      fct = "GE3r")
mod4m
##                 VarComp  VarCompSE
## GCAC          927.78740  537.89968
## Selfed par. 10003.93261 5456.47108
## SCA           362.96912  148.50097
## Reciprocals    67.50895   49.11942
## Residuals     412.54141   72.93144

Thanks for reading!

Andrea Onofri
Luigi Russi
Niccolò Terzaroli
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
andrea.onofri@unipg.it


References

  • Covarrubias-Pazaran, G., 2016. Genome-Assisted Prediction of Quantitative Traits Using the R Package sommer. PLOS ONE 11, e0156744. https://doi.org/10.1371/journal.pone.0156744
  • Gardner, C.O., Eberhart, S.A., 1966. Analysis and Interpretation of the Variety Cross Diallel and Related Populations. Biometrics 22, 439. https://doi.org/10.2307/2528181
  • Griffing, B., 1956. Concept of general and specific combining ability in relation to diallel crossing systems. Australian Journal of Biological Science 9, 463–493.
  • Hayman, B.I., 1954. The Analysis of Variance of Diallel Tables. Biometrics 10, 235. https://doi.org/10.2307/3001877
  • Möhring, J., Melchinger, A.E., Piepho, H.P., 2011b. REML-Based Diallel Analysis. Crop Science 51, 470–478. https://doi.org/10.2135/cropsci2010.05.0272

To leave a comment for the author, please follow the link and comment on their blog: R on The broken bridge between biologists and statisticians.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.