Analysing seed germination and emergence data with R (a tutorial). Part 9

[This article was first published on R on The broken bridge between biologists and statisticians, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts expand on a manuscript that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. In order to work throughout this post, you need to install the ‘drcte’ and ‘drcSeedGerm’ packages, by using the code provided in this page.

Quantiles from time-to-event models

We have previously shown that time-to-event models (e.g., germination or emergence models) can be used to describe the time course of germinations/emergences for a seed lot (this post) or for several seed lots, submitted to different experimental treatments (this post). We have seen that fitted models can be used to extract information of biological relevance (this post).

A time-to-event model is, indeed, a cumulative probability function for germination time and, therefore, we might be interested in finding the quantiles. But, what are the ‘quantiles’? It is a set of ‘cut-points’ that divide the distribution of event-times into a set of \(q\) intervals with equal probability. For example the 100-quantiles, also named as the percentiles, divide the distribution of event-times into \(q = 100\) groups. Some of these cut-off points may be particularly relevant: for example the 50-th percentile corresponds to the time required to reach 50% germination (T50) and it is regarded as a good measure of germination velocity. Other common percentiles are the T10, or the T30, which are used to express the germination velocity for the quickest seeds in the lot.

Extracting some relevant percentiles from the time-to-event curve is regarded as an important task, to sintetically describe the germination/emergence velocity of seed populations. To this aim, we have included the quantile() method in the drcte package, that addresses most of the peculiarities of seed germination/emergence data. In this post, we will show the usage of this function.

Peculiarities of seed science data

I know that you are looking forward to getting the quantiles for your time-to-event curve. Please, hang on for a while… we need to become aware of a couple of issues, that are specific to germination/emergence data and are not covered in literature for other types of time-to-event data (e.g., survival data).

Quantiles and ‘restricted’ quantiles

Due to the presence of the ungerminated/unemerged fraction, quantiles suffer from the intrinsic ambiguity that we could calculate them either for the whole sample, or for the germinated fraction. For example, let’s think that we have a seed lot where the maximum percentage of germination is 60% and thus 40% of seeds are dormant. How do we define the 50th percentile?

In general, we should consider the whole population, including the ungerminated fraction, where the event is not observed; accordingly, the, e.g., 50th percentile (T50) should be defined as the time to 50% germination. Obviously, with such a definition the, e.g., T50 cannot be estimated when the maximum germinated fraction is lower than 50%.

On the other hand, for certain applications, it might be ok to remove the ungerminated fraction prior to estimating the quantiles; in this case, for our example where the maximum germinated fracion is 60%, the T50 would be defined as the time to 30% germination, that is 50% of the maximum germinated fraction.

Due to such an ambiguity, we should talk about quantiles and ‘restricted’ quantiles. The graph below should help clarify such a difference.

As a general suggestion, we should never use restricted quantiles for seed germination/emergence studies, especially when the purpose is to make comparisons across treatment groups (Bradford, 2002; Keshtkar et al. 2021).

Germination/emergence rates

The quantiles of germination times (e.g., T10, T30 or T50) are very common measures of germination velocity, although they may be rather counterintuitive, because a high germination time implies low velocity. Another common measure of velocity is the Germination Rate, that is the inverse of germination time (e.g., GR10 = 1/T10).

The quantiles of germination rates (e.g., GR10, GR30, GR50…) represents the daily progress to germination for a given subpopulation and they are used as the basis for hydro-thermal-time modelling. Therefore, their determination for a seed lot is also very relevant.

Getting the quantiles with ‘drcte’

Parametric models

In a previous post we have used the code below to fit a log-logistic time-to-event model to the germination data for three species of the Verbascum genus:

library(drcte)
library(drcSeedGerm)
data(verbascum)
mod1 <- drmte(nSeeds ~ timeBef + timeAf, fct = LL.3(),
             curveid = Species, data = verbascum)
summary(mod1, units = Dish)
## 
## Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0
## 
## Robust estimation: Cluster robust sandwich SEs 
## 
## Parameter estimates:
## 
##               Estimate Std. Error  t value  Pr(>|t|)    
## b:arcturus   -9.930005   1.111135  -8.9368 4.286e-16 ***
## b:blattaria  -7.198025   0.578038 -12.4525 < 2.2e-16 ***
## b:creticum  -11.033749   0.943374 -11.6961 < 2.2e-16 ***
## d:arcturus    0.356648   0.047915   7.4433 3.715e-12 ***
## d:blattaria   0.840064   0.025593  32.8242 < 2.2e-16 ***
## d:creticum    0.969990   0.017290  56.1015 < 2.2e-16 ***
## e:arcturus   12.059189   0.486010  24.8126 < 2.2e-16 ***
## e:blattaria   4.031763   0.199741  20.1850 < 2.2e-16 ***
## e:creticum    3.200655   0.103161  31.0259 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It may be useful to rank the species in terms of their germination velocity and, to that purpose, we could estimate the times to 30% and 50% germination (T30 and T50), that are the 30th and 50th percentiles of the time-to-event distribution. We can use the quantile() method, where the probability levels are passed in as the vector ‘probs’:

quantile(mod1, probs = c(0.3, 0.5))
## 
## Estimated quantiles
## 
##               Estimate Std. Error
## arcturus:30%   14.2634   0.992685
## arcturus:50%       NaN        NaN
## blattaria:30%   3.7156   0.106630
## blattaria:50%   4.2536   0.118066
## creticum:30%    2.9759   0.058279
## creticum:50%    3.2187   0.059480

We may note that the T50 is not estimable with Verbascum arcturus, as the maximum germinated proportion (d parameter for the time-to-event model above) is 0.36. Standard errors are obtained by using the delta method and they are invalid whenever the experimental units (seeds) are clustered within containers, such as the Petri dishes.

For all these cases, we should prefer cluster-robust standard errors (Zeileis et al. 2020), which can be obtained by setting the extra argument ‘robust = TRUE’ and providing a clustering variable as the units argument. By setting ‘interval = TRUE’ we can also obtain confidence intervals for the desired probability level (0.95, by default).

quantile(mod1, probs = c(0.3, 0.5), robust = T,
         units = Dish,
         interval = T)
## 
## Estimated quantiles
## 
##               Estimate Std. Error   Lower   Upper
## arcturus:30%   14.2634   0.755294 12.7830 15.7437
## arcturus:50%       NaN        NaN     NaN     NaN
## blattaria:30%   3.7156   0.188941  3.3452  4.0859
## blattaria:50%   4.2536   0.209027  3.8439  4.6632
## creticum:30%    2.9759   0.085151  2.8090  3.1428
## creticum:50%    3.2187   0.104743  3.0134  3.4240

We may note that cluster robust standard errors are higher than naive standard errors: the seed in the same Petri dish are correlated and, thus, they do not contribute full information.

If we are interested in the germination rates G30 and G50, we can set the argument ‘rate = T’, as shown in the box below.

quantile(mod1, probs = c(0.3, 0.5), robust = T,
         units = Dish,
         interval = T, rate = T)
## 
## Estimated quantiles
## 
##               Estimate        SE    Lower    Upper
## arcturus:30%   0.07011 0.0037126 0.062833 0.077386
## arcturus:50%   0.00000 0.0000000 0.000000 0.000000
## blattaria:30%  0.26914 0.0136861 0.242315 0.295963
## blattaria:50%  0.23510 0.0115531 0.212454 0.257741
## creticum:30%   0.33604 0.0096153 0.317191 0.354882
## creticum:50%   0.31069 0.0101106 0.290872 0.330505

Parametric models with environmental covariates

If we have fitted a hydro-thermal time model or other models with an environmental covariate, we can also use the quantile() method, and pass a value for that covariate, as shown in the code below.

# Parametric model with environmental covariate
data(rape)
modTE <- drmte(nSeeds ~ timeBef + timeAf + Psi,
            data = rape, fct = HTLL())
quantile(modTE, Psi = 0, 
         probs = c(0.05, 0.10, 0.15, 0.21), 
         restricted = F, rate = T, robust = T,
         interval = T)
## 
## Estimated quantiles
## 
##       Estimate      SE  Lower  Upper
## 1:5%    1.6630 0.29134 1.0919 2.2340
## 1:10%   1.6581 0.28523 1.0990 2.2171
## 1:15%   1.6546 0.28115 1.1036 2.2056
## 1:21%   1.6513 0.27745 1.1075 2.1951

The environmental covariate only accepts a single value; in order to vectorise, we need to use the lapply() function, as shown below.

# This is to vectorise on Psi
psiList <- seq(-1, 0, 0.25)
names(psiList) <- as.character(psiList)
lapply(psiList, 
       function(x) quantile(modTE, Psi = x, 
                            probs = c(0.05, 0.10, 0.15, 0.21),
                            restricted = F, rate = T,
                            interval = "delta", 
                            units = rape$Dish, 
                            display = F))
## $`-1`
##        Estimate        SE       Lower     Upper
## 1:5%  0.1854204 0.1316218 -0.07255364 0.4433945
## 1:10% 0.1805588 0.1246436 -0.06373812 0.4248557
## 1:15% 0.1770742 0.1199609 -0.05804492 0.4121932
## 1:21% 0.1737531 0.1157066 -0.05302779 0.4005339
## 
## $`-0.75`
##        Estimate        SE     Lower     Upper
## 1:5%  0.5548033 0.1603098 0.2406019 0.8690047
## 1:10% 0.5499417 0.1532949 0.2494893 0.8503941
## 1:15% 0.5464571 0.1485831 0.2552394 0.8376747
## 1:21% 0.5431360 0.1442990 0.2603152 0.8259567
## 
## $`-0.5`
##        Estimate        SE     Lower    Upper
## 1:5%  0.9241862 0.1932594 0.5454048 1.302968
## 1:10% 0.9193246 0.1863806 0.5540254 1.284624
## 1:15% 0.9158400 0.1817648 0.5595875 1.272092
## 1:21% 0.9125189 0.1775713 0.5644855 1.260552
## 
## $`-0.25`
##       Estimate        SE     Lower    Upper
## 1:5%  1.293569 0.2286380 0.8454469 1.741691
## 1:10% 1.288708 0.2219286 0.8537354 1.723680
## 1:15% 1.285223 0.2174308 0.8590664 1.711379
## 1:21% 1.281902 0.2133475 0.8637483 1.700055
## 
## $`0`
##       Estimate        SE    Lower    Upper
## 1:5%  1.662952 0.2654765 1.142628 2.183276
## 1:10% 1.658090 0.2589271 1.150603 2.165578
## 1:15% 1.654606 0.2545391 1.155718 2.153493
## 1:21% 1.651285 0.2505575 1.160201 2.142368

Non-parametric (NPMLE) models

The quantile() method can also be used to make predictions from NPMLE fits. This method works by assuming that the events are evenly scattered within each inspection interval (‘interpolation method’). Inferences need to be explicitly requested by using setting ‘interval = T’; in this case, standard errors are estimated by using a resampling approach, that is performed at the group level, whenever a grouping variable is provided, by way of the ‘units’ argument (Davison and Hinkley, 1997).

mod2 <- drmte(nSeeds ~ timeBef + timeAf, fct = NPMLE(),
             curveid = Species, data = verbascum)
quantile(mod2, probs = c(0.3, 0.5), robust = T, units = Dish,
         interval = T, rate = T)
## 
## 
## 
## 
## Estimated quantiles
## (cluster robust bootstrap-based inference)
## 
##                 n       Mean   Median        SE   Lower   Upper
## arcturus.30%  999 0.04146680 0.067397 0.0367206 0.00000 0.08547
## arcturus.50%  999 0.00062217 0.000000 0.0065345 0.00000 0.00000
## blattaria.30% 999 0.26951972 0.268293 0.0175655 0.24096 0.30726
## blattaria.50% 999 0.23306179 0.230769 0.0127207 0.21393 0.26443
## creticum.30%  999 0.34503418 0.343750 0.0205478 0.31176 0.38462
## creticum.50%  999 0.30386151 0.302469 0.0131902 0.28255 0.33333

For KDE models, quantiles are calculated from the time-to-event curve by using a bisection method. However, we are still unsure about the most reliable way to obtain standard errors and, for this reason, inferences are not provided with this type of non-parametric models.

Quantiles and Effective Doses (ED)

Quantiles for time-to-event data resamble Effective Doses (ED) for dose-response data, although we discourage the use of this latter term, as the time-to-event curve is a cumulative probability function based on time, that is not a ‘dose’ in strict terms. However, the concept is similar: we need to find the stimulus (time) that permits to obtain a certain response (germination/emergence). Considering such similarity, we decided to define the ED() method for ‘drcte’ objects, that is compatible with the ED() method for ‘drc’ objects. However, for seed germination/emergence data, we strongly favor the use of the quantile() method.

It’s all for this topic; thank you for reading and, please, stay tuned for other posts in this series.

Prof. Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
Send comments to: [email protected]


References

  1. Bradford KJ (2002) Applications of hydrothermal time to quantifying and modeling seed germination and dormancy. Weed Sci 50:248–260
  2. Davison, A.C., Hinkley, D.V., 1997. Bootstrap methods and their application. Cambridge University Press, UK.
  3. Keshtkar E, Kudsk P, Mesgaran MB (2021) Perspective: Common errors in dose–response analysis and how to avoid them. Pest Manag Sci 77:2599–2608
  4. Onofri, A., Mesgaran, M., & Ritz, C. (2022). A unified framework for the analysis of germination, emergence, and other time-to-event data in weed science. Weed Science, 1-13. doi:10.1017/wsc.2022.8
  5. Zeileis, A., Köll, S., Graham, N., 2020. Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R. J. Stat. Soft. 95. https://doi.org/10.18637/jss.v095.i01
To leave a comment for the author, please follow the link and comment on their blog: R on The broken bridge between biologists and statisticians.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)