Repeated measures and subsampling with perennial crops

[This article was first published on R on The broken bridge between biologists and statisticians, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In a recent post, I have talked about repeated measures, for a case where measurements were taken repeatedly in the same plots across years see here. Previously, in another post, I had talked about subsampling, for a case where several random samples were taken from the same plot see here.

Repeated measures and subsampling are vastly different: in the first case I am specifically interested in the ‘evolution’ of the response over time (or space, sometimes). In the second case (subsampling), I only want to improve the precision/accuracy of my measurements, by taking multiple random samples in each plot.

In this post, I decided to consider a situation where we have both repeated measures and subsampling. I have been prompted to work on this post by a colleague, that is struggling to find the appropriate model for such a situation; I hope that I can lend him/her a hand to get over this (thanks for your request, Gal!).

Let’s consider the dataset below, that refers to the yield of lucerne genotypes in three consecutive years, taken from the same plots in a single experiment lasting for three years. The situation is similar to that of my previous post, but, in this case, in each year, we took two subsamples of 1 m2 from each plot to obtain two separate yield measurements.

rm(list = ls())
library(nlme)
library(dplyr)
filePath <- "https://www.casaonofri.it/_datasets/alfalfa3years_subs.csv"
dataset <- read.csv(filePath)
dataset[,1:4] <- lapply(dataset[,1:4],
                        function(col) factor(col))
head(dataset)
##   Block Genotype Year SubSample     Yield
## 1     1 4cascine 2006         1  5.967228
## 2     1 4cascine 2006         2  5.021836
## 3     1 4cascine 2007         1 11.319587
## 4     1 4cascine 2007         2 11.917985
## 5     1 4cascine 2008         1 10.534574
## 6     1 4cascine 2008         2 10.374413

For this dataset, we need to consider that the observations taken in different years come from the same plots (repeated measures), so that there are three ‘subplots in time’. In each of these subplots (i.e., in each year), we take two subsamples. By considering this, we can follow the usual rules to build the model (Piepho et al., 2004):

  1. Consider one single year and build the treatment model
  2. Consider one single year and build the block model
  3. Consider that the ‘plot’ factor in the block model references the randomisation units, i.e. those units which received the the genotypes by a randomisation process. Assign to this plot factor a random effect.
  4. Include the year factor into the model and combine the year with all the effects in the treatment model, by crossing or nesting as appropriate.
  5. Excluding the terms for randomisation units, nest the year in all the other terms in the block model.
  6. Combine random effects for randomisation units with the repeated factor, by using the colon operator, in order to derive the correct error terms to accommodate correlation structures.
  7. Introduce subsamples, nested within plots and within years. These subsamples must receive a random effect, because they were randomly sampled.

The models at the different steps are as follows (with R notation):

  1. treatment model: YIELD ~ GENOTYPE
  2. block model: YIELD ~ BLOCK + BLOCK:PLOT
  3. block model: YIELD ~ BLOCK + (1|BLOCK:PLOT)
  4. treatment model: YIELD ~ GENOTYPE * YEAR
  5. block model: YIELD ~ BLOCK + BLOCK:YEAR + (1|BLOCK:PLOT)
  6. block model: YIELD ~ BLOCK + BLOCK:YEAR + (1|BLOCK:PLOT) + (1|BLOCK:PLOT:YEAR)
  7. block model: YIELD ~ BLOCK + BLOCK:YEAR + (1|BLOCK:PLOT) + (1|BLOCK:PLOT:YEAR) + (1|BLOCK:PLOT:YEAR:SUBSAMPLES)

In this case, we take the block, year and genotype effects as fixed (but you can change this as you please), so that the final model is:

YIELD ~ GENOTYPE * YEAR + BLOCK + BLOCK:YEAR + (1|BLOCK:PLOT) + (1|BLOCK:PLOT:YEAR) + (1|BLOCK:PLOT:YEAR:SUBSAMPLE)

where the last term does not need to be fitted in R, as it is the residual term, that is fitted by default.

We start the analysis by building a new variable to uniquely identify each plot; it is easy, if we think that yield values taken for the same genotype in the same block must have been taken in the same plot. Although this is not strictly necessary, we build another variable to uniquely identify, in each plot, the three ‘subplots in time’, within which subsamples were taken (this facilitates the fitting process in R).

# Reference the plots
dataset$Plot <- dataset$Block:dataset$Genotype
dataset$Subplot <- dataset$Plot:dataset$Year

Now we can fit the model with the lme() function (we also show the fit with the lmer() function, just in case):

mod <- lme(Yield ~ Block + Block:Year + Genotype*Year, 
           random = ~1|Plot/Subplot,
           data = dataset)
anova(mod)
##               numDF denDF  F-value p-value
## (Intercept)       1   240 7374.988  <.0001
## Block             3    57    0.241  0.8676
## Genotype         19    57    2.361  0.0066
## Year              2   114 3124.566  <.0001
## Block:Year        6   114    7.192  <.0001
## Year:Genotype    38   114    2.524  0.0001
# library(lme4)
# mod1 <- lmer(Yield ~ Genotype*Year + Block + Block:Year
#              + (1|Plot) + (1|Subplot), 
#              data = dataset)

As we said in our previous post, the above mixed model analyses regards the design as a sort of ‘split-plot in time’ with sub-replicates and it is not necessarily correct, as it assumes that the within-plot correlation is the same for all pairs of observations, regardless of their distance in time. Further analyses might be necessary to assess whether serial correlation structures are necessary.

Please, also note that the above model works equally well with annual crops, if we repeat the experiment in several years by using exactly the same plots.

Thanks for reading and happy coding!

Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
[email protected]


Reference

  1. Piepho, H.-P., Büchse, A., Richter, C., 2004. A Mixed Modelling Approach for Randomized Experiments with Repeated Measures. Journal of Agronomy and Crop Science 190, 230–247.
To leave a comment for the author, please follow the link and comment on their blog: R on The broken bridge between biologists and statisticians.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)