More on those stepped-wedge design assumptions: varying intra-cluster correlations over time

[This article was first published on ouR data generation, and kindly contributed to R-bloggers.]

In my last post, I wrote about within- and between-period intra-cluster correlations in the context of stepped-wedge cluster randomized study designs. These are quite important to understand when figuring out sample size requirements (and models for analysis, which I’ll be writing about soon). Here, I’m extending the constant ICC assumption I presented last time by introducing some complexity into the correlation structure. Much of the code I am using can be found in last week’s post, so if anything seems a little unclear, hop over there.

Different within- and between-period ICC’s

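In a scenario with constant within- and between-period ICC’s, the correlated data can be induced using a single cluster-level effect like $b_c$ in this model:

$$Y_{ict} = \mu + \beta_0 t + \beta_1 X_{ct} + b_c + e_{ict}$$

More complexity can be added if, instead of a single cluster-level effect, we have a vector of correlated cluster/time-specific effects $\mathbf{b_c}$. These cluster-specific random effects $(b_{c1}, b_{c2}, \ldots, b_{cT})$ replace $b_c$, and the slightly modified data generating model is

$$Y_{ict} = \mu + \beta_0 t + \beta_1 X_{ct} + b_{ct} + e_{ict}$$

The vector $\mathbf{b_c}$ has a multivariate normal distribution $N_T(0, \sigma_b^2 \mathbf{R})$. This model assumes a common covariance structure across all clusters, $\sigma_b^2 \mathbf{R}$, where the general version of $\mathbf{R}$ is

$$
\mathbf{R} =
\begin{pmatrix}
1 & r_{12} & r_{13} & \cdots & r_{1T} \\
r_{21} & 1 & r_{23} & \cdots & r_{2T} \\
r_{31} & r_{32} & 1 & \cdots & r_{3T} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{T1} & r_{T2} & r_{T3} & \cdots & 1
\end{pmatrix}
$$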

Within-period cluster correlation

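The covariance of any two individuals $i$ and $j$ in the same cluster $c$ and same period $t$ is

$$
\begin{aligned}
cov(Y_{ict}, Y_{jct}) &= cov(\mu + \beta_0 t + \beta_1 X_{ct} + b_{ct} + e_{ict}, \ \mu + \beta_0 t + \beta_1 X_{ct} + b_{ct} + e_{jct}) \\
&= cov(b_{ct}, b_{ct}) + cov(e_{ict}, e_{jct}) \\
&= var(b_{ct}) + 0 \\
&= \sigma_b^2 r_{tt} \\
&= \sigma_b^2 \qquad \text{since } r_{tt} = 1, \ \forall t \in (1, \ldots, T)
\end{aligned}
$$

And I showed in the previous post that $var(Y_{ict}) = var(Y_{jct}) = \sigma_b^2 + \sigma_e^2$, so the within-period intra-cluster correlation is what we saw last time:

$$ICC_{tt} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}$$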

Between-period cluster correlation

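The covariance of any two individuals in the same cluster but two different time periods $t$ and $t^\prime$ is:

$$
\begin{aligned}
cov(Y_{ict}, Y_{jct^\prime}) &= cov(\mu + \beta_0 t + \beta_1 X_{ct} + b_{ct} + e_{ict}, \ \mu + \beta_0 t^\prime + \beta_1 X_{ct^\prime} + b_{ct^\prime} + e_{jct^\prime}) \\
&= cov(b_{ct}, b_{ct^\prime}) + cov(e_{ict}, e_{jct^\prime}) \\
&= \sigma_b^2 r_{tt^\prime}
\end{aligned}
$$

Based on this, the between-period intra-cluster correlation is

$$ICC_{tt^\prime} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2} r_{tt^\prime}$$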
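
Since $r_{tt} = 1$, the within- and between-period formulas can be collected into a single matrix, $\frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}\mathbf{R}$. Here is a minimal base R sketch of that idea. The helper name `icc_from_R` and the 3-period matrix are made up for illustration and are not part of simstudy or the previous post:

```r
# Hypothetical helper (not from simstudy): the matrix of ICC's implied by
# the cluster-period variance s2b, the residual variance s2e, and a
# correlation matrix R that has 1's on the diagonal.
icc_from_R <- function(s2b, s2e, R) {
  stopifnot(is.matrix(R), nrow(R) == ncol(R), all(diag(R) == 1))
  (s2b / (s2b + s2e)) * R
}

# Made-up 3-period correlation matrix, for illustration only
R_example <- matrix(c(1.0, 0.5, 0.2,
                      0.5, 1.0, 0.5,
                      0.2, 0.5, 1.0), nrow = 3, byrow = TRUE)

icc <- icc_from_R(s2b = 0.15, s2e = 2, R = R_example)
round(icc, 4)
```

The diagonal holds the within-period ICC, and each off-diagonal entry is that same value scaled by the corresponding $r_{tt^\prime}$.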

Adding structure to matrix R

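This paper by Kasza et al., which describes various stepped-wedge models, suggests a structured variation of $\mathbf{R}$ that is a function of two parameters, $r_0$ and $r$:

$$
\mathbf{R} = \mathbf{R}(r_0, r) =
\begin{pmatrix}
1 & r_0 r & r_0 r^2 & \cdots & r_0 r^{T-1} \\
r_0 r & 1 & r_0 r & \cdots & r_0 r^{T-2} \\
r_0 r^2 & r_0 r & 1 & \cdots & r_0 r^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_0 r^{T-1} & r_0 r^{T-2} & r_0 r^{T-3} & \cdots & 1
\end{pmatrix}
$$

How we specify $r_0$ and $r$ reflects different assumptions about the between-period intra-cluster correlations. I describe two particular cases below.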
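
To make the structure concrete, here is a small base R sketch that builds $\mathbf{R}(r_0, r)$ for a given number of periods. The function name `make_R` and its arguments are my own choices for illustration, not something taken from the Kasza et al. paper or from simstudy:

```r
# Hypothetical helper: correlation matrix R(r0, r) with entries
# r0 * r^|t - t'| off the diagonal and 1 on the diagonal.
make_R <- function(r0, r, n_periods) {
  lag <- abs(outer(seq_len(n_periods), seq_len(n_periods), "-"))
  R <- r0 * r^lag
  diag(R) <- 1
  R
}

# r = 1: every off-diagonal entry equals r0
R_cs <- make_R(r0 = 0.6, r = 1, n_periods = 4)

# r0 = 1: entries shrink as the periods get further apart
R_ar1 <- make_R(r0 = 1, r = 0.6, n_periods = 4)

R_cs
R_ar1
```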

Constant correlation over time

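In this first case, the correlation between individuals in the same cluster but different time periods is less than the correlation between individuals in the same cluster and same time period. In other words, $ICC_{tt} \ne ICC_{tt^\prime}$. However, the between-period correlation is constant; that is, $ICC_{tt^\prime}$ is the same for all $t \ne t^\prime$. We have these correlations when $r_0 = \rho$ and $r = 1$, giving

$$
\mathbf{R} = \mathbf{R}(\rho, 1) =
\begin{pmatrix}
1 & \rho & \rho & \cdots & \rho \\
\rho & 1 & \rho & \cdots & \rho \\
\rho & \rho & 1 & \cdots & \rho \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho & \rho & \rho & \cdots & 1
\end{pmatrix}
$$

To simulate under this scenario, I am setting $\sigma_b^2 = 0.15$, $\sigma_e^2 = 2.0$, and $\rho = 0.6$. We would expect the following ICC’s:

$$
\begin{aligned}
ICC_{tt} &= \frac{0.15}{0.15 + 2.00} = 0.0698 \\
ICC_{tt^\prime} &= \frac{0.15}{0.15 + 2.00} \times 0.6 = 0.0419
\end{aligned}
$$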

Here is the code to define and generate the data:

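library(simstudy)

# cluster-level constants: mean (mu) and variance (s2) of the
# cluster-period effects, used by addCorGen() below
defc <- defData(varname = "mu", formula = 0, 
                dist = "nonrandom", id = "cluster")
defc <- defData(defc, "s2", formula = 0.15, dist = "nonrandom")

# individual-level outcome: time trend, treatment effect, and
# cluster-period effect, with residual variance 2
defa <- defDataAdd(varname = "Y", 
                   formula = "0 + 0.10  * period + 1 * rx + cteffect", 
                   variance = 2, dist = "normal")

# 100 clusters, 7 periods each, treatment rolled out in 4 waves
dc <- genData(100, defc)
dp <- addPeriods(dc, 7, "cluster")
dp <- trtStepWedge(dp, "cluster", nWaves = 4, lenWaves = 1, startPer = 2)

# correlated cluster-period effects: compound symmetry ("cs"), rho = 0.6
dp <- addCorGen(dtOld = dp, nvars = 7, idvar = "cluster", 
                rho = 0.6, corstr = "cs", dist = "normal", 
                param1 = "mu", param2 = "s2", cnames = "cteffect")
  
# individuals within each cluster-period, then the outcome Y
dd <- genCluster(dp, cLevelVar = "timeID", numIndsVar = 100, 
                 level1ID = "id")
dd <- addColumns(defa, dd)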
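
For intuition about what correlated cluster-period effects look like, here is a base R sketch that draws the vectors $\mathbf{b_c}$ directly from $N_T(0, \sigma_b^2 \mathbf{R})$ using a Cholesky factor. This is only an illustration of the distribution under the parameters above. It is not a description of how `addCorGen` works internally, and the number of clusters (2000) is an arbitrary choice to make the empirical estimates stable:

```r
set.seed(2019)

n_clusters <- 2000   # arbitrary, larger than the 100 used in the post
n_periods  <- 7
s2b <- 0.15
rho <- 0.6

# compound symmetry correlation matrix R(rho, 1)
R <- matrix(rho, n_periods, n_periods)
diag(R) <- 1
Sigma <- s2b * R

# chol() returns an upper-triangular U with t(U) %*% U equal to Sigma
U <- chol(Sigma)

# each row of Z is a vector of independent standard normals, so each
# row of Z %*% U has covariance t(U) %*% U = Sigma
Z <- matrix(rnorm(n_clusters * n_periods), n_clusters, n_periods)
b <- Z %*% U

emp_cor <- cor(b)              # should be close to R
emp_var <- apply(b, 2, var)    # should be close to s2b in every period

round(emp_cor, 2)
round(emp_var, 3)
```

With this many clusters, the off-diagonal empirical correlations should sit near 0.6 and the period-specific variances near 0.15.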

As I did in my previous post, I’ve generated 200 data sets, estimated the within- and between-period ICC’s for each data set, and computed the average for each. The plot below shows the expected values in gray and the estimated values in purple and green.

Declining correlation over time

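In this second case, we make an assumption that the correlation between individuals in the same cluster degrades over time. Here, the correlation between two individuals in adjacent time periods is stronger than the correlation between individuals in periods further apart. That is, $ICC_{tt^\prime} > ICC_{tt^{\prime\prime}}$ if $|t^\prime - t| < |t^{\prime\prime} - t|$. This structure can be created by setting $r_0 = 1$ and $r = \rho$, giving us an auto-regressive correlation matrix $\mathbf{R}$:

$$
\mathbf{R} = \mathbf{R}(1, \rho) =
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1
\end{pmatrix}
$$

I’ve generated data using the same variance assumptions as above. The only difference in this case is that the corstr argument in the call to addCorGen is “ar1” rather than “cs” (which was used above). Here are a few of the expected correlations:

$$
\begin{aligned}
ICC_{t,t} &= \frac{0.15}{0.15 + 2.00} = 0.0698 \\
ICC_{t,t+1} &= \frac{0.15}{0.15 + 2.00} \times 0.6^1 = 0.0419 \\
ICC_{t,t+2} &= \frac{0.15}{0.15 + 2.00} \times 0.6^2 = 0.0251 \\
&\ \ \vdots \\
ICC_{t,t+6} &= \frac{0.15}{0.15 + 2.00} \times 0.6^6 = 0.0033
\end{aligned}
$$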
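
The full set of expected values, one for each possible gap between periods, follows from the same formula. A short base R sketch, where the variable names are mine and the parameter values are the ones stated above:

```r
s2b <- 0.15
s2e <- 2
rho <- 0.6

# gaps between periods: 0 (same period) through 6 (first vs. last of 7)
lags <- 0:6

# expected ICC at each gap under the auto-regressive structure R(1, rho)
icc_by_lag <- (s2b / (s2b + s2e)) * rho^lags
names(icc_by_lag) <- paste0("lag", lags)

round(icc_by_lag, 4)
```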

And here is the code:

# same definitions as in the constant-correlation case
defc <- defData(varname = "mu", formula = 0, 
                dist = "nonrandom", id = "cluster")
defc <- defData(defc, "s2", formula = 0.15, dist = "nonrandom")

defa <- defDataAdd(varname = "Y", 
                   formula = "0 + 0.10  * period + 1 * rx + cteffect", 
                   variance = 2, dist = "normal")

dc <- genData(100, defc)
dp <- addPeriods(dc, 7, "cluster")
dp <- trtStepWedge(dp, "cluster", nWaves = 4, lenWaves = 1, startPer = 2)

# correlated cluster-period effects: auto-regressive ("ar1"), rho = 0.6
dp <- addCorGen(dtOld = dp, nvars = 7, idvar = "cluster", 
                rho = 0.6, corstr = "ar1", dist = "normal", 
                param1 = "mu", param2 = "s2", cnames = "cteffect")
  
# individuals within each cluster-period, then the outcome Y
dd <- genCluster(dp, cLevelVar = "timeID", numIndsVar = 10, 
                 level1ID = "id")
dd <- addColumns(defa, dd)

And here are the observed average estimates (based on 200 datasets) alongside the expected values:

Random slope

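In this last case, I am exploring what the ICC’s look like in the context of a random effects model that includes a cluster-specific intercept $b_c$ and a cluster-specific slope $s_c$:

$$Y_{ict} = \mu + \beta_0 t + \beta_1 X_{ct} + b_c + s_c t + e_{ict}$$

Both $b_c$ and $s_c$ are normally distributed with mean 0, and variances $\sigma_b^2$ and $\sigma_s^2$, respectively. (In this example $b_c$ and $s_c$ are uncorrelated, but that may not necessarily be the case.)

Because of the random slopes, the variance of the $Y$’s increases over time:

$$var(Y_{ict}) = \sigma_b^2 + t^2 \sigma_s^2 + \sigma_e^2$$

The same is true for the within- and between-period covariances:

$$
\begin{aligned}
cov(Y_{ict}, Y_{jct}) &= \sigma_b^2 + t^2 \sigma_s^2 \\
cov(Y_{ict}, Y_{jct^\prime}) &= \sigma_b^2 + t t^\prime \sigma_s^2
\end{aligned}
$$

The ICC’s that follow from these various variances and covariances are:

$$
\begin{aligned}
ICC_{tt} &= \frac{\sigma_b^2 + t^2 \sigma_s^2}{\sigma_b^2 + t^2 \sigma_s^2 + \sigma_e^2} \\
ICC_{tt^\prime} &= \frac{\sigma_b^2 + t t^\prime \sigma_s^2}{\left[(\sigma_b^2 + t^2 \sigma_s^2 + \sigma_e^2)(\sigma_b^2 + {t^\prime}^2 \sigma_s^2 + \sigma_e^2)\right]^{\frac{1}{2}}}
\end{aligned}
$$

In this example, $\sigma_s^2 = 0.01$ (and the other variances remain as before), so

$$ICC_{33} = \frac{0.15 + 3^2 \times 0.01}{0.15 + 3^2 \times 0.01 + 2} = 0.1071$$

and

$$ICC_{36} = \frac{0.15 + 3 \times 6 \times 0.01}{\left[(0.15 + 3^2 \times 0.01 + 2)(0.15 + 6^2 \times 0.01 + 2)\right]^{\frac{1}{2}}} = 0.1392$$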
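
These two calculations can be wrapped in a single function, since the within-period formula is just the between-period formula with both periods equal. A minimal base R sketch, where `icc_rs` is a name I made up for illustration:

```r
# Hypothetical helper: ICC for two different individuals in the same
# cluster, observed in periods t1 and t2, under the random intercept
# and random slope model (intercept and slope uncorrelated).
icc_rs <- function(t1, t2, s2b, s2s, s2e) {
  v1 <- s2b + t1^2 * s2s + s2e     # var(Y) in period t1
  v2 <- s2b + t2^2 * s2s + s2e     # var(Y) in period t2
  (s2b + t1 * t2 * s2s) / sqrt(v1 * v2)
}

icc_33 <- icc_rs(3, 3, s2b = 0.15, s2s = 0.01, s2e = 2)
icc_36 <- icc_rs(3, 6, s2b = 0.15, s2s = 0.01, s2e = 2)

round(c(icc_33 = icc_33, icc_36 = icc_36), 4)
```

Setting `s2s = 0` recovers the constant ICC from the single cluster-level effect model, which is a quick sanity check on the formula.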

Here’s the data generation:

# cluster-level random intercept (ceffect, variance 0.15) and
# random slope (cteffect, variance 0.01), generated independently
defc <- defData(varname = "ceffect", formula = 0, variance = 0.15, 
                dist = "normal", id = "cluster")
defc <- defData(defc, "cteffect", formula = 0, variance = 0.01, 
                dist = "normal")

# individual-level outcome: the random slope multiplies period
defa <- defDataAdd(varname = "Y", 
  formula = "0 + ceffect + 0.10  * period + cteffect * period + 1 * rx", 
  variance = 2, dist = "normal")

dc <- genData(100, defc)
dp <- addPeriods(dc, 7, "cluster")
dp <- trtStepWedge(dp, "cluster", nWaves = 4, lenWaves = 1, startPer = 2)
  
# individuals within each cluster-period, then the outcome Y
dd <- genCluster(dp, cLevelVar = "timeID", numIndsVar = 10, 
                 level1ID = "id")
dd <- addColumns(defa, dd)

And here is the comparison between observed and expected ICC’s. The estimates are quite variable, so there appears to be slight bias. However, if I generated more than 200 data sets, the mean would likely converge closer to the expected values.

In the next post (or two), I plan on providing some examples of fitting models to the data I’ve generated here. In some cases, fairly standard linear mixed effects models in R may be adequate, but in others, we may need to look elsewhere.

References:

Kasza, J., K. Hemming, R. Hooper, J. N. S. Matthews, and A. B. Forbes. “Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials.” Statistical methods in medical research (2017): 0962280217734981.
