More on those stepped-wedge design assumptions: varying intra-cluster correlations over time
In my last post, I wrote about within- and between-period intra-cluster correlations in the context of stepped-wedge cluster randomized study designs. These are quite important to understand when figuring out sample size requirements (and models for analysis, which I’ll be writing about soon). Here, I’m extending the constant ICC assumption I presented last time by introducing some complexity into the correlation structure. Much of the code I am using can be found in last week’s post, so if anything seems a little unclear, hop over there.
Different within- and between-period ICC’s
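In a scenario with constant within- and between-period ICC’s, the correlated data can be induced using a single cluster-level effect like $b_c$ in this model:
$$Y_{ict} = \mu + \beta_0 t + \beta_1 X_{ct} + b_c + e_{ict}$$
More complexity can be added if, instead of a single cluster-level effect, we have a vector of correlated cluster/time-specific effects $\mathbf{b_c}$. These cluster-specific random effects $(b_{c1}, b_{c2}, \ldots, b_{cT})$ replace $b_c$, and the slightly modified data generating model is
$$Y_{ict} = \mu + \beta_0 t + \beta_1 X_{ct} + b_{ct} + e_{ict}$$
The vector $\mathbf{b_c}$ has a multivariate normal distribution $N_T(0, \sigma^2_b \mathbf{R})$. This model assumes a common covariance structure across all clusters, $\sigma^2_b \mathbf{R}$, where the general version of $\mathbf{R}$ is
$$
\mathbf{R} =
\begin{pmatrix}
1 & r_{12} & r_{13} & \cdots & r_{1T} \\
r_{21} & 1 & r_{23} & \cdots & r_{2T} \\
r_{31} & r_{32} & 1 & \cdots & r_{3T} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{T1} & r_{T2} & r_{T3} & \cdots & 1
\end{pmatrix}
$$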
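To make the distributional assumption concrete, here is a minimal base R sketch that draws cluster-by-period effects from $N_T(0, \sigma^2_b \mathbf{R})$ using a Cholesky factor. The number of periods, the number of clusters, and the entries of `R` are illustrative assumptions, not values from this post; the simulations in the post itself use `addCorGen` from `simstudy` instead.

```r
# Sketch: draw b_c ~ N_T(0, s2b * R) for many clusters using base R only.
set.seed(1)

n_periods  <- 4       # T (illustrative)
n_clusters <- 5000    # illustrative
s2b        <- 0.15    # sigma^2_b

# An arbitrary valid correlation matrix (illustrative values only)
R <- matrix(0.5, n_periods, n_periods)
diag(R) <- 1

Sigma <- s2b * R      # covariance shared by all clusters
U <- chol(Sigma)      # upper-triangular factor with t(U) %*% U == Sigma

# Independent standard normals, one row per cluster
Z <- matrix(rnorm(n_clusters * n_periods), n_clusters, n_periods)

# Each row of b is one cluster's vector (b_c1, ..., b_cT)
b <- Z %*% U

round(cov(b), 2)      # empirical covariance, should be close to Sigma
```

Because every cluster shares the same `Sigma`, the rows of `b` are independent draws from one distribution, which is the "common covariance structure" assumption stated above.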
Within-period cluster correlation
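The covariance of any two individuals $i$ and $j$ in the same cluster $c$ and same period $t$ is
$$
\begin{aligned}
\text{cov}(Y_{ict}, Y_{jct}) &= \text{cov}(\mu + \beta_0 t + \beta_1 X_{ct} + b_{ct} + e_{ict},\ \mu + \beta_0 t + \beta_1 X_{ct} + b_{ct} + e_{jct}) \\
&= \text{cov}(b_{ct}, b_{ct}) + \text{cov}(e_{ict}, e_{jct}) \\
&= \text{var}(b_{ct}) + 0 \\
&= \sigma^2_b r_{tt} \\
&= \sigma^2_b \quad \text{since } r_{tt} = 1,\ \forall t \in \{1, \ldots, T\}
\end{aligned}
$$
And I showed in the previous post that $\text{var}(Y_{ict}) = \text{var}(Y_{jct}) = \sigma^2_b + \sigma^2_e$, so the within-period intra-cluster correlation is what we saw last time:
$$ICC_{tt} = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_e}$$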
Between-period cluster correlation
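The covariance of any two individuals in the same cluster but two different time periods $t$ and $t'$ is:
$$
\begin{aligned}
\text{cov}(Y_{ict}, Y_{jct'}) &= \text{cov}(\mu + \beta_0 t + \beta_1 X_{ct} + b_{ct} + e_{ict},\ \mu + \beta_0 t' + \beta_1 X_{ct'} + b_{ct'} + e_{jct'}) \\
&= \text{cov}(b_{ct}, b_{ct'}) + \text{cov}(e_{ict}, e_{jct'}) \\
&= \sigma^2_b r_{tt'}
\end{aligned}
$$
Based on this, the between-period intra-cluster correlation is
$$ICC_{tt'} = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_e}\, r_{tt'}$$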
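The two formulas can be collapsed into a single matrix expression: the full set of ICC's is the within-period ICC multiplied by $\mathbf{R}$. The helper below is a small sketch of that idea; the function name `icc_matrix` and the example correlation matrix are hypothetical and used only for illustration.

```r
# Sketch: ICC for every pair of periods, given the variance components and R.
# The diagonal holds the within-period ICC; off-diagonals are between-period.
icc_matrix <- function(s2b, s2e, R) {
  stopifnot(is.matrix(R), nrow(R) == ncol(R), all(diag(R) == 1))
  (s2b / (s2b + s2e)) * R
}

# Illustrative 3-period correlation matrix (values are made up)
R <- matrix(c(1.0, 0.8, 0.5,
              0.8, 1.0, 0.8,
              0.5, 0.8, 1.0), nrow = 3, byrow = TRUE)

icc <- icc_matrix(s2b = 0.15, s2e = 2, R = R)
round(icc, 4)
```

Since $r_{tt} = 1$, the diagonal of the result always equals $\sigma^2_b / (\sigma^2_b + \sigma^2_e)$, regardless of how the off-diagonal elements of $\mathbf{R}$ are specified.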
Adding structure to matrix R
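This paper by Kasza et al., which describes various stepped-wedge models, suggests a structured variation of $\mathbf{R}$ that is a function of two parameters, $r_0$ and $r$:
$$
\mathbf{R} = \mathbf{R}(r_0, r) =
\begin{pmatrix}
1 & r_0 r & r_0 r^2 & \cdots & r_0 r^{T-1} \\
r_0 r & 1 & r_0 r & \cdots & r_0 r^{T-2} \\
r_0 r^2 & r_0 r & 1 & \cdots & r_0 r^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_0 r^{T-1} & r_0 r^{T-2} & r_0 r^{T-3} & \cdots & 1
\end{pmatrix}
$$
How we specify $r_0$ and $r$ reflects different assumptions about the between-period intra-cluster correlations. I describe two particular cases below.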
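For readers who want to see the matrix numerically, this is a short sketch of a constructor for $\mathbf{R}(r_0, r)$ as written above. The function name `make_R` and the argument `n_periods` are my own labels for illustration and do not come from the paper or from `simstudy`.

```r
# Sketch: build R(r0, r), where the off-diagonal element for periods t and t'
# is r0 * r^|t - t'| and the diagonal is fixed at 1.
make_R <- function(r0, r, n_periods) {
  idx <- seq_len(n_periods)
  lag <- abs(outer(idx, idx, "-"))   # |t - t'| for every pair of periods
  R <- r0 * r^lag
  diag(R) <- 1
  R
}

make_R(r0 = 0.6, r = 1, n_periods = 4)    # same correlation at every lag
make_R(r0 = 1, r = 0.6, n_periods = 4)    # correlation decays with the lag
```

Setting `r = 1` removes the dependence on the lag, while setting `r0 = 1` leaves a pure power decay, which is why these two parameters are enough to cover both cases.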
Constant correlation over time
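In this first case, the correlation between individuals in the same cluster but different time periods is less than the correlation between individuals in the same cluster and same time period. In other words, $ICC_{tt} \ne ICC_{tt'}$. However, the between-period correlation is constant: $ICC_{tt'}$ is the same for all $t \ne t'$. We have these correlations when $r_0 = \rho$ and $r = 1$, giving
$$
\mathbf{R} = \mathbf{R}(\rho, 1) =
\begin{pmatrix}
1 & \rho & \rho & \cdots & \rho \\
\rho & 1 & \rho & \cdots & \rho \\
\rho & \rho & 1 & \cdots & \rho \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho & \rho & \rho & \cdots & 1
\end{pmatrix}
$$
To simulate under this scenario, I am setting $\sigma^2_b = 0.15$, $\sigma^2_e = 2.0$, and $\rho = 0.6$. We would expect the following ICC’s:
$$
\begin{aligned}
ICC_{tt} &= \frac{0.15}{0.15 + 2.00} = 0.0698 \\
ICC_{tt'} &= \frac{0.15}{0.15 + 2.00} \times 0.6 = 0.0419
\end{aligned}
$$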
Here is the code to define and generate the data:
    library(simstudy)

    # Cluster-level mean and variance of the cluster/period effects
    defc <- defData(varname = "mu", formula = 0, dist = "nonrandom", id = "cluster")
    defc <- defData(defc, "s2", formula = 0.15, dist = "nonrandom")

    # Individual-level outcome
    defa <- defDataAdd(varname = "Y",
      formula = "0 + 0.10 * period + 1 * rx + cteffect",
      variance = 2, dist = "normal")

    dc <- genData(100, defc)
    dp <- addPeriods(dc, 7, "cluster")
    dp <- trtStepWedge(dp, "cluster", nWaves = 4, lenWaves = 1, startPer = 2)

    # Correlated cluster/period effects with compound symmetry ("cs")
    dp <- addCorGen(dtOld = dp, nvars = 7, idvar = "cluster", rho = 0.6,
      corstr = "cs", dist = "normal", param1 = "mu", param2 = "s2",
      cnames = "cteffect")

    dd <- genCluster(dp, cLevelVar = "timeID", numIndsVar = 100, level1ID = "id")
    dd <- addColumns(defa, dd)
As I did in my previous post, I’ve generated 200 data sets, estimated the within- and between-period ICC’s for each data set, and computed the average for each. The plot below shows the expected values in gray and the estimated values in purple and green.
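The estimation code itself is in the previous post. As a rough, independent sanity check of the expected values, the sketch below simulates a stripped-down version of this scenario in base R and estimates the two ICC's as plain correlations between pairs of individuals. It is not the estimation approach used for the plots: the fixed time and treatment effects are omitted, only two individuals per cluster-period are generated, and the number of clusters is made very large, all of which are simplifying assumptions of mine.

```r
# Sketch: empirical within- and between-period correlations under
# compound symmetry, using pairs of individuals (fixed effects omitted).
set.seed(2019)

s2b <- 0.15; s2e <- 2; rho <- 0.6
n_clusters <- 20000   # large on purpose, to keep sampling noise small
n_periods  <- 3

# Correlated cluster/period effects b_ct
R <- matrix(rho, n_periods, n_periods)
diag(R) <- 1
b <- matrix(rnorm(n_clusters * n_periods), n_clusters) %*% chol(s2b * R)

# Outcomes for two different individuals in each cluster and period
y1 <- b + matrix(rnorm(n_clusters * n_periods, sd = sqrt(s2e)), n_clusters)
y2 <- b + matrix(rnorm(n_clusters * n_periods, sd = sqrt(s2e)), n_clusters)

within_hat  <- cor(y1[, 1], y2[, 1])   # same cluster, same period
between_hat <- cor(y1[, 1], y2[, 2])   # same cluster, different periods

c(within = within_hat, between = between_hat)
```

With this many clusters the two estimates should land near 0.0698 and 0.0419, though any single run will differ somewhat from those targets.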
Declining correlation over time
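In this second case, we make an assumption that the correlation between individuals in the same cluster degrades over time. Here, the correlation between two individuals in adjacent time periods is stronger than the correlation between individuals in periods further apart. That is, $ICC_{tt'} > ICC_{tt''}$ if $|t' - t| < |t'' - t|$. This structure can be created by setting $r_0 = 1$ and $r = \rho$, giving us an auto-regressive correlation matrix $\mathbf{R}$:
$$
\mathbf{R} = \mathbf{R}(1, \rho) =
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
\rho & 1 & \rho & \cdots & \rho^{T-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{T-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1
\end{pmatrix}
$$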
I’ve generated data using the same variance assumptions as above. The only difference in this case is that the corstr argument in the call to addCorGen is “ar1” rather than “cs” (which was used above). Here are a few of the expected correlations:
$$
\begin{aligned}
ICC_{t,t} &= \frac{0.15}{0.15 + 2.00} = 0.0698 \\
ICC_{t,t+1} &= \frac{0.15}{0.15 + 2.00} \times 0.6^1 = 0.0419 \\
ICC_{t,t+2} &= \frac{0.15}{0.15 + 2.00} \times 0.6^2 = 0.0251 \\
&\ \ \vdots \\
ICC_{t,t+6} &= \frac{0.15}{0.15 + 2.00} \times 0.6^6 = 0.0033
\end{aligned}
$$
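To see the two structures side by side, this sketch tabulates the expected ICC at each lag $|t - t'|$ under the constant ("cs") and declining ("ar1") assumptions, using the variance components from this post. The data frame and its column names are just illustrative.

```r
# Sketch: expected ICC by lag under the two correlation structures.
s2b <- 0.15; s2e <- 2; rho <- 0.6

lag <- 0:6                          # 7 periods give lags 0 through 6
base_icc <- s2b / (s2b + s2e)       # within-period ICC (lag 0)

expected <- data.frame(
  lag = lag,
  cs  = base_icc * ifelse(lag == 0, 1, rho),   # constant for any lag > 0
  ar1 = base_icc * rho^lag                     # decays geometrically
)

round(expected, 4)
```

The two columns agree at lags 0 and 1 and then separate, with the auto-regressive ICC shrinking toward zero as the periods move further apart.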
And here is the code:
    library(simstudy)

    # Cluster-level mean and variance of the cluster/period effects
    defc <- defData(varname = "mu", formula = 0, dist = "nonrandom", id = "cluster")
    defc <- defData(defc, "s2", formula = 0.15, dist = "nonrandom")

    # Individual-level outcome
    defa <- defDataAdd(varname = "Y",
      formula = "0 + 0.10 * period + 1 * rx + cteffect",
      variance = 2, dist = "normal")

    dc <- genData(100, defc)
    dp <- addPeriods(dc, 7, "cluster")
    dp <- trtStepWedge(dp, "cluster", nWaves = 4, lenWaves = 1, startPer = 2)

    # Correlated cluster/period effects with an AR-1 structure ("ar1")
    dp <- addCorGen(dtOld = dp, nvars = 7, idvar = "cluster", rho = 0.6,
      corstr = "ar1", dist = "normal", param1 = "mu", param2 = "s2",
      cnames = "cteffect")

    dd <- genCluster(dp, cLevelVar = "timeID", numIndsVar = 10, level1ID = "id")
    dd <- addColumns(defa, dd)
And here are the observed average estimates (based on 200 datasets) alongside the expected values:
Random slope
In this last case, I am exploring what the ICC’s look like in the context of a random effects model that includes a cluster-specific intercept $b_c$ and a cluster-specific slope $s_c$:
$$Y_{ict} = \mu + \beta_0 t + \beta_1 X_{ct} + b_c + s_c t + e_{ict}$$
Both $b_c$ and $s_c$ are normally distributed with mean 0, and variances $\sigma^2_b$ and $\sigma^2_s$, respectively. (In this example $b_c$ and $s_c$ are uncorrelated, but that may not necessarily be the case.)
Because of the random slopes, the variance of the $Y$’s increases over time:
$$\text{var}(Y_{ict}) = \sigma^2_b + t^2 \sigma^2_s + \sigma^2_e$$
The same is true for the within- and between-period covariances:
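$$
\begin{aligned}
\text{cov}(Y_{ict}, Y_{jct}) &= \sigma^2_b + t^2 \sigma^2_s \\
\text{cov}(Y_{ict}, Y_{jct'}) &= \sigma^2_b + t t' \sigma^2_s
\end{aligned}
$$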
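For completeness, here is a short derivation of the between-period covariance. It assumes, as in this example, that the random intercept and slope are uncorrelated and that the individual-level errors are independent of everything else; the fixed effects drop out because they are constants.

$$
\begin{aligned}
\text{cov}(Y_{ict}, Y_{jct'}) &= \text{cov}(b_c + s_c t + e_{ict},\ b_c + s_c t' + e_{jct'}) \\
&= \text{var}(b_c) + t t'\, \text{var}(s_c) + (t + t')\, \text{cov}(b_c, s_c) + \text{cov}(e_{ict}, e_{jct'}) \\
&= \sigma^2_b + t t' \sigma^2_s + 0 + 0
\end{aligned}
$$

Setting $t' = t$ gives the within-period covariance $\sigma^2_b + t^2 \sigma^2_s$. If the intercept and slope were correlated, the $(t + t')\, \text{cov}(b_c, s_c)$ term would remain.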
The ICC’s that follow from these various variances and covariances are:
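$$
\begin{aligned}
ICC_{tt} &= \frac{\sigma^2_b + t^2 \sigma^2_s}{\sigma^2_b + t^2 \sigma^2_s + \sigma^2_e} \\
ICC_{tt'} &= \frac{\sigma^2_b + t t' \sigma^2_s}{\left[ (\sigma^2_b + t^2 \sigma^2_s + \sigma^2_e)(\sigma^2_b + t'^2 \sigma^2_s + \sigma^2_e) \right]^{\frac{1}{2}}}
\end{aligned}
$$
In this example, $\sigma^2_s = 0.01$ (and the other variances remain as before), so
$$ICC_{33} = \frac{0.15 + 3^2 \times 0.01}{0.15 + 3^2 \times 0.01 + 2} = 0.1071$$
$$ICC_{36} = \frac{0.15 + 3 \times 6 \times 0.01}{\left[ (0.15 + 3^2 \times 0.01 + 2)(0.15 + 6^2 \times 0.01 + 2) \right]^{\frac{1}{2}}} = 0.1392$$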
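Both ICC expressions can be handled by one function, since the between-period formula reduces to the within-period one when the two periods coincide. The sketch below uses a hypothetical function name, `icc_random_slope`, and reproduces the two worked values.

```r
# Sketch: ICC for periods t1 and t2 under the random intercept + random slope
# model, assuming the intercept and slope are uncorrelated.
icc_random_slope <- function(t1, t2, s2b, s2s, s2e) {
  covariance <- s2b + t1 * t2 * s2s      # cov(Y_ict1, Y_jct2)
  var1 <- s2b + t1^2 * s2s + s2e         # var(Y) in period t1
  var2 <- s2b + t2^2 * s2s + s2e         # var(Y) in period t2
  covariance / sqrt(var1 * var2)
}

round(icc_random_slope(3, 3, s2b = 0.15, s2s = 0.01, s2e = 2), 4)  # 0.1071
round(icc_random_slope(3, 6, s2b = 0.15, s2s = 0.01, s2e = 2), 4)  # 0.1392
```

Unlike the earlier two structures, the within-period ICC here depends on the period itself, so evaluating the function over a grid of `t1` and `t2` values gives a matrix whose diagonal is not constant.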
Here’s the data generation:
    library(simstudy)

    # Cluster-level random intercept (ceffect) and random slope (cteffect)
    defc <- defData(varname = "ceffect", formula = 0, variance = 0.15,
      dist = "normal", id = "cluster")
    defc <- defData(defc, "cteffect", formula = 0, variance = 0.01, dist = "normal")

    # Individual-level outcome; the slope multiplies period
    defa <- defDataAdd(varname = "Y",
      formula = "0 + ceffect + 0.10 * period + cteffect * period + 1 * rx",
      variance = 2, dist = "normal")

    dc <- genData(100, defc)
    dp <- addPeriods(dc, 7, "cluster")
    dp <- trtStepWedge(dp, "cluster", nWaves = 4, lenWaves = 1, startPer = 2)

    dd <- genCluster(dp, cLevelVar = "timeID", numIndsVar = 10, level1ID = "id")
    dd <- addColumns(defa, dd)
And here is the comparison between observed and expected ICC’s. The estimates are quite variable, so there appears to be slight bias. However, if I generated more than 200 data sets, the mean would likely converge closer to the expected values.
In the next post (or two), I plan on providing some examples of fitting models to the data I’ve generated here. In some cases, fairly standard linear mixed effects models in R may be adequate, but in others, we may need to look elsewhere.
References:
Kasza, J., K. Hemming, R. Hooper, J. N. S. Matthews, and A. B. Forbes. “Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials.” Statistical methods in medical research (2017): 0962280217734981.