Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Last week, a student asked me about multiple tests. More precisely, she ran an experience over – say – 20 weeks, with the same cohort of – say – 100 patients. An we observe some
size=100 nb=20 set.seed(1) X=matrix(rnorm(size*nb),size,nb)
(here, I just generate some fake data). I can visualize some trajectories, over the 20 weeks,
library(RColorBrewer) cl1=brewer.pal(12,"Set3")[12] cl2=brewer.pal(8,"Set2")[2:7] cl=c(cl1,cl2) boxplot(X) for(i in 1:length(cl)){ lines(1:20,X[i,],type="b",col=cl[i]) }
She wanted to compare the averages, over the experiments.
boxplot(X) for(i in 1:length(cl)){ lines(1:20,X[i,],type="b",col=cl[i]) }
The question is what do you want to test ? Is it
versus
Because that one might be not very informative (if we reject
> P=NULL > for(i in 1:(nb-1)){ for(j in (i+1):nb){ + P=c(P,t.test(X[,i],X[,j])$p.value) }} > mean(P>.05) [1] 0.9368421
i.e. for 12 pairs, we reject the assumption of identical mean. Even if that assumption is true, here. That’s the problem with multiple tests.
Actually, observe that over those 190 tests, the
Now, what if we relax the assumption of independence ? Here, we need to be more specific. A first idea might be to assume that
for some trend
Standard regression techniques can be considered, under those models. We’ll probably discuss that point before the end of this year.
My point was the following : if the assume that
consist of independent observations, where all
A first idea could be to consider a linear trend,
X=matrix(rnorm(size*nb),size,nb) T=t(matrix((1:nb)/nb*r,nb,size)) X=X+T*a
Here, we still compute pairwise comparison tests
> P=NULL > for(i in 1:(nb-1)){ for(j in (i+1):nb){ + P=c(P,t.test(X[,i],X[,j])$p.value) }}
and we counts the number of pairs where the
PV[1]=mean(P>.025) PV[2]=mean(P>.05) PV[3]=mean(P>.075) PV[4]=mean(P>.1) PV[5]=mean(P>.125) PV[6]=mean(P>.15)
If we look at the proportions, as a function of the slope, we have the following graph
In the middle, we have the independent case, e.g. 97.5% of the the
One can also assume that
is some time series. An autoregressive one, for instance,
Here we use to following code to generate that vector
autoreg=function(n,r){ M=matrix(1,n,n) for(i in 1:(n-2)){ diag(M[(i+1):n,1:(n-i)])=r^i diag(M[1:(n-i),(i+1):n])=r^i } M[1,n]=M[n,1]=r^(i+1) return(M) } S=autoreg(nb,r) library(mnormt) X=rmnorm(size,varcov=S)
The graph below is the evolution of the proportion of
The value on the left is the independent case (and
Finally, consider the case where components of vector
are exchangeable.
S=matrix(r,nb,nb) diag(S)=1 library(mnormt) X=rmnorm(size,M,varcov=S)
In that case, we have the following graph, with the proportion of
This is the idea that is used in any correction-type method, with multiple tests, for instance Bonferroni correction. When a run
Next time, I will discuss my student’s data, see what could be said.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.