On Cochran Theorem (and Orthogonal Projections)
Cochran's theorem – from The distribution of quadratic forms in a normal system, with applications to the analysis of covariance, published in 1934 – is probably the most important one in a regression course. It is an application of a nice result on quadratic forms of Gaussian vectors. More precisely, we can prove that if $Y\sim\mathcal{N}(0,\mathbb{I}_d)$ is a random vector with $d$ $\mathcal{N}(0,1)$ components, then (i) if $A$ is a (square) idempotent matrix, $Y^\top AY\sim\chi^2_r$ where $r$ is the rank of matrix $A$, and (ii) conversely, if $Y^\top AY\sim\chi^2_r$, then $A$ is an idempotent matrix of rank $r$. And just in case, $A$ being an idempotent matrix means that $A^2=A$, and a lot of results can be derived from that (for instance on the eigenvalues).

The proof of that result (at least the (i) part) is nice: we diagonalize matrix $A$, so that $A=P\Delta P^\top$, with $P$ orthonormal. Since $A$ is an idempotent matrix, observe that
$$A^2=P\Delta P^\top P\Delta P^\top=P\Delta^2P^\top$$
and since $A^2=A=P\Delta P^\top$, the diagonal matrix $\Delta$ satisfies $\Delta^2=\Delta$, so the terms on the diagonal of $\Delta$ are either 0's or 1's. And because the rank of $A$ (and $\Delta$) is $r$, there should be $r$ 1's and $d-r$ 0's. Now write
$$Y^\top AY=Y^\top P\Delta P^\top Y=Z^\top\Delta Z$$
where $Z=P^\top Y$ satisfies $Z\sim\mathcal{N}(0,P^\top P)$, i.e. $Z\sim\mathcal{N}(0,\mathbb{I}_d)$. Thus
$$Z^\top\Delta Z=\sum_{i:\Delta_{i,i}=1}Z_i^2\sim\chi^2_r$$
Nice, isn't it? And there is more (that will actually be strongly connected to Cochran's theorem). Let $A=A_1+\cdots+A_k$; then the two following statements are equivalent: (i) $A$ is idempotent and $\text{rank}(A)=\text{rank}(A_1)+\cdots+\text{rank}(A_k)$, (ii) the $A_i$'s are idempotent and $A_iA_j=0$ for all $i\neq j$.
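As a quick numerical sanity check of (i), here is a small simulation (just a sketch, with an arbitrarily chosen rank-2 projection matrix in dimension $d=4$): the empirical distribution of $Y^\top AY$ should be close to a $\chi^2_2$,
d = 4
V = matrix(rnorm(d*2),d,2)                  # two arbitrary directions in R^4
A = V %*% solve(t(V) %*% V) %*% t(V)        # orthogonal projection: idempotent, rank 2
Y = matrix(rnorm(1000*d),1000,d)            # 1,000 draws of Y ~ N(0,Id)
Q = rowSums((Y %*% A) * Y)                  # quadratic forms Y'AY
hist(Q,probability = TRUE,breaks = 50)
vu = seq(0,20,length=601)
lines(vu,dchisq(vu,df=2),col="red")         # chi-square density, 2 degrees of freedom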
Now, let us talk about projections. Let $y$ be a vector in $\mathbb{R}^n$. Its projection on the space $\mathcal{V}(v_1,\ldots,v_p)$ (generated by those $p$ vectors) is the vector $\hat{y}=V\hat{a}$ that minimizes $\|y-Va\|$ (in $a$), where $V$ is the $n\times p$ matrix with columns $v_1,\ldots,v_p$. The solution is
$$\hat{a}=(V^\top V)^{-1}V^\top y \quad\text{and}\quad \hat{y}=V\hat{a}$$
Matrix $P=V(V^\top V)^{-1}V^\top$ is the orthogonal projection onto the space spanned by $\{v_1,\ldots,v_p\}$, and $\hat{y}=Py$.
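As a small sketch (with two arbitrarily chosen vectors of $\mathbb{R}^3$), we can check numerically that $P$ is symmetric and idempotent, and project some vector $y$,
v1 = c(1,0,1)
v2 = c(0,1,1)
V = cbind(v1,v2)
P = V %*% solve(t(V) %*% V) %*% t(V)   # orthogonal projection onto span(v1,v2)
max(abs(P %*% P - P))                  # idempotent: P^2 = P (up to rounding)
max(abs(t(P) - P))                     # symmetric: P' = P
y = c(2,-1,3)
P %*% y                                # projection of y on that plane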
Now we can recall Cochran's theorem. Let $Y\sim\mathcal{N}(\mu,\sigma^2\mathbb{I}_d)$ for some $\sigma>0$ and $\mu$. Consider orthogonal linear subspaces $F_1,\ldots,F_m$ of $\mathbb{R}^d$, with dimensions $d_i$. Let $P_{F_i}$ be the orthogonal projection matrix onto $F_i$; then (i) the vectors $P_{F_1}Y,\ldots,P_{F_m}Y$ are independent, with respective distributions $\mathcal{N}(P_{F_i}\mu,\sigma^2\mathbb{I}_{d_i})$ (within $F_i$), and (ii) the random variables $\|P_{F_i}(Y-\mu)\|^2/\sigma^2$ are independent and $\chi^2_{d_i}$ distributed.
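Before visualizing it, here is a quick numerical sketch of claim (ii), with $d=3$, $\mu=0$, $\sigma=1$, $F_1$ a (arbitrarily chosen) plane and $F_2$ its orthogonal complement,
n = 10000
Y = matrix(rnorm(3*n),n,3)                  # Y ~ N(0, I_3)
V1 = cbind(c(1,0,1),c(0,1,1))
P1 = V1 %*% solve(t(V1) %*% V1) %*% t(V1)   # projection on F1 (dimension 2)
P2 = diag(3) - P1                           # projection on F2 = F1-perp (dimension 1)
Q1 = rowSums((Y %*% P1) * Y)                # ||P1 Y||^2
Q2 = rowSums((Y %*% P2) * Y)                # ||P2 Y||^2
mean(Q1); mean(Q2)                          # close to 2 and 1, the degrees of freedom
cor(Q1,Q2)                                  # close to 0, consistent with independence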
We can try to visualize those results. For instance, the orthogonal projection of a random vector has a Gaussian distribution. Consider a two-dimensional Gaussian vector
library(mnormt)
r = .7                                      # correlation
s1 = 1                                      # standard deviations
s2 = 1
Sig = matrix(c(s1^2,r*s1*s2,r*s1*s2,s2^2),2,2)
Sig
Y = rmnorm(n = 1000, mean = c(0,0), varcov = Sig)   # 1,000 draws from N(0,Sig)
plot(Y,cex=.6)
vu = seq(-4,4,length=101)
vz = outer(vu,vu,function (x,y) dmnorm(cbind(x,y), mean = c(0,0), varcov = Sig))
contour(vu,vu,vz,add=TRUE,col='blue')       # density level curves
abline(a=0,b=2,col="red")                   # the line with slope a = 2
Consider now the projection of points $y=(y_1,y_2)$ on the straight line with directional vector $\vec{u}$ of slope $a$ (say $a=2$). To get the projected point $x=(x_1,x_2)$, recall that $x_2=ax_1$ and that the vector from $x$ to $y$ is orthogonal to $\vec{u}$. Hence, the following code will give us the orthogonal projections
p = function(a){
  # orthogonal projection of the rows of Y on the line y = a*x
  x0 = (Y[,1]+a*Y[,2])/(1+a^2)
  y0 = a*x0
  cbind(x0,y0)
}
with
P = p(2)
for(i in 1:20) segments(Y[i,1],Y[i,2],P[i,1],P[i,2],lwd=4,col="red")
points(P[,1],P[,2],col="red",cex=.7)
Now, if we look at the distribution of points on that line, we get… a Gaussian distribution, as expected,
z = sqrt(P[,1]^2+P[,2]^2)*c(-1,+1)[(P[,1]>0)*1+1]   # signed coordinate along the line
vu = seq(-6,6,length=601)
vv = dnorm(vu,mean(z),sd(z))
hist(z,probability = TRUE,breaks = seq(-4,4,by=.25))
lines(vu,vv,col="red")
Of course, we can use the matrix representation to get the projection on $\vec{u}$ (or rather on a normalized version of that vector),
a=2
U = c(1,a)/sqrt(a^2+1)
U
[1] 0.4472136 0.8944272
matP = U %*% solve(t(U) %*% U) %*% t(U)
matP %*% Y[1,]
[,1]
[1,] -0.1120555
[2,] -0.2241110
P[1,]
x0 y0
-0.1120555 -0.2241110
(which is consistent with our manual computation). Now, for Cochran's theorem, we start with independent random variables,
Y = rmnorm(n = 1000,mean=c(0,0),varcov = diag(c(1,1)))
Then we consider the projections on $\vec{u}$ and on $\vec{v}=\vec{u}^\perp$
U = c(1,a)/sqrt(a^2+1)                      # unit vector with slope a
matP1 = U %*% solve(t(U) %*% U) %*% t(U)
P1 = Y %*% matP1
z1 = sqrt(P1[,1]^2+P1[,2]^2)*c(-1,+1)[(P1[,1]>0)*1+1]
V = c(a,-1)/sqrt(a^2+1)                     # orthogonal unit vector
matP2 = V %*% solve(t(V) %*% V) %*% t(V)
P2 = Y %*% matP2
z2 = sqrt(P2[,1]^2+P2[,2]^2)*c(-1,+1)[(P2[,1]>0)*1+1]
We can plot those two projections
plot(z1,z2)
and observe that the two are indeed independent Gaussian variables. And (of course) their squared norms are $\chi^2_1$ distributed.
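As a last sanity check (a sketch, reusing the z1 and z2 computed above), the histogram of z1^2 can be compared with the $\chi^2_1$ density,
hist(z1^2,probability = TRUE,breaks = 50)
vu = seq(.01,8,length=601)
lines(vu,dchisq(vu,df=1),col="red")         # chi-square density, 1 degree of freedom
cor(z1^2,z2^2)                              # close to 0, consistent with independence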