MAT8886 reducing dimension using factors
[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
First, let us recall a standard result from linear algebra: “real symmetric matrices are diagonalizable by orthogonal matrices”. Thus, any variance-covariance matrix can be written as the product of an orthogonal matrix, a diagonal matrix holding its eigenvalues, and the transpose of that orthogonal matrix.
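As a quick numerical check of that result, here is a minimal sketch using base R's eigen(). The matrix Sigma and the names P and Delta are hypothetical, chosen only for illustration; they do not come from the stock data used later.

```r
# Hypothetical 3x3 variance-covariance matrix (symmetric, positive definite).
Sigma <- matrix(c(4.0, 1.2, 0.8,
                  1.2, 3.0, 0.5,
                  0.8, 0.5, 2.0), nrow = 3, byrow = TRUE)

# eigen() returns the eigenvalues (in decreasing order) and orthonormal eigenvectors.
dec   <- eigen(Sigma, symmetric = TRUE)
P     <- dec$vectors        # orthogonal matrix (assumed name)
Delta <- diag(dec$values)   # diagonal matrix of eigenvalues (assumed name)

# P is orthogonal: t(P) %*% P should be the identity.
orthogonality_error <- max(abs(crossprod(P) - diag(3)))

# Sigma should be recovered as P %*% Delta %*% t(P).
reconstruction_error <- max(abs(P %*% Delta %*% t(P) - Sigma))

print(c(orthogonality_error, reconstruction_error))
```

Both errors should be at the level of floating-point rounding, and all eigenvalues are positive since the matrix is positive definite.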
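In the context of Gaussian random vectors (or more generally elliptical distributions), we can then write the random vector as its mean plus a linear transformation, built from the eigenvectors and the square roots of the eigenvalues of the variance-covariance matrix, of a standardized vector with uncorrelated components.
The idea is to rewrite that expression keeping only a small number of those components, the factors, and treating the remainder as noise.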
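In symbols, the construction could be sketched as follows. The notation is an assumption, since none of these symbols are named in the text: $\Sigma$ is the variance-covariance matrix, $P$ and $\Delta$ its orthogonal and diagonal parts, $P_j$ the $j$-th column of $P$, $\mu$ the mean, $\varepsilon$ a standardized vector, and $k$ the number of factors retained.

```latex
\begin{align*}
\Sigma &= P \Delta P^{\top},
  \qquad P^{\top} P = I,
  \qquad \Delta = \operatorname{diag}(\lambda_1, \dots, \lambda_d),
  \quad \lambda_1 \ge \dots \ge \lambda_d \ge 0, \\
X &= \mu + P \Delta^{1/2} \varepsilon,
  \qquad \mathbb{E}[\varepsilon] = 0,
  \quad \operatorname{Var}(\varepsilon) = I, \\
X &= \mu
  + \underbrace{\sum_{j=1}^{k} \sqrt{\lambda_j}\, P_j\, \varepsilon_j}_{\text{factors}}
  + \underbrace{\sum_{j=k+1}^{d} \sqrt{\lambda_j}\, P_j\, \varepsilon_j}_{\text{noise}} .
\end{align*}
```

When $\lambda_{k+1}, \dots, \lambda_d$ are small compared with the leading eigenvalues, the noise term contributes little to the variance, which is what makes the reduction to $k$ factors reasonable.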
This technique is extremely popular in finance, to model returns of multiple stocks, from the capital asset pricing model (CAPM, Sharpe (1964) or Mossin (1966)) – with one factor (the so-called market) – to the arbitrage pricing theory (APT, Ross (1976)). For instance, with the following code, we can extract prices of 35 French stocks,
    # table of company names (Nom) and ticker codes (Code)
    code=read.table("http://perso.univ-rennes1.fr/arthur.charpentier/code-CAC.csv",sep=";",header=TRUE)
    code$Nom=as.character(code$Nom)
    code$Code=as.character(code$Code)
    head(code)
    i=1
    library(tseries)
    # drop the 8th row of the table
    code=code[-8,]
    # closing prices of the first stock
    X<-get.hist.quote(code$Code[i])
    Xc=X$Close
    # merge in the closing prices of the remaining stocks, one column per stock
    for(i in 2:nrow(code)){
      x<-get.hist.quote(code$Code[i])
      xc=x$Close
      Xc=merge(Xc,xc)
    }

It is natural to consider log-returns, and their correlations,
    # log-returns, one column per stock
    R=diff(log(Xc))
    colnames(R)=code$Code
    correlation=matrix(NA,ncol(R),ncol(R))
    colnames(correlation)=code$Code
    rownames(correlation)=code$Code
    # pairwise correlations, using only the dates where both series are observed
    for(i in 1:ncol(R)){
      for(j in 1:ncol(R)){
        I=(is.na(R[,i])==FALSE)&(is.na(R[,j])==FALSE)
        correlation[i,j]=cor(R[I,i],R[I,j])
      }
    }
    library(corrgram)
    corrgram(correlation, order=NULL, lower.panel=panel.shade,
             upper.panel=NULL, text.panel=panel.txt, main="")

In that case, one eigenvalue is extremely large, and all the others are extremely small,
    L=eigen(correlation)
    plot(1:ncol(R),L$values,type="b",col="red")
That is, this suggests considering a factor model with a single factor.
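To see what a single dominant eigenvalue looks like without downloading any prices, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the returns are generated from one common "market" factor plus independent noise, and the names market, beta and returns are hypothetical.

```r
set.seed(1)
n <- 2000   # number of simulated dates
d <- 10     # number of simulated stocks

market <- rnorm(n)                              # single common factor
beta   <- runif(d, 0.7, 1.0)                    # hypothetical loadings on the factor
noise  <- matrix(rnorm(n * d, sd = 0.5), n, d)  # idiosyncratic part
returns <- outer(market, beta) + noise          # n x d matrix of simulated returns

# Eigenvalues of the correlation matrix, and the share of variance each one carries.
L <- eigen(cor(returns))
share <- L$values / sum(L$values)
print(round(share, 3))

# One-factor approximation of the correlation matrix from the first eigenpair.
loading <- sqrt(L$values[1]) * L$vectors[, 1]
approx_cor <- tcrossprod(loading)
diag(approx_cor) <- 1
approx_error <- max(abs(approx_cor - cor(returns)))
print(approx_error)
```

In this simulated setting the first eigenvalue carries most of the variance, and the correlation matrix rebuilt from that single eigenpair stays close to the empirical one. The same two quantities, share and approx_error, can be computed from the correlation matrix of the actual log-returns.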
In a Gaussian (or elliptical) world, building factor models is extremely close to the theory of principal component analysis, where we seek axes, or planes, giving the "best" projection of scatterplots.
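The link with principal component analysis can be checked numerically. The sketch below uses base R's prcomp() on a small simulated dataset (the data and the column names a, b, c are hypothetical) and compares its output with the eigendecomposition of the correlation matrix.

```r
set.seed(2)
n <- 500
z <- rnorm(n)

# Two strongly related columns and one independent column.
X <- cbind(a = z + rnorm(n, sd = 0.4),
           b = z + rnorm(n, sd = 0.4),
           c = rnorm(n))

# PCA on centered, standardized columns, i.e. on the correlation structure.
pca <- prcomp(X, center = TRUE, scale. = TRUE)
eig <- eigen(cor(X))

# Variances of the principal components versus eigenvalues of the correlation matrix.
variance_gap <- max(abs(pca$sdev^2 - eig$values))

# Principal axes versus eigenvectors (they agree up to the sign of each column).
axis_gap <- max(abs(abs(pca$rotation) - abs(eig$vectors)))

# Coordinates of the observations in the plane spanned by the first two axes.
scores <- pca$x[, 1:2]
print(c(variance_gap, axis_gap))
```

Both gaps should be at the level of rounding error, and plotting scores gives the projection of the scatterplot onto the first principal plane.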