Data Analysis Lectures
This is the first theoretical post I'm writing on this blog. I'll review some important Linear Algebra concepts that I use at work, and they can save you a lot of time if you work with large datasets as I do.
- Diagonalization
- Cayley-Hamilton Theorem
- Time saving steps in R
Diagonalization
Let \(\renewcommand{\vec}[1]{\boldsymbol{#1}} \newcommand{\R}{\mathbb{R}} A \in \R^{n\times n}\) be a matrix whose powers we need. Computing \(A^2 = AA\), \(A^3 = AAA\), and so on by repeated multiplication quickly becomes expensive. In general,
\[A^k = \underbrace{AA\cdots A}_{k \text{ times}}.\]
If \(A\) can be written in a useful factorization of the form \(A = PDP^{-1}\), with \(D\) a diagonal matrix (\(d_{ij} = 0\) if \(i\neq j\)) and \(P\) invertible, then it is easy to obtain \(A^k\), because \(A^k = PD^kP^{-1}\) and \(D^k\) is simply the diagonal matrix with entries \(d_{ii}^k\).
Remember that an eigenvector of \(A\) is a nonzero vector \(\vec{v}\in \R^n\) such that \(A\vec{v} = \lambda \vec{v}\), with \(\lambda\) being an eigenvalue of \(A\).
Theorem (Diagonalization Theorem). Let \(A\) be an \(n\times n\) matrix. \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors. In fact, \(A=PDP^{-1}\), with \(D\) a diagonal matrix and \(P\) an invertible matrix, if and only if the columns of \(P\) are \(n\) linearly independent eigenvectors of \(A\). In this case, the diagonal entries of \(D\) are the eigenvalues of \(A\) that correspond to the eigenvectors in \(P\).
Proof. Let \(P \in \R^{n\times n}\) be any matrix with columns \(\vec{v_1},\ldots,\vec{v_n}\), and let \(D\) be any diagonal matrix with diagonal entries \(\lambda_1,\ldots,\lambda_n\). Then
\[AP = A[\vec{v_1} \: \vec{v_2} \: \cdots \: \vec{v_n}] = [A\vec{v_1} \: A\vec{v_2} \: \cdots \: A\vec{v_n}]\]
and
\[PD = [\lambda_1\vec{v_1} \: \lambda_2\vec{v_2} \: \cdots \: \lambda_n\vec{v_n}].\]
Suppose that \(A\) is diagonalizable, so \(A = PDP^{-1}\); multiplying on the right by \(P\) I obtain \(AP = PD\). By replacing the values of \(AP\) and \(PD\) obtained above I get
\[[A\vec{v_1} \: A\vec{v_2} \: \cdots \: A\vec{v_n}] = [\lambda_1\vec{v_1} \: \lambda_2\vec{v_2} \: \cdots \: \lambda_n\vec{v_n}],\]
that is, \(A\vec{v_j} = \lambda_j\vec{v_j}\) for every \(j\).
Given that \(P\) is invertible, its columns \(\vec{v_1},\ldots,\vec{v_n}\) must be linearly independent. Those columns are also different from \(\vec{0}\), so \(\lambda_1,\ldots,\lambda_n\) are eigenvalues and \(\vec{v_1}, \ldots, \vec{v_n}\) are eigenvectors.
Conversely, given any \(n\) eigenvectors \(\vec{v_1},\ldots,\vec{v_n}\) of \(A\) with eigenvalues \(\lambda_1,\ldots,\lambda_n\), let \(P = [\vec{v_1} \: \vec{v_2} \: \cdots \: \vec{v_n}]\) and \(d_{ij} = \lambda_i\) for \(i = j\) and \(d_{ij} = 0\) otherwise. Then \(AP = PD\) holds without imposing conditions on the eigenvectors. If the eigenvectors are linearly independent, then \(P\) is invertible and \(AP = PD\) implies \(A = PDP^{-1}\), which completes the proof.
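In R, the function eigen() returns exactly these ingredients: the columns of \(P\) as $vectors and the diagonal of \(D\) as $values, so the factorization is easy to check numerically. This is a minimal sketch; the \(2\times 2\) matrix below is only an illustration, not one of the matrices used later in this post.

A <- rbind(c(2, 1), c(1, 2))                      # illustrative symmetric matrix
e <- eigen(A)                                      # eigenvalues and eigenvectors of A
P <- e$vectors                                     # columns of P are eigenvectors
D <- diag(e$values)                                # diagonal matrix of eigenvalues
all.equal(A, P %*% D %*% solve(P))                 # A = P D P^{-1}, TRUE up to rounding
all.equal(A %*% A %*% A, P %*% D^3 %*% solve(P))   # A^3 = P D^3 P^{-1}; for a diagonal D
                                                   # the elementwise cube equals the matrix cube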
Cayley-Hamilton Theorem
Statement and Proof
Let \(A\in \R^{n\times n}\). The characteristic polynomial of \(A\) is defined as
\[p(\lambda) = \det(\lambda I - A).\]
Theorem (Cayley-Hamilton, 1858). Every square matrix is a root of its own characteristic polynomial.
Proof. Let \(A\in \R^{n\times n}\). The characteristic polynomial of \(A\) is a polynomial in \(\lambda\) of degree \(n\), and it can be written as
\[p(\lambda) = \det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0. \qquad (*)\]
Consider the adjoint (adjugate) matrix of \((\lambda I - A)\). By definition of the adjoint, this is a matrix whose components are polynomials in \(\lambda\) of degree at most \(n-1\).
Hence, it is possible to write the adjoint as
\[\operatorname{adj}(\lambda I - A) = B_{n-1}\lambda^{n-1} + B_{n-2}\lambda^{n-2} + \cdots + B_1\lambda + B_0, \qquad (**)\]
where each \(B_j \in \R^{n\times n}\) is a constant matrix.
By using the adjoint property \(\operatorname{adj}(M)\,M = \det(M)\,I\) I have
\[\operatorname{adj}(\lambda I - A)(\lambda I - A) = \det(\lambda I - A)\, I.\]
By replacing \((*)\) and \((**)\) in this equation I obtain
\[(B_{n-1}\lambda^{n-1} + B_{n-2}\lambda^{n-2} + \cdots + B_1\lambda + B_0)(\lambda I - A) = (\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0)\, I.\]
Hence,
\[B_{n-1}\lambda^n + (B_{n-2} - B_{n-1}A)\lambda^{n-1} + \cdots + (B_0 - B_1 A)\lambda - B_0 A = \lambda^n I + a_{n-1}\lambda^{n-1} I + \cdots + a_1\lambda I + a_0 I.\]
By equating coefficients of like powers of \(\lambda\) I obtain
\[\begin{aligned}
B_{n-1} &= I\\
B_{n-2} - B_{n-1}A &= a_{n-1} I\\
&\;\;\vdots\\
B_0 - B_1 A &= a_1 I\\
-B_0 A &= a_0 I.
\end{aligned}\]
By multiplying the first of these equations on the right by \(A^n\), the second by \(A^{n-1}\), and so on down to the last one by \(A^0 = I\), and then summing them, the left-hand side telescopes to the zero matrix and I obtain
\[0 = A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I = p(A),\]
which concludes the proof.
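As a quick numerical sanity check, for a \(2\times 2\) matrix the characteristic polynomial is \(\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)\), so the theorem says \(A^2 - \operatorname{tr}(A)A + \det(A)I\) must be the zero matrix. A minimal sketch in R, with an arbitrary illustrative matrix:

A <- rbind(c(4, 7), c(2, 6))                   # illustrative 2x2 matrix
tr <- sum(diag(A))                              # trace of A
A %*% A - tr * A + det(A) * diag(2)             # p(A) = A^2 - tr(A) A + det(A) I,
                                                # all entries are zero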
Application: Obtain the inverse of a matrix
The Cayley-Hamilton theorem can be used, among other things, to obtain the inverse of a matrix. To do that I propose three steps.
1st step. Obtain the characteristic polynomial of \(A\), which is defined as
\[p(\lambda) = \det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0.\]
2nd step. Equate the characteristic polynomial to zero, replace \(\lambda\) by \(A\), and attach \(I\) to the terms where there is no replacement. Example: if \(p(\lambda) = \lambda^2 + a_1\lambda + a_0\), then
\[A^2 + a_1 A + a_0 I = 0.\]
3rd step. Solve for \(I\), remembering that \(AA^{-1}=I\). Continuing the last example:
\[A(A + a_1 I) = -a_0 I \quad\Longrightarrow\quad A^{-1} = -\frac{1}{a_0}(A + a_1 I),\]
provided \(a_0 \neq 0\), which holds exactly when \(A\) is invertible.
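The same three steps can be carried out numerically. The sketch below is my own helper code (the names char_coefs and ch_inverse are not from any package): it builds the coefficients of \(\det(\lambda I - A)\) from the eigenvalues and then evaluates the Cayley-Hamilton inverse by Horner's rule.

# Coefficients of det(lambda I - A), ordered from lambda^n down to lambda^0,
# obtained by expanding prod_i (lambda - lambda_i) with a small convolution loop.
char_coefs <- function(A) {
  coefs <- 1
  for (ev in eigen(A)$values) coefs <- c(coefs, 0) - c(0, ev * coefs)
  Re(coefs)
}

# Cayley-Hamilton inverse: from A^n + a_{n-1} A^{n-1} + ... + a_1 A + a_0 I = 0
# it follows that A^{-1} = -(A^{n-1} + a_{n-1} A^{n-2} + ... + a_1 I) / a_0.
ch_inverse <- function(A) {
  a <- char_coefs(A)                  # a[1] = 1, a[2] = a_{n-1}, ..., a[n+1] = a_0
  n <- nrow(A)
  acc <- diag(n)                      # Horner evaluation of the bracketed polynomial
  for (k in seq_len(n - 1) + 1) acc <- acc %*% A + a[k] * diag(n)
  -acc / a[n + 1]
}

For example, ch_inverse(rbind(c(4, 7), c(2, 6))) agrees with solve(rbind(c(4, 7), c(2, 6))) up to rounding.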
Examples
2×2 matrix
Consider a \(2\times 2\) matrix \(A\) whose characteristic polynomial is \(\lambda^2 - 5\lambda - 2\).
To obtain the inverse of \(A\) these are the steps:
1st step. Obtain the characteristic polynomial of \(A\) and equate it to zero:
\[\det(\lambda I - A) = \lambda^2 - 5\lambda - 2 = 0.\]
2nd step. Take the last result and transform it by replacing \(\lambda\) by \(A\):
\[A^2 - 5A - 2I = 0.\]
3rd step. Factor out \(A\) in the equation \(A^2-5A-2I = 0\) and solve for \(I\):
\[A(A - 5I) = 2I \quad\Longrightarrow\quad I = \frac{1}{2}A(A - 5I).\]
As \(AA^{-1}=I\), I obtain \(A^{-1} = \frac{1}{2} (A-5I)\).
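This is easy to verify in R. The matrix below is only an illustration: it has trace 5 and determinant \(-2\), so its characteristic polynomial is exactly \(\lambda^2 - 5\lambda - 2\).

A <- rbind(c(1, 2), c(3, 4))             # trace 5, determinant -2
Ainv <- 0.5 * (A - 5 * diag(2))           # Cayley-Hamilton inverse: (1/2)(A - 5I)
Ainv
all.equal(Ainv, solve(A))                 # agrees with R's built-in inverse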
4×4 matrix
Here the method proves to be useful.
Consider a \(4\times 4\) matrix \(A\) whose characteristic polynomial is \((\lambda - 1)^4\).
Following the last steps I obtain
\[\det(\lambda I - A) = (\lambda - 1)^4 = 0.\]
By using \(\displaystyle(x+y)^n = \sum_{k=0}^n \frac{n!}{(n-k)!k!}\, x^{n-k}y^k\) I obtain
\[(\lambda - 1)^4 = \lambda^4 - 4\lambda^3 + 6\lambda^2 - 4\lambda + 1 = 0.\]
By doing the replacements I obtain
\[A^4 - 4A^3 + 6A^2 - 4A + I = 0 \quad\Longrightarrow\quad A(A^3 - 4A^2 + 6A - 4I) = -I,\]
and then \(A^{-1}=-A^3+4A^2-6A+4I\).
Now I compute the powers \(A^2\) and \(A^3\) that appear in the obtained polynomial, and by substituting them into \(-A^3+4A^2-6A+4I\) I obtain the inverse of \(A\).
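The arithmetic is easy to check in R. The matrix below is only an illustration: any matrix with characteristic polynomial \((\lambda-1)^4\) works, and this unit upper triangular one has exactly that polynomial because all of its eigenvalues are 1.

A <- rbind(c(1, 1, 0, 0),
           c(0, 1, 1, 0),
           c(0, 0, 1, 1),
           c(0, 0, 0, 1))                      # upper triangular with 1's on the diagonal
A2 <- A %*% A                                   # A^2
A3 <- A2 %*% A                                  # A^3
Ainv <- -A3 + 4 * A2 - 6 * A + 4 * diag(4)      # -A^3 + 4A^2 - 6A + 4I
all.equal(Ainv, solve(A))                       # agrees with R's built-in inverse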
Time saving steps in R
A <- rbind(c(5,0,0,0), c(0,5,0,0), c(1,4,-3,0), c(-1,2,0,-3))
P <- eigen(A)$vectors        # columns are eigenvectors of A
D <- diag(eigen(A)$values)   # diagonal matrix of eigenvalues

ptm <- proc.time()
A %*% A                       # A^2 by direct multiplication
##      [,1] [,2] [,3] [,4]
## [1,]   25    0    0    0
## [2,]    0   25    0    0
## [3,]    2    8    9    0
## [4,]   -2    4    0    9
proc.time() - ptm
##    user  system elapsed 
##   0.003   0.000   0.003

ptm <- proc.time()
P %*% (D %*% D) %*% solve(P)  # A^2 via the diagonalization A = P D P^{-1}
##      [,1] [,2] [,3] [,4]
## [1,]   25    0    0    0
## [2,]    0   25    0    0
## [3,]    2    8    9    0
## [4,]   -2    4    0    9
proc.time() - ptm
##    user  system elapsed 
##   0.003   0.001   0.007
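The savings show up for higher powers, where the diagonalization replaces many matrix products by a single elementwise power of the eigenvalues. A small sketch (the power 50 and the Reduce-based loop are only for illustration; it reuses A and P from the chunk above):

k <- 50
# Repeated multiplication: k - 1 matrix products
Ak_slow <- Reduce(`%*%`, replicate(k, A, simplify = FALSE))
# Diagonalization: one elementwise power of the eigenvalues plus two matrix products
Ak_fast <- P %*% diag(eigen(A)$values^k) %*% solve(P)
all.equal(Ak_slow, Ak_fast)   # TRUE up to rounding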
Final comment
For a small matrix this can be tedious, but for a large matrix consider that in some cases you can diagonalize once and save a lot of computing time if you are using, for example, R or Matlab and you need powers of a really large correlation matrix.