A trip from variance-covariance to correlation and back

[This article was first published on R on Fixing the bridge between biologists and statisticians, and kindly contributed to R-bloggers].

The variance-covariance matrix and the correlation matrix both describe the association between the columns of a two-way data matrix. They are widely used, e.g., in agriculture, biology and ecology, and they can be easily calculated with base R, as shown in the box below.

data(mtcars)
matr <- mtcars[,1:4]

# Covariances
Sigma <- cov(matr)

# Correlations
R <- cor(matr)

Sigma
##              mpg        cyl       disp        hp
## mpg    36.324103  -9.172379  -633.0972 -320.7321
## cyl    -9.172379   3.189516   199.6603  101.9315
## disp -633.097208 199.660282 15360.7998 6721.1587
## hp   -320.732056 101.931452  6721.1587 4700.8669
R
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000

It is useful to be able to go back and forth between covariances and correlations, without going back to the original data matrix. Let’s recall that the (sample) covariance of the two variables X and Y is:

\[\textrm{cov}(X, Y) = \frac{1}{n - 1} \sum\limits_{i=1}^{n} {(X_i - \bar{X})(Y_i - \bar{Y})}\]

where \(\bar{X}\) and \(\bar{Y}\) are the means of the two variables and \(n\) is the number of observations. The correlation is:

\[\textrm{cor}(X, Y) = \frac{\textrm{cov}(X, Y)}{\sigma_x \sigma_y} \]

where \(\sigma_x\) and \(\sigma_y\) are the standard deviations for X and Y.
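
As a quick check of the two definitions, the sketch below calculates the covariance and the correlation of ‘mpg’ and ‘disp’ by hand and compares them with cov() and cor(). It assumes the sample denominator (n - 1), which is what cov() uses by default; the object names (x, y, cov_xy, cor_xy) are only illustrative.

```r
data(mtcars)
x <- mtcars$mpg
y <- mtcars$disp
n <- length(x)

# Covariance: sum of the cross-products of the deviations, divided by n - 1
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Correlation: covariance divided by the product of the standard deviations
cor_xy <- cov_xy / (sd(x) * sd(y))

# Both comparisons should return TRUE
all.equal(cov_xy, cov(x, y))
all.equal(cor_xy, cor(x, y))
```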

The opposite relationship is clear:

\[ \textrm{cov}(X, Y) = \textrm{cor}(X, Y) \sigma_x \sigma_y\]

Therefore, converting from covariance to correlation is pretty easy. For example, take the covariance between ‘mpg’ and ‘disp’ above (-633.097208) and divide it by the square roots of the two variances (36.324103 and 15360.7998); the correlation is:

-633.097208 / (sqrt(36.324103) * sqrt(15360.7998))
## [1] -0.8475514

Conversely, if we have the correlation (-0.8475514), the covariance is:

-0.8475514 * sqrt(36.324103) * sqrt(15360.7998)
## [1] -633.0972
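
The same calculation can be done without retyping the numbers, by picking the elements from the covariance matrix by name. This is only a sketch: r_mpg_disp and s_mpg_disp are hypothetical names, and the matrix is recreated so that the block runs on its own.

```r
data(mtcars)
Sigma <- cov(mtcars[, 1:4])

# Covariance -> correlation: divide by the two standard deviations
r_mpg_disp <- Sigma["mpg", "disp"] /
  (sqrt(Sigma["mpg", "mpg"]) * sqrt(Sigma["disp", "disp"]))

# Correlation -> covariance: multiply by the same two standard deviations
s_mpg_disp <- r_mpg_disp *
  sqrt(Sigma["mpg", "mpg"]) * sqrt(Sigma["disp", "disp"])

r_mpg_disp
s_mpg_disp
```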

If we consider the whole covariance matrix, we have to take each element in this matrix and divide it by the square roots of the diagonal elements in the same column and in the same row (see figure below).

The question is: how can we do all these calculations in one single step, for all elements in the covariance matrix, to calculate the corresponding correlation matrix?

If we have some memories of matrix algebra, we might remember that if we take a square matrix of order \(n \times n\) and post-multiply it by a diagonal matrix of the same order, all elements in each column are multiplied by the diagonal element in the corresponding column:

\[\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} \times \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{pmatrix}\]

If we reverse the order of the factors (i.e., we pre-multiply by the diagonal matrix), all elements in each row are multiplied by the diagonal element in the corresponding row:

\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} \times \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 \\ 3 & 3 & 3 & 3 \\ 4 & 4 & 4 & 4 \end{pmatrix} \]

Therefore, if we take a covariance matrix \(\Sigma\) of order \(n \times n\) and pre-multiply and post-multiply it by the same diagonal matrix of order \(n \times n\), each element in \(\Sigma\) is multiplied by both the diagonal elements in the same row and same column, which is exactly what we are looking for.
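
The two rules above are easy to verify in R. The sketch below uses the same toy matrices as in the equations; the names A, D, AD, DA and DAD are only illustrative.

```r
A <- matrix(1, nrow = 4, ncol = 4)  # square matrix of ones
D <- diag(1:4)                      # diagonal matrix with 1, 2, 3, 4

AD  <- A %*% D        # post-multiplication: column j is multiplied by D[j, j]
DA  <- D %*% A        # pre-multiplication: row i is multiplied by D[i, i]
DAD <- D %*% A %*% D  # both: element [i, j] is multiplied by D[i, i] * D[j, j]

AD
DA
DAD
```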

In the code below, we:

  1. Start from the covariance matrix (Sigma, calculated above)
  2. Take the square roots of the diagonal elements (the standard deviations) and load them in a diagonal matrix
  3. Invert this diagonal matrix
  4. Pre-multiply and post-multiply the covariance matrix by this diagonal matrix of inverse standard deviations

StDev <- sqrt(diag(Sigma))
StDevMat <- diag(StDev)
InvStDev <- solve(StDevMat)
InvStDev %*% Sigma %*% InvStDev
##            [,1]       [,2]       [,3]       [,4]
## [1,]  1.0000000 -0.8521620 -0.8475514 -0.7761684
## [2,] -0.8521620  1.0000000  0.9020329  0.8324475
## [3,] -0.8475514  0.9020329  1.0000000  0.7909486
## [4,] -0.7761684  0.8324475  0.7909486  1.0000000

Going from correlation to covariance can be done similarly, although, in this case, together with the correlation matrix we also need to have the standard deviations of the original variables, because they are not included in the matrix under transformation:

StDevMat %*% R %*% StDevMat
##             [,1]       [,2]       [,3]      [,4]
## [1,]   36.324103  -9.172379  -633.0972 -320.7321
## [2,]   -9.172379   3.189516   199.6603  101.9315
## [3,] -633.097208 199.660282 15360.7998 6721.1587
## [4,] -320.732056 101.931452  6721.1587 4700.8669
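
As a possible refinement of the matrix approach, the sketch below checks both conversions against cor() and cov(). It relies on two facts: the inverse of a diagonal matrix is the diagonal matrix of the reciprocals, so solve() can be avoided, and the labels lost in the matrix products can be copied back with dimnames(). The names R2 and Sigma2 are only illustrative.

```r
data(mtcars)
matr <- mtcars[, 1:4]
Sigma <- cov(matr)
R <- cor(matr)
StDev <- sqrt(diag(Sigma))

# The inverse of diag(StDev) is simply diag(1 / StDev)
InvStDev <- diag(1 / StDev)

# Covariance -> correlation and correlation -> covariance
R2 <- InvStDev %*% Sigma %*% InvStDev
Sigma2 <- diag(StDev) %*% R %*% diag(StDev)

# Copy the variable labels back onto the results
dimnames(R2) <- dimnames(Sigma)
dimnames(Sigma2) <- dimnames(R)

# Both comparisons should return TRUE
all.equal(R2, R)
all.equal(Sigma2, Sigma)
```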

Solutions with R

Are there any other solutions for those who are not accustomed to matrix algebra? The easiest way to go from covariance to correlation is to use the cov2cor() function, which belongs to the ‘stats’ package; this package is loaded by default in every R session, so no library() call is needed.

# From covariance to correlation
cov2cor(Sigma)
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000

We can also sweep() twice, which works in both directions:

# From covariance to correlation
sweep(sweep(Sigma, 1, StDev, FUN = "/"), 2, StDev, FUN = "/")
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000
# From correlation to covariance
sweep(sweep(R, 1, StDev, FUN = "*"), 2, StDev, FUN = "*")
##              mpg        cyl       disp        hp
## mpg    36.324103  -9.172379  -633.0972 -320.7321
## cyl    -9.172379   3.189516   199.6603  101.9315
## disp -633.097208 199.660282 15360.7998 6721.1587
## hp   -320.732056 101.931452  6721.1587 4700.8669

We can also scale() and t() twice, but it looks far less neat:

# From covariance to correlation
scale(t(scale(t(Sigma), center = FALSE, scale = StDev)), 
      center = FALSE, scale = StDev)
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000
## attr(,"scaled:scale")
##        mpg        cyl       disp         hp 
##   6.026948   1.785922 123.938694  68.562868
# From correlation to covariance
scale(t(scale(t(R), center = FALSE, scale = 1/StDev)), 
      center = FALSE, scale = 1/StDev)
##              mpg        cyl       disp        hp
## mpg    36.324103  -9.172379  -633.0972 -320.7321
## cyl    -9.172379   3.189516   199.6603  101.9315
## disp -633.097208 199.660282 15360.7998 6721.1587
## hp   -320.732056 101.931452  6721.1587 4700.8669
## attr(,"scaled:scale")
##         mpg         cyl        disp          hp 
## 0.165921457 0.559934979 0.008068505 0.014585154
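
One more possibility, offered only as a sketch, is to build the matrix of all the products between standard deviations with outer() and then use element-wise division or multiplication. The names SdProd, R_outer and Sigma_outer are hypothetical; the variable labels are preserved and no extra attributes are attached to the result.

```r
data(mtcars)
matr <- mtcars[, 1:4]
Sigma <- cov(matr)
StDev <- sqrt(diag(Sigma))

# SdProd[i, j] holds the product StDev[i] * StDev[j]
SdProd <- outer(StDev, StDev)

# From covariance to correlation (element-wise division)
R_outer <- Sigma / SdProd

# From correlation to covariance (element-wise multiplication)
Sigma_outer <- R_outer * SdProd

R_outer
Sigma_outer
```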

Just curious whether you young students have some better solution; I am sure you have one! Please, drop me a line!

Happy coding!


Prof. Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
Send comments to: [email protected]
