More on Exploring Correlations in R

[This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

About a year ago I wrote a post about producing scatterplot matrices in R. These are handy for quickly getting a sense of the correlations that exist in your data. Recently someone asked me to pull out some relevant statistics (correlation coefficient and p-value) into tabular format to publish beside a scatterplot matrix. The built-in cor() function will produce a correlation matrix, but what if you want p-values for those correlation coefficients? Also, instead of a matrix, how might you get these statistics in tabular format (variable i, variable j, r, and p, for each ij combination)? Here’s the code (you’ll need the PerformanceAnalytics package to produce the plot).

## Correlation matrix with p-values. See http://goo.gl/nahmV for documentation of this function
cor.prob <- function (X, dfr = nrow(X) - 2) {
R <- cor(X, use="pairwise.complete.obs")
above <- row(R) < col(R)
r2 <- R[above]^2
Fstat <- r2 * dfr/(1 - r2)
R[above] <- 1 - pf(Fstat, 1, dfr)
R[row(R) == col(R)] <- NA
R
}
## Use this to dump the cor.prob output to a 4 column matrix
## with row/column indices, correlation, and p-value.
## See StackOverflow question: http://goo.gl/fCUcQ
flattenSquareMatrix <- function(m) {
if( (class(m) != "matrix") | (nrow(m) != ncol(m))) stop("Must be a square matrix.")
if(!identical(rownames(m), colnames(m))) stop("Row and column names must be equal.")
ut <- upper.tri(m)
data.frame(i = rownames(m)[row(m)[ut]],
j = rownames(m)[col(m)[ut]],
cor=t(m)[ut],
p=m[ut])
}
# get some data from the mtcars built-in dataset
mydata <- mtcars[, c(1,3,4,5,6)]
# correlation matrix
cor(mydata)
# correlation matrix with p-values
cor.prob(mydata)
# "flatten" that table
flattenSquareMatrix(cor.prob(mydata))
# plot the data
library(PerformanceAnalytics)
chart.Correlation(mydata)

The cor() function will produce a basic correlation matrix.  12 years ago Bill Venables provided a function on the R help mailing list for replacing the upper triangle of the correlation matrix with the p-values for those correlations (based on the known relationship between t and r). The cor.prob() function will produce this matrix.

Finally, the flattenSquareMatrix() function will “flatten” this matrix to four columns: one column for variable i, one for variable j, one for their correlation, and another for their p-value (thanks to Chris Wallace on StackOverflow for helping out with this one).

> cor(mydata)
mpg disp hp drat wt
mpg 1.0000000 -0.8475514 -0.7761684 0.6811719 -0.8676594
disp -0.8475514 1.0000000 0.7909486 -0.7102139 0.8879799
hp -0.7761684 0.7909486 1.0000000 -0.4487591 0.6587479
drat 0.6811719 -0.7102139 -0.4487591 1.0000000 -0.7124406
wt -0.8676594 0.8879799 0.6587479 -0.7124406 1.0000000
> cor.prob(mydata)
mpg disp hp drat wt
mpg NA 9.380327e-10 1.787835e-07 1.776240e-05 1.293958e-10
disp -0.8475514 NA 7.142679e-08 5.282022e-06 1.222322e-11
hp -0.7761684 7.909486e-01 NA 9.988772e-03 4.145827e-05
drat 0.6811719 -7.102139e-01 -4.487591e-01 NA 4.784260e-06
wt -0.8676594 8.879799e-01 6.587479e-01 -7.124406e-01 NA
> flattenSquareMatrix(cor.prob(mydata))
i j cor p
1 mpg disp -0.8475514 9.380327e-10
2 mpg hp -0.7761684 1.787835e-07
3 disp hp 0.7909486 7.142679e-08
4 mpg drat 0.6811719 1.776240e-05
5 disp drat -0.7102139 5.282022e-06
6 hp drat -0.4487591 9.988772e-03
7 mpg wt -0.8676594 1.293958e-10
8 disp wt 0.8879799 1.222322e-11
9 hp wt 0.6587479 4.145827e-05
10 drat wt -0.7124406 4.784260e-06


Finally, the chart.Correlation() function from the PerformanceAnalytics package produces a very nice scatterplot matrix, with histograms, kernel density overlays, absolute correlations, and significance asterisks (0.05, 0.01, 0.001):

To leave a comment for the author, please follow the link and comment on their blog: Getting Genetics Done.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)