[This article was first published on Darren Wilkinson » R, and kindly contributed to R-bloggers].
The CoCo Matrix (correlation coefficient matrix) is an R script that takes a table with a header row of variable names, calculates the correlation coefficient for every pair of variables, determines which coefficients are statistically significant, and displays the result as a grid plot. I created the CoCo Matrix to cross-correlate a table with a large number of variables and quickly see where the important correlations lie.
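As a rough illustration of the idea (not the CoCo Matrix script itself), pairwise coefficients and their significance can be computed with base R's `cor()` and `cor.test()`. The data frame `d`, its column names, and the 0.05 significance level are assumptions made up for this sketch.

```r
# Minimal sketch: pairwise Pearson correlations with a significance flag.
# The data frame `d` and alpha = 0.05 are illustrative assumptions.
set.seed(1)
d <- data.frame(height = rnorm(30), weight = rnorm(30), age = rnorm(30))
d$weight <- d$weight + 0.8 * d$height   # build in one correlated pair

alpha <- 0.05
vars  <- names(d)
r.mat <- cor(d)                          # matrix of correlation coefficients
p.mat <- matrix(NA_real_, length(vars), length(vars),
                dimnames = list(vars, vars))

# Fill in the p-value for every pair of distinct variables
for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i != j) {
      p.mat[i, j] <- cor.test(d[[i]], d[[j]])$p.value
    }
  }
}

sig <- p.mat < alpha   # TRUE where a pair is significant, NA on the diagonal
diag(r.mat) <- NA      # like-like correlations set to NA, as in the script
```

Here significance is decided from the p-value returned by `cor.test()`, which is a different route from the threshold on r that the script uses.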
Using the CoCo Matrix
The R file can be downloaded here or copied from the textbox at the end of this post.
- If you know the number of samples in your dataset (n), then the degrees of freedom (df) = n-2. Use this table to find the critical value of r, above which a correlation is significant. Near the top of the code, set “p” to the value you just looked up. If you don’t know n, run the code once and type “n” into the console.
- If you want, customise the colours in the customisation area of the code.
- Run the code. A dialogue box will ask you for a file. Alternatively, edit the code so that it points directly at the file you want to use.
- Voila!
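The table lookup in the first step can also be done in R. The sketch below assumes a two-tailed test on Pearson's r, for which the critical value follows from the t distribution as r = t / sqrt(t^2 + df). The function name `critical.r` and the default `alpha` are hypothetical and not part of the CoCo Matrix script.

```r
# Sketch: critical value of Pearson's r for n samples (two-tailed test).
# `critical.r` and alpha = 0.05 are assumptions for illustration.
critical.r <- function(n, alpha = 0.05) {
  df     <- n - 2                      # degrees of freedom
  t.crit <- qt(1 - alpha / 2, df)      # two-tailed critical t value
  t.crit / sqrt(t.crit^2 + df)         # convert critical t to critical r
}

p <- critical.r(n = 30)   # approximately 0.36 for n = 30
```

The result is stored in `p` because that is the name the script uses for its significance threshold on r.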
This is a very rough script I wrote, and I intend to make it a lot better at some point when I have the time. If you have any suggestions for improvements then please comment below or get in touch with me.
# CoCo Matrix version 1.0
# Written by Darren J. Wilkinson
# wilkinsondarren.wordpress.com
# d.j.wilkinson@ed.ac.uk
#
# The "CoCo Matrix" visualises the correlation coefficients for a given set of data.
# Like-Like correlations are given NA values (e.g. Height vs Height = NA). For the moment
# duplicates such as Height vs. Weight and Weight vs. Height remain. At some point I'll
# provide an update that removes duplicates like that.
#
# Please feel free to edit the code, and if you make any improvements please let me know
# either on wilkinsondarren.wordpress.com or send me an email at d.j.wilkinson@ed.ac.uk

# Packages -------
library (cwhmisc)
library (ggplot2)
library (grid)
library (scales)
# ----------------

# Plot Customisation ----------------------------------------------------------
# (for good colour suggestions visit colourlovers.com)
col.significant = "#556270"      # Colour used for significant correlations
col.notsignificant = "lightgrey" # Colour used for non-significant correlations
col.na = "white"                 # Colour used for NA values

e1 = c("nb", "ta", "ba", "rb", "hf", "zr", "yb", "y", "th", "u") #
[...]
    p) {s = "Significant"}
    if (temp < p) {s = "Not Significant"}
    if (temp == 1) {s = NA}
    if (temp == 1) {temp = NA}
    results[h,i] = temp
    plot.data[r,4] = s
    plot.data[r,3] = temp
    plot.data[r,2] = h
    plot.data[r,1] = i
  }
}

# Open new quartz window
dev.new ( width = 12, height = 9 )

# Plot the matrix
ggplot (data = plot.data, aes (x = x, y = y)) +
  geom_point (aes (colour = sig), size = 20) +
  scale_x_continuous (labels = e1, name = "", breaks = c(1:n.e1)) +
  scale_y_continuous (labels = e1, name = "", breaks = c(1:n.e1)) +
  scale_colour_manual (values = c(col.notsignificant, col.significant, col.na)) +
  labs (title = "CoCo Matrix v1.0")+
  theme (
    plot.title = element_text (vjust = 3, size = 20, colour = "black"), #plot title
    plot.margin = unit (c(3, 3, 3, 3), "lines"), #adjust the margins of the entire plot
    plot.background = element_rect (fill = "white", colour = "black"),
    panel.border = element_rect (colour = "black", fill = F, size = 1), #change the colour of the axes to black
    panel.grid.major = element_blank (), # remove major grid
    panel.grid.minor = element_blank (), # remove minor grid
    panel.background = element_rect (fill = "white"), #makes the background transparent (white) NEEDED FOR INSIDE TICKS
    legend.background = element_rect (colour = "black", size = 0.5, fill = "white"),
    legend.justification = c(0, 0),
    #legend.position = c(0, 0), # put the legend INSIDE the plot area
    legend.key = element_blank (), # switch off the rectangle around symbols in the legend
    legend.box.just = "bottom",
    legend.box = "horizontal",
    legend.title = element_blank (), # switch off the legend title
    legend.text = element_text (size = 15, colour = "black"), #sets the attributes of the legend text
    #
    axis.title.x = element_text (vjust = -2, size = 20, colour = "black"), #change the axis title
    axis.title.y = element_text (vjust = -0.1, angle = 90, size = 20, colour = "black"), #change the axis title
    axis.text.x = element_text (size = 17, vjust = -0.25, colour = "black"), #change the axis label attributes
    axis.text.y = element_text (size = 17, hjust = 1, colour = "black"), #change the axis label attributes
    #
    axis.ticks = element_line (colour = "black", size = 0.5), #sets the thickness and colour of axis ticks
    axis.ticks.length = unit(-0.25 , "cm"), #setting a negative length plots inside, but background must be FALSE colour
    axis.ticks.margin = unit(0.5, "cm") # the margin between the ticks and the text
  )

# Print data tables in the console
results
plot.data
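The header comment of the script notes that mirrored pairs such as Height vs. Weight and Weight vs. Height remain in the output. One possible way to keep each pair only once, sketched here with base R's `lower.tri()`, is to blank out the lower triangle of the correlation matrix before building the plotting table. The data frame `d` and the names `r.mat` and `pairs` are hypothetical and do not come from the script.

```r
# Sketch: keep each variable pair once by blanking the lower triangle.
# `d`, `r.mat` and `pairs` are illustrative names, not from the script.
set.seed(42)
d <- data.frame(height = rnorm(20), weight = rnorm(20), age = rnorm(20))

r.mat <- cor(d)
r.mat[lower.tri(r.mat, diag = TRUE)] <- NA   # drop like-like and mirrored pairs

# Long format with one row per remaining pair, ready for plotting
idx   <- which(!is.na(r.mat), arr.ind = TRUE)
pairs <- data.frame(x = colnames(r.mat)[idx[, "col"]],
                    y = rownames(r.mat)[idx[, "row"]],
                    r = r.mat[idx])
```

For k variables this leaves k(k-1)/2 rows, so the three columns above give three pairs.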