One of the most common and basic techniques for analyzing the relationships between variables is zero-order correlation. This tutorial will explore the ways in which R can be used to employ this method.
Tutorial Files
Before we start, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains pretest and posttest scores for 66 subjects on a series of reading comprehension tests (Moore & McCabe, 1989). Note that all code samples in this tutorial assume that the data have already been read into an R variable and that the variable has been attached, so that columns such as PRE1 can be referenced by name.
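The loading step assumed above can be sketched as follows. The file path and the values here are hypothetical stand-ins: the block writes a tiny made-up CSV to a temporary file so that it runs on its own, and you would substitute the path to the downloaded sample data. The column names PRE1 and POST1 and the variable name datavar follow the names used later in this tutorial.

```r
# Hypothetical stand-in for the downloaded file: a tiny made-up CSV
# written to a temporary location so this sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("PRE1,POST1",
             "4,5",
             "6,9",
             "9,5",
             "12,8"), csv_path)

# Read the data into an R variable (a data frame).
datavar <- read.csv(csv_path)

# Attach it so columns can be referenced by name
# (PRE1 instead of datavar$PRE1).
attach(datavar)

print(mean(PRE1))  # columns are now visible directly

# Detach when finished to avoid name clashes later in the session.
detach(datavar)
```

Attaching is convenient for short tutorials; in longer scripts, referring to columns as datavar$PRE1 avoids accidental name collisions.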
Correlation Between Two Variables
The most fundamental way to calculate correlations is to directly operate on two variables. In R, this can be done using the cor() function. The cor() function accepts the following arguments (“Correlation, Variance…”, n.d.).
- x: the first variable to correlate
- y: the second variable to correlate
- use (optional): determines how missing values are handled; accepts "all.obs", "complete.obs", or "pairwise.complete.obs" (current versions of R also accept "everything", which is the default, and "na.or.complete")
- method (optional): determines the statistical method used; accepts "pearson" (the default), "kendall", or "spearman"
In most cases, x and y are the only arguments that you will use when running the cor() function. The basic format for calculating a correlation is cor(VAR1, VAR2), where VAR1 and VAR2 are the variables that you would like to correlate.
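For the less common cases where the optional arguments matter, here is a minimal sketch of use and method in action. The vectors x and y are made-up illustrative values, not part of the tutorial dataset.

```r
# Made-up illustrative vectors; the NA shows how `use` matters.
x <- c(1, 2, 3, 4, NA, 6)
y <- c(2, 4, 5, 4, 5, 7)

# With the default handling of missing values, any NA yields NA.
r_default <- cor(x, y)

# Drop incomplete pairs before computing the correlation.
r_complete <- cor(x, y, use = "complete.obs")

# Rank-based alternatives to the default Pearson method.
r_spearman <- cor(x, y, use = "complete.obs", method = "spearman")
r_kendall  <- cor(x, y, use = "complete.obs", method = "kendall")

print(c(default = r_default, pearson = r_complete,
        spearman = r_spearman, kendall = r_kendall))
```

If your data contain no missing values and a Pearson correlation is what you want, none of these extra arguments is needed.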
cor(VAR1, VAR2) Example
Suppose that our research question is: “How does a subject’s pretest 1 score relate to his or her posttest 1 score?” The following example demonstrates how to use the cor() function to calculate the correlation between pretest 1 (PRE1) and posttest 1 (POST1).
- >#use cor(VAR1, VAR2) to calculate the correlation between variable 1 and variable 2
- > cor(PRE1, POST1)
- [1] 0.5659026
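To see what a value like this represents: with the default method, cor() returns the Pearson coefficient, which is the covariance of the two variables divided by the product of their standard deviations. The sketch below checks that relationship on made-up vectors; pre and post are hypothetical names with invented values, not the tutorial's data.

```r
# Made-up scores for illustration only.
pre  <- c(4, 6, 9, 12, 16, 15, 14, 12)
post <- c(5, 9, 5, 8, 10, 9, 12, 5)

# Built-in Pearson correlation.
r_builtin <- cor(pre, post)

# The same quantity computed from its definition.
r_manual <- cov(pre, post) / (sd(pre) * sd(post))

# TRUE when the two agree up to floating-point tolerance.
print(all.equal(r_builtin, r_manual))
```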
Correlations Between Multiple Variables
When beginning to analyze a dataset, researchers often want to get a complete picture of all correlations, rather than just a single one. Conveniently, the cor() function can also be run on an entire set of data. The format for this operation is cor(DATAVAR), where DATAVAR stands for the name of the R variable containing the data (datavar in the example below; note that R names are case-sensitive).
cor(DATAVAR) Example
Suppose now that our research question is: “How do all of the test scores in the dataset relate to each other?” The following example demonstrates how to use the cor() function to calculate all of the correlations in a dataset.
- >#use cor(DATAVAR) to get the correlations between all variables
- > cor(datavar)
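The output of the preceding function is a correlation matrix with one row and one column per variable. Each cell holds the correlation between the corresponding pair of variables, and every diagonal entry is 1 because each variable correlates perfectly with itself.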
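As a rough illustration of the shape of that result, the following sketch runs cor() on a small data frame. The data frame scores and its values are invented for this example, and PRE2 is a hypothetical column name; none of it comes from the tutorial's dataset.

```r
# Made-up scores for illustration only; not the tutorial's dataset.
scores <- data.frame(
  PRE1  = c(4, 6, 9, 12, 16, 15),
  PRE2  = c(3, 5, 4, 6, 6, 8),
  POST1 = c(5, 9, 5, 8, 10, 9)
)

# cor() on a data frame returns a square matrix of pairwise correlations.
cmat <- cor(scores)

# Rounding makes the matrix easier to scan.
print(round(cmat, 2))

# Individual entries can be looked up by variable name.
print(cmat["PRE1", "POST1"])
```

Note that cor() expects numeric columns; if a data frame contains text or factor columns, select the numeric ones before calling it.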
Complete Correlational Analysis
To see a complete example of how correlational analysis can be conducted in R, please download the correlational analysis example (.txt) file.
References
Correlation, Variance and Covariance (Matrices). (n.d.). Retrieved October 27, 2009, from http://sekhon.berkeley.edu/stats/html/cor.html
Moore, D., & McCabe, G. (1989). Introduction to the practice of statistics [Data File]. Retrieved October 27, 2009, from http://lib.stat.cmu.edu/DASL/Datafiles/ReadingTestScores.html