[This article was first published on Stats raving mad » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
If you’re (and you should) interested in principal components then take a good look at this. The linked post will take you by hand to do everything from scratch. If you’re not in the mood then the dollowing R functions will help you.
An example.
# Generates sample matrix of five discrete clusters that have
# very different mean and standard deviation values.
z1 <- rnorm(10000, mean=1, sd=1);
z2 <- rnorm(10000, mean=3, sd=3);
z3 <- rnorm(10000, mean=5, sd=5);
z4 <- rnorm(10000, mean=7, sd=7);
z5 <- rnorm(10000, mean=9, sd=9);
mydata <- matrix(c(z1, z2, z3, z4, z5), 2500, 20, byrow=T,
dimnames=list(paste("R", 1:2500, sep=""), paste("C", 1:20, sep="")))
# Performs principal component analysis after scaling the data.
# It returns a list with class "prcomp" that contains five components:
# (1) the standard deviations (sdev) of the principal components,
# (2) the matrix of eigenvectors (rotation),
# (3) the principal component data (x),
# (4) the centering (center) and
# (5) scaling (scale) used.
pca <- prcomp(mydata, scale=T)
# Prints variance summary for all principal components.
summary(pca)
# Set plotting parameters.
x11(height=6, width=12, pointsize=12); par(mfrow=c(1,2))
# Define plotting colors.
mycolors <- c("red", "green", "blue", "magenta", "black")
# Plots scatter plot for the first two principal components
# that are stored in pca$x[,1:2].
plot(pca$x, pch=20, col=mycolors[sort(rep(1:5, 500))])
# Same as above, but prints labels.
plot(pca$x, type="n"); text(pca$x, rownames(pca$x), cex=0.8,
col=mycolors[sort(rep(1:5, 500))])
# Plots scatter plots for all combinations between the first four principal components.
pairs(pca$x[,1:4], pch=20, col=mycolors[sort(rep(1:5, 500))])
# Plots a scatter plot for the first two principal components
# plus the corresponding eigen vectors that are stored in pca$rotation.
biplot(pca)
# Loads library scatterplot3d.
library(scatterplot3d)
# Same as above, but plots the first three principal components in 3D scatter plot
scatterplot3d(pca$x[,1:3], pch=20, color=mycolors[sort(rep(1:5, 500))])
# Importance of components:
# PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
# Standard deviation 2.157 0.9953 0.9831 0.9684 0.9601 0.9465 0.9340 0.9288
# Proportion of Variance 0.233 0.0495 0.0483 0.0469 0.0461 0.0448 0.0436 0.0431
# Cumulative Proportion 0.233 0.2822 0.3305 0.3774 0.4235 0.4683 0.5119 0.5550
# PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16
# Standard deviation 0.9030 0.8989 0.8930 0.8763 0.8703 0.8656 0.8573 0.8458
# Proportion of Variance 0.0408 0.0404 0.0399 0.0384 0.0379 0.0375 0.0367 0.0358
# Cumulative Proportion 0.5958 0.6362 0.6761 0.7145 0.7523 0.7898 0.8265 0.8623
# PC17 PC18 PC19 PC20
# Standard deviation 0.8415 0.8360 0.8302 0.8110
# Proportion of Variance 0.0354 0.0349 0.0345 0.0329
# Cumulative Proportion 0.8977 0.9326 0.9671 1.0000
# KernSmooth 2.23 loaded
# Copyright M. P. Wand 1997-2009
To leave a comment for the author, please follow the link and comment on their blog: Stats raving mad » R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
