[This article was first published on R – BioStatMatt, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The bit of R code below illustrates the principal curves methods as described in The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman (Ch. 14; the book is freely available from the authors’ website). Specifically, the code generates some bivariate data that have a nonlinear association, initializes the principal curve using the first (linear) principal component, and then computes three iterations of the algorithm described in section 14.5.2. I used the ‘animation’ package to generate the following animated GIF, which illustrates these steps.
## generate some bivariate data set.seed(42) x1 <- seq(1,10,0.3) w = .6067; a0 = 1.6345; a1 = -.6235; b1 = -1.3501; a2 = -1.1622; b2 = -.9443; x2 = a0 + a1*cos(x1*w) + b1*sin(x1*w) + a2*cos(2*x1*w) + b2*sin(2*x1*w) + rnorm(length(x1),0,3/4) x <- scale(cbind(x1,x2)) alim <- extendrange(x, f=0.1) alim_ <- range(x) ## plot centered data plot(x[,1], x[,2], bty='n', xlab=expression(x[1]), ylab=expression(x[2]), xlim=alim, ylim=alim) legend("topleft", legend=c("Initialize"), bty="n") ## plot first principal component line svdx <- svd(x) clip(alim_[1],alim_[2],alim_[1],alim_[2]) with(svdx, abline(a=0, b=v[2,1]/v[1,1])) ## plot projections of each point onto line z1 <- with(svdx, x%*%v[,1]%*%t(v[,1])) segments(x0=x[,1],y0=x[,2], x1=z1[,1],y1=z1[,2]) ## compute initial lambda (arc-lengths associated with ## orthogonal projections of data onto curve) lam <- with(svdx, as.numeric(u[,1]*d[1])) for(itr in 1:3) { #### step (a) of iterative algorithm #### ## compute scatterplot smoother in either dimension ## increase 'df' to make the curve more flexible fit1 <- smooth.spline(x=lam, y=x[,1], df=4) fit2 <- smooth.spline(x=lam, y=x[,2], df=4) ## plot data and the principal curve for a sequence of lambdas plot(x[,1], x[,2], bty='n', xlab=expression(x[1]), ylab=expression(x[2]), xlim=alim, ylim=alim) legend("topleft", legend=c("Step (a)"), bty="n") seq_lam <- seq(min(lam),max(lam),length.out=100) lines(predict(fit1, seq_lam)$y, predict(fit2, seq_lam)$y) ## show points along curve corresponding ## to original lambdas z1 <- cbind(predict(fit1, lam)$y, predict(fit2, lam)$y) segments(x0=x[,1],y0=x[,2], x1=z1[,1],y1=z1[,2]) #### step (b) of iterative algorithm #### ## recompute lambdas euc_dist <- function(l, x, f1, f2) sum((c(predict(f1, l)$y, predict(f2, l)$y) - x)^2) lam <- apply(x,1,function(x0) optimize(euc_dist, interval=extendrange(lam, f=0.50), x=x0, f1=fit1, f2=fit2)$minimum) ## show projections associated with recomputed lambdas plot(x[,1], x[,2], bty='n', xlab=expression(x[1]), ylab=expression(x[2]), xlim=alim, ylim=alim) legend("topleft", legend=c("Step (b)"), bty="n") seq_lam <- seq(min(lam),max(lam),length.out=100) lines(predict(fit1, seq_lam)$y, predict(fit2, seq_lam)$y) z1 <- cbind(predict(fit1, lam)$y, predict(fit2, lam)$y) segments(x0=x[,1],y0=x[,2], x1=z1[,1],y1=z1[,2]) }
To leave a comment for the author, please follow the link and comment on their blog: R – BioStatMatt.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.