Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
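In the context of k-means, we partition the observations into k clusters, each observation belonging to the cluster with the nearest mean.
Consider the case where we have 2 classes, the means being respectively the 2 black dots. If we partition the plane based on the nearest mean, with the Euclidean distance, we get two regions separated by a straight line, the perpendicular bisector of the segment joining the two means.
Points in the red region are closer to the mean in the upper part, while points in the blue region are closer to the mean in the lower part. Here, we will always use the standard Euclidean (ℓ2) norm.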
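A minimal sketch of the nearest-mean rule in base R. The two means `m1` (upper) and `m2` (lower) are hypothetical, with coordinates made up for illustration, and the helper `nearest_mean` is ours, not from any package. It colours a grid of the unit square according to the nearest mean, which gives the same kind of red/blue partition as described above.

```r
# Hypothetical means (the "black dots"); coordinates chosen for illustration only
m1 <- c(0.4, 0.7)  # mean in the upper part
m2 <- c(0.6, 0.3)  # mean in the lower part

# Hypothetical helper: return 1 or 2 for each row of x, depending on the
# nearest mean (squared Euclidean distance; ties go to m1)
nearest_mean <- function(x, m1, m2) {
  d1 <- rowSums((x - matrix(m1, nrow(x), 2, byrow = TRUE))^2)
  d2 <- rowSums((x - matrix(m2, nrow(x), 2, byrow = TRUE))^2)
  ifelse(d1 <= d2, 1L, 2L)
}

# Colour a fine grid of the unit square by its nearest mean
g <- as.matrix(expand.grid(X = seq(0, 1, length.out = 101),
                           Y = seq(0, 1, length.out = 101)))
region <- nearest_mean(g, m1, m2)
plot(g, col = c("red", "blue")[region], pch = 15, cex = 0.3,
     xlab = "", ylab = "")
points(rbind(m1, m2), pch = 19, cex = 2)
```

Expanding ||x − m1||² ≤ ||x − m2||² cancels the ||x||² terms on both sides, so the condition is linear in x. This is why the frontier between the two regions is a straight line.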
To illustrate the k-means algorithm, consider the following simulated dataset:
set.seed(1)
# X means cycle through .1, .3, .5, .7, .9: 5 narrow vertical groups of 100 points
pts <- cbind(X = rnorm(500, rep(seq(1, 9, by = 2)/10, 100), .022),
             Y = rnorm(500, .5, .15))
plot(pts)
Here, we have 5 groups, so let us run a 5-means algorithm. With Lloyd's algorithm:
- we draw 5 points at random as initial values for the means (R's kmeans picks 5 distinct observations),
- in the assignment step, we assign each point to the nearest mean,
- in the update step, we compute the new centroids of the clusters,
- and we repeat these two steps until no point changes cluster.
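The steps above can be sketched directly in base R. The function `lloyd` is a hypothetical helper written for illustration, not the code behind `stats::kmeans`. Like `kmeans`, it starts from k distinct observations; keeping the previous mean when a cluster becomes empty is our own assumption, other implementations handle that case differently.

```r
# Hypothetical helper: Lloyd's algorithm for k-means, in base R
lloyd <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # initial means: k distinct observations drawn at random
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))  # 0 = not assigned yet
  for (iter in seq_len(max_iter)) {
    # assignment step: squared Euclidean distance from each point to each mean
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")  # index of the nearest mean
    if (all(new_cluster == cluster)) break  # no point changed cluster: converged
    cluster <- new_cluster
    # update step: each mean becomes the centroid of its cluster
    # (assumption: an empty cluster keeps its previous mean)
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers, iter = iter)
}

# Usage on the first simulated dataset
set.seed(1)
pts <- cbind(X = rnorm(500, rep(seq(1, 9, by = 2) / 10, 100), .022),
             Y = rnorm(500, .5, .15))
fit <- lloyd(pts, k = 5)
table(fit$cluster)
```

At convergence, the means are the centroids of their clusters and every point is assigned to its nearest mean, so neither step can change anything any more.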
To visualize it, see
The code to get the clusters is
kmeans(pts, centers=5, nstart = 1, algorithm = "Lloyd")
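Observe that the assignment step amounts to computing Voronoi sets: the Voronoi cell of a mean contains all the points that are closer to it than to any other mean. This can be done in R using the tripack package: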
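library(tripack)
# means: 2-column matrix of the current centroids, here taken from a k-means run
means <- kmeans(pts, centers = 5, nstart = 1, algorithm = "Lloyd")$centers
V <- voronoi.mosaic(means[,1], means[,2])  # Voronoi tessellation of the means
P <- voronoi.polygons(V)                   # the Voronoi cells, as polygons
plot(pts)
points(V, pch = 19)                        # vertices of the Voronoi cells
plot(V, add = TRUE)                        # edges of the Voronoi cells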
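To see that the assignment step really is a Voronoi computation, here is a small check on the first simulated dataset. It assumes the run converges, which is why `iter.max` is raised from its default of 10 to 100; the name `voronoi_cell` is ours. Once Lloyd's algorithm has converged, the nearest centroid of each point (the Voronoi cell it falls in) should be exactly the cluster returned by `kmeans`.

```r
set.seed(1)
pts <- cbind(X = rnorm(500, rep(seq(1, 9, by = 2) / 10, 100), .022),
             Y = rnorm(500, .5, .15))
km <- kmeans(pts, centers = 5, nstart = 1, algorithm = "Lloyd", iter.max = 100)

# Squared Euclidean distance from every point to every centroid (500 x 5)
d2 <- sapply(seq_len(nrow(km$centers)),
             function(j) colSums((t(pts) - km$centers[j, ])^2))
# Index of the nearest centroid, i.e. the Voronoi cell the point falls in
voronoi_cell <- max.col(-d2, ties.method = "first")
all(voronoi_cell == km$cluster)
```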
This is what we can visualize below
The code to visualize the clusters together with the Voronoi sets of their centroids is
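# 5-means with Lloyd's algorithm, from a single random start
km1 <- kmeans(pts, centers=5, nstart = 1, algorithm = "Lloyd")
library(tripack)
library(RColorBrewer)
CL5 <- brewer.pal(5, "Pastel1")  # one pastel colour per cluster
# Voronoi tessellation of the 5 centroids
V <- voronoi.mosaic(km1$centers[,1], km1$centers[,2])
P <- voronoi.polygons(V)
# points coloured by cluster, centroids as crosses, then the Voronoi edges
plot(pts, pch=19, xlim=0:1, ylim=0:1, xlab="", ylab="", col=CL5[km1$cluster])
points(km1$centers[,1], km1$centers[,2], pch=3, cex=1.5, lwd=2)
plot(V, add=TRUE)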
Here, the starting points are drawn at random, so if we run it again, we might get
or
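Since the result depends on the random starting means, a common remedy is to run the algorithm from several starts and keep the run with the smallest total within-cluster sum of squares, the criterion k-means minimises. The sketch below does this by hand on the first simulated dataset (20 starts is an arbitrary choice); `kmeans` performs the same selection internally when `nstart` is larger than 1.

```r
set.seed(1)
pts <- cbind(X = rnorm(500, rep(seq(1, 9, by = 2) / 10, 100), .022),
             Y = rnorm(500, .5, .15))

# 20 independent runs, each from its own random starting means
fits <- lapply(1:20, function(i)
  kmeans(pts, centers = 5, nstart = 1, algorithm = "Lloyd", iter.max = 100))
wss <- sapply(fits, function(f) f$tot.withinss)  # criterion minimised by k-means
round(range(wss), 3)                              # different starts, different optima
best <- fits[[which.min(wss)]]                    # keep the best run

# The built-in equivalent: kmeans keeps the best of its nstart runs
km20 <- kmeans(pts, centers = 5, nstart = 20, algorithm = "Lloyd", iter.max = 100)
km20$tot.withinss
```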
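On this dataset, it is difficult to get clusters that match the five groups we can actually see. If we use instead five well-separated groups, located at the four corners and at the center of the unit square,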
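set.seed(1)
# five groups of 100 points, centred at (.2,.2), (.2,.8), (.5,.5), (.8,.2) and (.8,.8)
A <- c(rep(.2,100), rep(.2,100), rep(.5,100), rep(.8,100), rep(.8,100))  # X means
B <- c(rep(.2,100), rep(.8,100), rep(.5,100), rep(.2,100), rep(.8,100))  # Y means
pts <- cbind(X=rnorm(500, A, .075), Y=rnorm(500, B, .075))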
we usually get something better
Colors are obtained from clusters of the
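As a rough check of how well the clusters recover the simulated groups, here is a sketch that regenerates the second dataset while keeping track of the true group of each point. The vector `group` and the summary `purity` are ours: `purity` adds up, for each true group, the number of its points falling in its most frequent cluster, divided by 500. It does not detect two groups merged into one cluster, so the table itself should be read as well.

```r
set.seed(1)
A <- c(rep(.2, 100), rep(.2, 100), rep(.5, 100), rep(.8, 100), rep(.8, 100))
B <- c(rep(.2, 100), rep(.8, 100), rep(.5, 100), rep(.2, 100), rep(.8, 100))
pts <- cbind(X = rnorm(500, A, .075), Y = rnorm(500, B, .075))
group <- rep(1:5, each = 100)  # true group, in the order used to build A and B

km <- kmeans(pts, centers = 5, nstart = 1, algorithm = "Lloyd", iter.max = 100)
tab <- table(group, cluster = km$cluster)  # true groups vs k-means clusters
tab
purity <- sum(apply(tab, 1, max)) / length(group)
purity
```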