Now I see it! K-means cluster analysis in R
[This article was first published on mages' blog, and kindly contributed to R-bloggers.]
Of course, a picture on a computer monitor is just a coloured plot of x and y coordinates, or pixels. Still, I was smitten by David Sparks' posts on is.r(), where he shows how easy it is to read images into R and analyse them. In two posts [1], [2] he replicates some of the functionality of image manipulation programmes like GIMP.
I can't resist writing about this here as well. David's first post is about k-means cluster analysis. One of the popular algorithms for k-means is Lloyd's algorithm. On that note, I will use a picture of the Lloyd's of London building to play around with David's code, even though the two Lloyds have nothing to do with each other. Lloyd's provides pictures of its building copyright-free on its website. However, I will use a version with a reduced file size hosted on Wikimedia.
The ReadImages package by Markus Löcher [3] allows me to load a JPEG file into R. The resulting R object is an array of three layered matrices, holding the red, green and blue values for each x and y coordinate. I convert the array into a data frame, as this is a structure that the k-means function accepts, and plot the data.
library("ReadImages")

# Download the picture to a temporary file and read it into an array
url <- "http://upload.wikimedia.org/wikipedia/commons/6/6a/6414A_1_copy.jpg"
fn <- tempfile()
download.file(url, destfile=fn)
readImage <- read.jpeg(fn)
dm <- dim(readImage)

# One row per pixel: x and y position plus the red, green and blue values
rgbImage <- data.frame(
  x=rep(1:dm[2], each=dm[1]),
  y=rep(dm[1]:1, dm[2]),
  r.value=as.vector(readImage[,,1]),
  g.value=as.vector(readImage[,,2]),
  b.value=as.vector(readImage[,,3]))

# Plot every pixel in its original colour
plot(y ~ x, data=rgbImage, main="Lloyd's building",
     col = rgb(rgbImage[c("r.value", "g.value", "b.value")]),
     asp = 1, pch = ".")
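To see why the x and y columns are built with rep() in this way, it can help to trace a tiny example. The sketch below uses a made-up 2 x 3 array (the names tiny and tinyDf are my own, not from David's code) and relies only on the fact that as.vector() unrolls a matrix column by column.

```r
# Hypothetical 2 x 3 "image" (2 rows, 3 columns). The red layer is filled
# with 0.1 ... 0.6 so that each pixel can be traced; green and blue stay 0.
tiny <- array(0, dim = c(2, 3, 3))
tiny[, , 1] <- matrix(1:6, nrow = 2) / 10
dm <- dim(tiny)

tinyDf <- data.frame(
  x = rep(1:dm[2], each = dm[1]),    # column index, repeated down each column
  y = rep(dm[1]:1, dm[2]),           # row index, flipped so row 1 is drawn on top
  r.value = as.vector(tiny[, , 1]))  # as.vector() unrolls column by column

print(tinyDf)
```

The first row of tinyDf is the top-left pixel: it sits in the first column (x = 1) and gets the largest y value, so plot() draws it at the top rather than the bottom.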
Running a k-means analysis on the three colour columns of my data frame allows me to reduce the picture to k colours. For each x and y coordinate, the output tells me which colour cluster the pixel belongs to. Thus, I plot my picture again, but replace the original colours with the colours of the cluster centres.
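The steps just described could be sketched roughly as follows. This is not necessarily the code from the original post: a small random array stands in for the downloaded picture so that the example runs without ReadImages or network access, and the names img, kColours, kMeans and clusterColour, as well as the choice of four colours, are my own assumptions. It uses kmeans() from base R's stats package.

```r
# Sketch only: a synthetic image replaces the Lloyd's building JPEG.
set.seed(1)
h <- 20; w <- 30                        # assumed image height and width
img <- array(runif(h * w * 3), dim = c(h, w, 3))

# Same data frame layout as for the real image: one row per pixel
rgbImage <- data.frame(
  x = rep(1:w, each = h),
  y = rep(h:1, w),
  r.value = as.vector(img[, , 1]),
  g.value = as.vector(img[, , 2]),
  b.value = as.vector(img[, , 3]))

kColours <- 4                           # assumed number of colours to keep

# Cluster the pixels on their colour values only, ignoring position
kMeans <- kmeans(rgbImage[, c("r.value", "g.value", "b.value")],
                 centers = kColours, nstart = 5, iter.max = 50)

# Replace each pixel's colour with the centre of its cluster
clusterColour <- rgb(kMeans$centers[kMeans$cluster, ])

plot(y ~ x, data = rgbImage, col = clusterColour,
     asp = 1, pch = 15, main = paste(kColours, "colour clusters"))
```

Note that kmeans() uses the Hartigan-Wong algorithm by default; passing algorithm = "Lloyd" selects Lloyd's algorithm instead. With the real picture, the same calls should work on the rgbImage data frame built from the JPEG, with pch = "." as in the earlier plot.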