R: k-Means Clustering on an Image
[This article was first published on Analysis with Programming, and kindly contributed to R-bloggers.]
Enough with the theory we recently published. Let’s take a break and have some fun with an application of statistics used in data mining and machine learning: k-means clustering.
We will use the jpeg package to read the image and the ggplot2 package to plot it. The image we will work with is represented as a large array of pixels with dimensions rows by columns by channels, the channels being red, green, and blue (RGB).

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster (Wikipedia, Ref. 1).

We will apply this method to an image, grouping its pixels into k different clusters. Below is the image that we are going to use:
[Image: Colorful Bird, from Wall321]
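To make the definition above concrete, here is a minimal sketch of the two alternating steps behind k-means (assign each observation to the nearest mean, then recompute the means), written in base R. The toy points, the choice of starting centres, and the names `pts`, `centres`, and `cluster` are all assumptions made for illustration; R's own `kmeans()` uses a more refined algorithm by default.

```r
# Toy data: six 2-D points forming two obvious groups (made-up values)
pts <- matrix(c(0.0, 0.1,
                0.1, 0.0,
                0.2, 0.1,
                0.9, 1.0,
                1.0, 0.9,
                0.8, 1.0), ncol = 2, byrow = TRUE)

# Start from two of the points as initial centres (an arbitrary choice)
centres <- pts[c(1, 4), , drop = FALSE]

for (iter in 1:10) {
  # Assignment step: squared distance from every point to every centre
  d2 <- sapply(seq_len(nrow(centres)), function(j) {
    rowSums(sweep(pts, 2, centres[j, ])^2)
  })
  cluster <- max.col(-d2, ties.method = "first")  # index of nearest centre

  # Update step: each centre becomes the mean of its assigned points
  newCentres <- t(sapply(seq_len(nrow(centres)), function(j) {
    colMeans(pts[cluster == j, , drop = FALSE])
  }))

  if (all(abs(newCentres - centres) < 1e-12)) break  # nothing moved: stop
  centres <- newCentres
}

cluster  # cluster label of each point
centres  # one row per cluster mean
```

This sketch does not handle a cluster that ends up empty, which cannot happen with these toy points but can with other data and starting centres.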
Download and Read the Image
Let’s get started by downloading the image to our workspace, and tell R that our data is a JPEG file.

    # Load the package
    library(jpeg)
    url <- "http://www.wall321.com/thumbnails/detail/20120304/colorful%20birds%20tropical%20head%203888x2558%20wallpaper_www.wall321.com_40.jpg"
    # Download the file and save it as "Image.jpg" in the working directory;
    # mode = "wb" keeps the binary file intact on Windows
    dFile <- download.file(url, "Image.jpg", mode = "wb")
    img <- readJPEG("Image.jpg") # Read the image

Cleaning the Data
Extract the necessary information from the image and organize this for our computation:

    # Obtain the dimension
    imgDm <- dim(img)
    # Assign RGB channels to data frame
    imgRGB <- data.frame(
      x = rep(1:imgDm[2], each = imgDm[1]),
      y = rep(imgDm[1]:1, imgDm[2]),
      R = as.vector(img[,,1]),
      G = as.vector(img[,,2]),
      B = as.vector(img[,,3])
    )

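To see what this reshaping does, the same construction can be tried on a tiny synthetic array. The 2 by 3 "image" below and the names `toyImg`, `toyDm`, and `toyRGB` are made up for illustration; only the `data.frame()` pattern comes from the code above.

```r
# A hypothetical 2 x 3 "image" with 3 channels; the values are made up
toyImg <- array(seq(0, 1, length.out = 18), dim = c(2, 3, 3))
toyDm <- dim(toyImg)

toyRGB <- data.frame(
  x = rep(1:toyDm[2], each = toyDm[1]),  # column index of each pixel
  y = rep(toyDm[1]:1, toyDm[2]),         # row index, flipped so row 1 is on top
  R = as.vector(toyImg[, , 1]),
  G = as.vector(toyImg[, , 2]),
  B = as.vector(toyImg[, , 3])
)

# One row per pixel: 2 * 3 = 6 rows
nrow(toyRGB)

# The top-left pixel toyImg[1, 1, ] lands at x = 1, y = 2 (the top row)
toyRGB[1, ]
```

Because `as.vector()` walks a matrix column by column, `x` repeats each column index once per row, while `y` counts down so that the first row of the array is drawn at the top of the plot.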
Plotting
Plot the original image using the following code:

    library(ggplot2)
    # ggplot theme to be used
    plotTheme <- function() {
      theme(
        panel.background = element_rect(
          size = 3,
          colour = "black",
          fill = "white"),
        axis.ticks = element_line(
          size = 2),
        panel.grid.major = element_line(
          colour = "gray80",
          linetype = "dotted"),
        panel.grid.minor = element_line(
          colour = "gray90",
          linetype = "dashed"),
        axis.title.x = element_text(
          size = rel(1.2),
          face = "bold"),
        axis.title.y = element_text(
          size = rel(1.2),
          face = "bold"),
        plot.title = element_text(
          size = 20,
          face = "bold",
          vjust = 1.5)
      )
    }
    # Plot the image
    ggplot(data = imgRGB, aes(x = x, y = y)) +
      geom_point(colour = rgb(imgRGB[c("R", "G", "B")])) +
      labs(title = "Original Image: Colorful Bird") +
      xlab("x") +
      ylab("y") +
      plotTheme()

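The colour of each point in the plot comes from `rgb()`, which turns one row of R, G, and B values in the range 0 to 1 into a hexadecimal colour string. A small sketch with made-up pure colours (the name `toyChannels` is hypothetical) shows the conversion:

```r
# Four made-up pixels: pure red, pure green, pure blue, and white
toyChannels <- data.frame(
  R = c(1, 0, 0, 1),
  G = c(0, 1, 0, 1),
  B = c(0, 0, 1, 1)
)

# rgb() accepts a three-column data frame and returns one colour per row
toyHex <- rgb(toyChannels[c("R", "G", "B")])
toyHex
# "#FF0000" "#00FF00" "#0000FF" "#FFFFFF"
```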

Clustering
Apply k-Means clustering on the image:
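
    # Number of clusters, i.e. colours to keep
    kClusters <- 3
    # Cluster the pixels on their RGB values only (not on position)
    kMeans <- kmeans(imgRGB[, c("R", "G", "B")], centers = kClusters)
    # Colour each pixel with the centre of its cluster
    kColours <- rgb(kMeans$centers[kMeans$cluster, ])
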
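The last line relies on indexing the matrix of centres by the vector of cluster labels, which repeats each centre once for every pixel assigned to it. The sketch below shows this on six made-up pixels; `toyPixels`, `toyFit`, `toyCentres`, and `toyColours` are hypothetical names, and `set.seed()` is added only because `kmeans()` starts from randomly chosen centres.

```r
set.seed(1)  # k-means starts from random centres; fix the seed for repeatability

# Hypothetical pixels: three reddish and three bluish colours
toyPixels <- data.frame(
  R = c(0.90, 0.95, 0.85, 0.10, 0.05, 0.15),
  G = c(0.10, 0.05, 0.15, 0.10, 0.05, 0.15),
  B = c(0.10, 0.05, 0.15, 0.90, 0.95, 0.85)
)

toyFit <- kmeans(toyPixels, centers = 2)

# Indexing the centres by cluster label gives one row of centre values per pixel
toyCentres <- toyFit$centers[toyFit$cluster, ]
toyColours <- rgb(toyCentres)

toyFit$cluster  # which cluster each pixel belongs to
toyColours      # six colour strings, but only two distinct values
```

Every pixel is thus replaced by the average colour of its cluster, which is why the clustered image contains exactly as many colours as there are clusters.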
Plot the clustered colours:

    ggplot(data = imgRGB, aes(x = x, y = y)) +
      geom_point(colour = kColours) +
      labs(title = paste("k-Means Clustering of", kClusters, "Colours")) +
      xlab("x") +
      ylab("y") +
      plotTheme()


Different values of k give different clusterings of the pixels. I suggest you try it!
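As a starting point for experimenting, the sketch below runs `kmeans()` for several values of k and counts the distinct colours that result. It uses random made-up pixels (`toyPixels`) as a stand-in for `imgRGB[, c("R", "G", "B")]`; the values in `ks` and the names `ks` and `nColours` are arbitrary choices for illustration.

```r
set.seed(42)  # fixed seed so the random toy data and starts are repeatable

# Hypothetical stand-in for imgRGB[, c("R", "G", "B")]: 200 random pixels
toyPixels <- data.frame(R = runif(200), G = runif(200), B = runif(200))

ks <- c(2, 3, 5)  # values of k to try (arbitrary)

# For each k, cluster the pixels and count the distinct colours that remain
nColours <- sapply(ks, function(k) {
  fit <- kmeans(toyPixels, centers = k)
  length(unique(rgb(fit$centers[fit$cluster, ])))
})

data.frame(k = ks, distinctColours = nColours)
```

With the real image, the same loop could store `kColours` for each k and draw one plot per value.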
Reference
- K-means clustering. Wikipedia. Retrieved September 11, 2014.