Taylor Swift and Data Analysis
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Who will be the most talked-about celebrity before, during, and after the Super Bowl?
She is an accomplished performer, songwriter, businesswoman, and philanthropist. I think she is very pretty. And those lips!
So what can a data analyst add to everything that has been said about her?
I was curious whether R could identify her lipstick color. I should preface this by saying I have some degree of color-challengedness, although I am not colorblind. I am also aware that you can Google something like “what lipstick shade does taylor swift use” and you will get many replies. But I am more interested in an answer like E41D4F. I do wonder if I could visit a cosmetics store and say, “I’d like to buy a lipstick for my wife. Do you have anything in E41D4F?”
There are sites that take an image, let you hover over a particular point, and attempt to identify the computer color at that point. Here is one: RedKetchup. But I want a more R-related approach. A note on computers and colors: A computer represents an image in units called pixels. Each pixel stores three base 16 (hexadecimal) numbers, one each for red, green and blue. A single base 16 digit ranges from 0 through F. Each of red, green and blue is a two-digit base 16 number (00 through FF, or 0 through 255 in decimal), so a full color is a six-digit base 16 number. There are 16^6 = 16,777,216 possible colors. E41D4F is one of those 16.8 million colors.
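To make the hex arithmetic concrete, here is a small base R sketch. It only uses functions that ship with R (col2rgb, rgb, strtoi), and the variable names are mine, chosen for illustration.

```r
# Split the six-digit hex code into its red, green and blue parts.
shade <- "#E41D4F"
channels <- col2rgb(shade)   # 3 x 1 matrix with rows red, green, blue
channels

# Each pair of hex digits is one channel: E4 is 14 * 16 + 4.
red_from_hex <- strtoi("E4", base = 16L)

# Going the other way: decimal channel values back to a hex code.
back <- rgb(228, 29, 79, maxColorValue = 255)

# Number of distinct six-digit hex colors.
n_colors <- 16^6
```

So E41D4F is just red 228, green 29 and blue 79 on a 0 to 255 scale.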
There are R packages that will take an image and identify the most frequent colors. I tried this with the image above, and I got unhelpful colors. This is because the image contains the background, her hair, her clothing, and lots of other things unrelated to her lips. If you think about it, the lips are really a small portion of a face anyway. So I need to narrow down the image to her lips.
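I plotted the image on a rectangular grid, using the number of columns of the image file as the x-axis limit (width) and the number of rows as the y-axis limit (height). Then by trial and error I manually found the coordinates of a rectangle for the lips. The magick library allows you to crop an image, using this crop format: widthxheight+x_offset+y_offset, where the offsets are measured from the left and top edges of the image. For the lips I used 105x47+215+300.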
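The crop string that magick expects has the form widthxheight+x_offset+y_offset. Rather than typing it by hand, it could be built from the four numbers in my code, roughly like this. This is a sketch; crop_w, crop_h, x_off, y_off and crop_geometry are names I made up for illustration, and the image_crop call is left as a comment because it needs the image file.

```r
# Rectangle found by trial and error (values from the code at the end of the post).
crop_w <- 105   # width of the rectangle in pixels
crop_h <- 47    # height of the rectangle in pixels
x_off  <- 215   # offset from the left edge
y_off  <- 300   # offset from the top edge

# Assemble the geometry string in "<width>x<height>+<x_off>+<y_off>" form.
crop_geometry <- sprintf("%dx%d+%d+%d", crop_w, crop_h, x_off, y_off)
print(crop_geometry)

# With an image loaded via magick::image_read(), the crop would then be:
# cropped_img <- magick::image_crop(img, crop_geometry)
```

Keeping the four numbers as variables makes the trial and error a little less painful, since only one value changes at a time.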
The package colouR will then identify the most frequent colors. I found it necessary to save the cropped image to my computer and then read it back in, because colouR would not accept it otherwise. The getTopCol command will extract the top colors by frequency. I assume it is counting the frequency of hex color codes among the pixels. Here is a bar chart of the result:
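I have not checked how getTopCol works internally, so the following is only a base R sketch of what I assume it does: turn every pixel into a hex code and count the codes. The six-pixel "image" is invented for illustration and has nothing to do with the photo; it is laid out the way png::readPNG returns an image without native = TRUE (rows x columns x channels, values between 0 and 1).

```r
# Invented 2 x 3 image: four pixels of one shade, two of another.
r <- c(228, 228, 228, 200, 228, 200)
g <- c( 29,  29,  29,  40,  29,  40)
b <- c( 79,  79,  79,  90,  79,  90)
px <- array(c(r, g, b) / 255, dim = c(2, 3, 3))

# Convert each pixel to a hex code.
hex <- rgb(as.vector(px[, , 1]),
           as.vector(px[, , 2]),
           as.vector(px[, , 3]))

# Count identical hex codes and sort, most frequent first.
counts <- sort(table(hex), decreasing = TRUE)
top <- data.frame(hex = names(counts), freq = as.vector(counts))
top
```

Counting exact hex codes this way treats two nearly identical reds as completely different colors, which may be part of why the counts in a real photo end up spread thinly over many shades.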
Really? I’m disappointed. Although I am color-challenged, this can’t be right.
I have tried this with other photos of Taylor. I do get that she wears more than one lipstick color. I have also learned that with 16.8 million colors, perhaps the color is not identical on the entire lip – maybe some of you lipstick aficionados have always known this. All of these attempts have been somewhat unsatisfactory. There are too many colors on the graph that seem absolutely wrong, and no one color seems to really capture her shade, at least as I perceive it. Any suggestions from the R community?
No matter who you root for in the Super Bowl – go Taylor.
Here is my R code:
library(png)
library(ggplot2)
library(grid)
library(colouR)
library(magick)
xpos <- c(0,0,0)
ypos <- c(0,0,0)
df <- data.frame(xpos = xpos, ypos = ypos)
# downloaded from
# https://img.etimg.com/thumb/msid-100921419,width-300,height-225,imgsize-50890,resizemode-75/taylor-swift-mitchell-taebel-from-indiana-arrested-for-stalking-threatening-singer.jpg
img <- "C:/Users/Jerry/Desktop/R_files/taylor/taylor_swift.png"
img <- readPNG(img, native=TRUE)
height <- nrow(img) # 457
width <- ncol(img) # 584
img <- rasterGrob(img, interpolate = TRUE)
# print onto grid
ggplot(data = df,
       aes(xpos, ypos)) +
  xlim(0, width) + ylim(0, height) +
  geom_blank() +
  annotation_custom(img, xmin = 0, xmax = width, ymin = 0, ymax = height)
#############################################
# choose dimensions of subset rectangle
width <- 105
height <- 47
x1 <- 215 # from left
y1 <- 300 # from top
# must read in as magick object (magick is already loaded above)
img <- image_read("C:/Users/Jerry/Desktop/R_files/taylor/taylor_swift.png")
# print(img)
# crop format: "<width>x<height>+<x offset from left>+<y offset from top>"
cropped_img <- image_crop(img, "105x47+215+300")
print(cropped_img) # lips only
# save the crop to disk, because getTopCol reads the image from a file path
image_write(cropped_img, path = "C:/Users/Jerry/Desktop/R_files/taylor/lips1.png", format = "png")
##############################################
# extract top colors of lips image
top10 <- colouR::getTopCol(path = "C:/Users/Jerry/Desktop/R_files/taylor/lips1.png",
                           n = 10, avgCols = FALSE, exclude = FALSE)
top10
# plot
# note: fill = hex colors the bars with ggplot2's default palette, not the hex
# colors themselves; adding scale_fill_identity() would draw each bar in its own color
ggplot(top10, aes(x = hex, y = freq, fill = hex)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 10 colors by frequency") +
  xlab("HEX colour code") + ylab("Frequency") +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 15, face = "bold"),
    axis.title = element_text(size = 15, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 12, face = "bold"))
# End
##################################################################################