NOTE: This write-up picks up where the previous one left off. All of the session data is carried over.
Color Similarity
Now, I’d like to evaluate color similarity more closely. To help verify any quantitative deductions with some intuition, I’ll consider only a single league for this–the NBA, the league that I know the best.
Because I’ll end up plotting team names at some point and some of the full names are relatively lengthy, I want to get the official abbreviations for each team. Unfortunately, these don’t come with the teamcolors package, but I can use Alex Bresler’s nbastatR package to get them.
    # Assign `df_dict_nba_teams` to Global environment.
    nbastatR::assign_nba_teams()
    nms_nba <-
      teamcolors::teamcolors %>%
      filter(league == "nba") %>%
      inner_join(
        df_dict_nba_teams %>%
          setNames(snakecase::to_snake_case(names(.))) %>%
          filter(!is_non_nba_team) %>%
          select(name = name_team, slug = slug_team),
        by = c("name")
      )
    colors_tidy_ord2_nba <-
      nms_nba %>%
      select(name, league, slug) %>%
      inner_join(colors_tidy_ord2, by = c("name", "league"))
To give the unfamiliar reader a better understanding of what exactly this subset of the teamcolors data includes, here’s a visualization of the primary and secondary colors of all NBA teams.
After grabbing the abbreviations (or slugs), I can move on to breaking up the hex values into their RGB components. [1] I’ll be looking at only the primary and secondary colors again.
    colors_ord2_nba_rgb_tidy <-
      colors_tidy_ord2_nba %>%
      add_rgb_cols() %>%
      select(-hex) %>%
      tidyr::gather(rgb, value, red, green, blue)
    colors_ord2_nba_rgb_tidy %>%
      create_kable()
name | league | slug | ord | rgb | value |
---|---|---|---|---|---|
Atlanta Hawks | nba | ATL | primary | red | 225 |
Atlanta Hawks | nba | ATL | secondary | red | 196 |
Boston Celtics | nba | BOS | primary | red | 0 |
Boston Celtics | nba | BOS | secondary | red | 187 |
Brooklyn Nets | nba | BKN | primary | red | 6 |
Brooklyn Nets | nba | BKN | secondary | red | 6 |
Charlotte Hornets | nba | CHA | primary | red | 29 |
Charlotte Hornets | nba | CHA | secondary | red | 0 |
Chicago Bulls | nba | CHI | primary | red | 206 |
Chicago Bulls | nba | CHI | secondary | red | 6 |
Total number of rows: 180 (first 10 shown).
With the RGB values extracted, I can use the widyr::pairwise_dist() function to compute the relative distance among teams in terms of RGB values for each color ordinality. I think the default method–“Euclidean” distance–is reasonable.
    do_pairwise_dist <- function(data, method) {
      data %>%
        # Compute distances separately for primary and secondary colors.
        group_by(ord) %>%
        widyr::pairwise_dist(name, rgb, value, upper = TRUE, method = method) %>%
        rename(name1 = item1, name2 = item2) %>%
        # Rename the last column (the distance) to `value`.
        select(everything(), value = ncol(.)) %>%
        arrange(value, .by_group = TRUE) %>%
        ungroup()
    }
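As a rough illustration of what a pairwise distance between teams looks like, here’s a minimal base R sketch. The team names and RGB values are made up (they are not from teamcolors), it covers only a single color ordinality, and it uses stats::dist() instead of widyr, so treat it as an analogue of the function above and not the code behind the results in this post.

```r
# Hypothetical RGB values for three made-up teams (one row per team).
rgb_toy <- rbind(
  team_a = c(red = 200, green = 0,  blue = 0),
  team_b = c(red = 200, green = 30, blue = 40),
  team_c = c(red = 0,   green = 0,  blue = 0)
)

# Full (symmetric) distance matrices for two distance methods.
dist_euclid <- as.matrix(dist(rgb_toy, method = "euclidean"))
dist_manhat <- as.matrix(dist(rgb_toy, method = "manhattan"))

# team_a vs. team_b: sqrt(30^2 + 40^2) = 50 (Euclidean), 30 + 40 = 70 (Manhattan)
dist_euclid["team_a", "team_b"]
dist_manhat["team_a", "team_b"]
```

Each off-diagonal cell is the distance between one pair of teams, which is the same information that the long-format name1/name2/value output holds.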
As one might expect, there’s not much difference between these two distance methods (if correlation is deemed a valid metric for quantifying similarity).
How exactly do all of the individual distances compare?
I think that the above plot does a good job of highlighting the average distance values (in terms of RGB) of each team. Additionally, by sorting the teams by value, it illustrates exactly which teams are the most “generic” (i.e. most similar to all other teams) and the most “unique” (i.e. least similar to all other teams).
I can also use a heat map to visualize the same data. (Who doesn’t like a good heat map?)
Like with the previous plot, I order the teams on each axis by total distance from all other teams–teams with the highest cumulative similarity to all other teams appear towards the bottom and left, while teams that contrast most with all others appear towards the top and right. And, to add some nuance, I emphasize the individual pairs that have the highest and lowest similarity with different colors.
Exactly which teams match most and least closely with one another (in terms of color similarity)? Here’s a list of the top and bottom matches for each team.
rank_overall | name1 | name2 | dist |
---|---|---|---|
1 | Sacramento Kings | Memphis Grizzlies | 173 |
1 | Sacramento Kings | Indiana Pacers | 399 |
2 | Memphis Grizzlies | Sacramento Kings | 173 |
2 | Memphis Grizzlies | Indiana Pacers | 463 |
3 | Boston Celtics | Utah Jazz | 174 |
3 | Boston Celtics | Indiana Pacers | 483 |
4 | Portland Trail Blazers | Houston Rockets | 63 |
4 | Portland Trail Blazers | Brooklyn Nets | 521 |
5 | Charlotte Hornets | Minnesota Timberwolves | 171 |
5 | Charlotte Hornets | Atlanta Hawks | 472 |
6 | Cleveland Cavaliers | Miami Heat | 55 |
6 | Cleveland Cavaliers | San Antonio Spurs | 544 |
7 | Houston Rockets | Portland Trail Blazers | 63 |
7 | Houston Rockets | San Antonio Spurs | 541 |
8 | Miami Heat | Cleveland Cavaliers | 55 |
8 | Miami Heat | San Antonio Spurs | 529 |
9 | Detroit Pistons | Los Angeles Clippers | 0 |
9 | Detroit Pistons | Oklahoma City Thunder | 559 |
10 | Los Angeles Clippers | Detroit Pistons | 0 |
10 | Los Angeles Clippers | Oklahoma City Thunder | 559 |
11 | Philadelphia 76ers | Detroit Pistons | 0 |
11 | Philadelphia 76ers | Oklahoma City Thunder | 559 |
12 | Utah Jazz | New Orleans Pelicans | 141 |
12 | Utah Jazz | Golden State Warriors | 593 |
13 | Minnesota Timberwolves | Charlotte Hornets | 171 |
13 | Minnesota Timberwolves | Houston Rockets | 464 |
14 | Chicago Bulls | Toronto Raptors | 0 |
14 | Chicago Bulls | Orlando Magic | 584 |
15 | Toronto Raptors | Chicago Bulls | 0 |
15 | Toronto Raptors | Orlando Magic | 584 |
16 | Phoenix Suns | Indiana Pacers | 143 |
16 | Phoenix Suns | Orlando Magic | 561 |
17 | New Orleans Pelicans | Washington Wizards | 0 |
17 | New Orleans Pelicans | Golden State Warriors | 568 |
18 | Washington Wizards | New Orleans Pelicans | 0 |
18 | Washington Wizards | Golden State Warriors | 568 |
19 | Atlanta Hawks | Miami Heat | 175 |
19 | Atlanta Hawks | Brooklyn Nets | 493 |
20 | New York Knicks | Oklahoma City Thunder | 75 |
20 | New York Knicks | Golden State Warriors | 586 |
21 | Oklahoma City Thunder | New York Knicks | 75 |
21 | Oklahoma City Thunder | Golden State Warriors | 578 |
22 | Denver Nuggets | New York Knicks | 142 |
22 | Denver Nuggets | Golden State Warriors | 546 |
23 | Brooklyn Nets | Chicago Bulls | 203 |
23 | Brooklyn Nets | Portland Trail Blazers | 521 |
24 | Los Angeles Lakers | Indiana Pacers | 111 |
24 | Los Angeles Lakers | Milwaukee Bucks | 542 |
25 | Dallas Mavericks | Orlando Magic | 0 |
25 | Dallas Mavericks | Indiana Pacers | 586 |
26 | Orlando Magic | Dallas Mavericks | 0 |
26 | Orlando Magic | Indiana Pacers | 586 |
27 | Golden State Warriors | Los Angeles Lakers | 122 |
27 | Golden State Warriors | Utah Jazz | 593 |
28 | Milwaukee Bucks | Dallas Mavericks | 231 |
28 | Milwaukee Bucks | San Antonio Spurs | 644 |
29 | Indiana Pacers | Los Angeles Lakers | 111 |
29 | Indiana Pacers | Milwaukee Bucks | 617 |
30 | San Antonio Spurs | Chicago Bulls | 225 |
30 | San Antonio Spurs | Milwaukee Bucks | 644 |
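For anyone curious how a closest/farthest listing like this can be derived from a distance matrix, here’s a small base R sketch. The four teams and their RGB values are hypothetical, and the approach (masking the diagonal, then taking the row-wise minimum and maximum) is just one way to do it, not necessarily how the table above was built.

```r
# Hypothetical RGB values for four made-up teams (not from teamcolors).
rgb_toy <- rbind(
  team_a = c(red = 200, green = 0,  blue = 0),
  team_b = c(red = 200, green = 30, blue = 40),
  team_c = c(red = 0,   green = 0,  blue = 0),
  team_d = c(red = 0,   green = 0,  blue = 150)
)

dist_mat <- as.matrix(dist(rgb_toy, method = "euclidean"))
# Ignore each team's (zero) distance to itself.
diag(dist_mat) <- NA

# which.min()/which.max() skip NA values, so the diagonal is never picked.
closest  <- colnames(dist_mat)[apply(dist_mat, 1, which.min)]
farthest <- colnames(dist_mat)[apply(dist_mat, 1, which.max)]

matches <- data.frame(
  name = rownames(dist_mat),
  closest = closest,
  farthest = farthest,
  stringsAsFactors = FALSE
)
matches
```

Note that ties (like the distances of 0 in the table above) are resolved by which.min() and which.max() returning the first match, so the “closest” relationship need not be symmetric.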
These results don’t really agree with what I–and maybe other NBA fans–would have guessed. The Sacramento Kings (SAC) have purple as their primary color, which is relatively unusual. I would think that they would be in the lower half of these rankings. What’s going on? …
Color Theory
When doing this color-based analysis, several questions came to mind:
Is the RGB model really the best framework to use for comparing colors? What about the HSL (Hue, Saturation, Lightness) model? Additionally, a quick Google search for “What is the best method for identifying similarity between colors?” indicates that the YUV representation–a model I hadn’t heard of before–is best (if human perception is the main concern).
Is Euclidean distance the best “distance” method to use? Because I’m curious, I’ll look at how different the results would be if the “Manhattan” distance is used instead.
Is “distance” even the best method for determining color similarity? Why not a “similarity” metric (such as cosine similarity)?
Since I’m not an expert in color models, and because there is no definitive research with which I can cross-check my findings for color similarity among NBA teams, I think it’s worthwhile to explore these questions in more detail. First, I’ll need to create HSL and YUV variations of the color data that I can compare to the RGB version that I’ve used up to this point. (This will help me answer the first question.) [2] Then, with each of these data sets in hand, I’ll tackle the latter two questions directly. In the end, by comparing the different models with different methods, I hope to come to some stronger conclusions and/or justifications of my findings about NBA team colors.
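To make the HSL and YUV variations a bit more concrete, here’s a hedged base R sketch of the conversions. The helper names (hex_to_rgb(), rgb_to_hsl(), rgb_to_yuv()) are hypothetical and are not the helpers used elsewhere in this series. The YUV weights are the common BT.601 ones applied to RGB scaled to [0, 1], which is an assumption on my part since there are several YUV variants.

```r
# Split a hex color into red, green, blue (0-255) with grDevices::col2rgb().
hex_to_rgb <- function(hex) {
  setNames(as.numeric(grDevices::col2rgb(hex)[, 1]), c("red", "green", "blue"))
}

# RGB (0-255) to HSL: hue in degrees, saturation and lightness in [0, 1].
rgb_to_hsl <- function(rgb) {
  x <- unname(rgb) / 255
  mx <- max(x)
  mn <- min(x)
  d <- mx - mn
  l <- (mx + mn) / 2
  # Grays have no hue or saturation.
  if (d == 0) return(c(h = 0, s = 0, l = l))
  s <- d / (1 - abs(2 * l - 1))
  h <- if (mx == x[1]) {
    ((x[2] - x[3]) / d) %% 6
  } else if (mx == x[2]) {
    (x[3] - x[1]) / d + 2
  } else {
    (x[1] - x[2]) / d + 4
  }
  c(h = 60 * h, s = s, l = l)
}

# RGB (0-255) to YUV, assuming BT.601 weights on RGB scaled to [0, 1].
rgb_to_yuv <- function(rgb) {
  w <- rbind(
    y = c(0.299, 0.587, 0.114),
    u = c(-0.14713, -0.28886, 0.436),
    v = c(0.615, -0.51499, -0.10001)
  )
  drop(w %*% (rgb / 255))
}

blue_hsl  <- rgb_to_hsl(hex_to_rgb("#0000FF"))  # h = 240, s = 1, l = 0.5
white_yuv <- rgb_to_yuv(hex_to_rgb("#FFFFFF"))  # y is 1; u and v are roughly 0
```

One thing to keep in mind is that the three models live on different scales (0-255 for RGB, degrees plus two proportions for HSL, small signed values for U and V), so raw distances are not directly comparable across models. That’s part of why comparing them through correlations makes sense.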
Euclidean Distance vs. Manhattan Distance
I’ll look at two distance methods–Euclidean and Manhattan–to justify my earlier choice of Euclidean distance. To do this, I want to verify that the similarity determined by the two methods is nearly identical. (I would be surprised if it isn’t.)
rgb_euclidean | rgb_manhattan |
---|---|
NA | 97.16 |
97.16 | NA |
hsl_euclidean | hsl_manhattan |
---|---|
NA | 97.62 |
97.62 | NA |
yuv_euclidean | yuv_manhattan |
---|---|
NA | 96.26 |
96.26 | NA |
Indeed, it looks like there is a high correlation between the Euclidean and Manhattan distances calculated when the hex color values are broken down into color components, regardless of whether the RGB, HSL, or YUV representation is used.
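The comparison itself boils down to correlating two vectors of pairwise distances. Here’s a minimal base R sketch of that idea on made-up RGB values for four hypothetical teams. It uses stats::dist() and stats::cor() directly, so it only approximates the workflow behind the tables above.

```r
# Hypothetical RGB values for four made-up teams (not from teamcolors).
rgb_toy <- rbind(
  team_a = c(red = 200, green = 0,  blue = 0),
  team_b = c(red = 200, green = 30, blue = 40),
  team_c = c(red = 0,   green = 0,  blue = 0),
  team_d = c(red = 0,   green = 0,  blue = 150)
)

# One distance per unordered pair of teams (6 pairs for 4 teams),
# returned in the same pair order for both methods.
d_euclid <- as.vector(dist(rgb_toy, method = "euclidean"))
d_manhat <- as.vector(dist(rgb_toy, method = "manhattan"))

# Pearson correlation between the two sets of pairwise distances.
cor_euclid_manhat <- cor(d_euclid, d_manhat)
cor_euclid_manhat
```

Since the Manhattan distance is never smaller than the Euclidean distance for the same pair, and both grow as colors move apart, a strong positive correlation is what I’d expect in general.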
Now, when keeping the distance method constant (Euclidean), how do the color models compare?
[Table: correlations among rgb_dist, hsl_dist, and yuv_dist]
The numbers indicate that there is some strong positive correlation, especially between the RGB and YUV color schemas. This indicates that the conclusions that I came to regarding most similar and dissimilar NBA team colors would not be much different if using the HSL or YUV models instead of the RGB model.
Distance vs. Similarity
To compare distance (Euclidean) with cosine similarity, I can create and use a similar set of functions to those used for comparing distance methods. To visualize the results in an interpretable manner, I can use the network_plot() function from Dr. Simon’s corrr package. This function is cool for visualizing correlation data in a way other than with a traditional correlation matrix. [3]
It’s clear that the RGB and YUV schemas are fairly similar “within” both metrics–Euclidean distance and cosine similarity–and both are relatively dissimilar to HSL. However, all three color models show negative correlations “within” themselves when comparing the two metrics against one another. (That is, the RGB schema has a negative correlation when comparing its distance values to its similarity values, and likewise for the HSL and YUV models.) This sign is expected: a larger distance means that two colors are less alike, while a larger cosine similarity means that they are more alike.
So, which color model and which metric should be used? In my opinion, the RGB model seems like a good choice, both because it is relatively similar to at least one other method (YUV) and because it is (probably) the most relatable scheme to people who don’t know much about color theory. As for the metric, I think that the choice of Euclidean distance is valid. My Google search (which makes the case for YUV) makes the assumption that Euclidean distance is being used. Additionally, a separate Google search for “euclidean distance vs. cosine similarity” turns up an easy-to-follow technical write-up that implies that cosine similarity is probably not really appropriate for this kind of color analysis.
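One way to see why cosine similarity may be a poor fit for colors is that it only looks at the direction of the RGB vector and ignores its magnitude. Here’s a small base R sketch with made-up colors. The helper names cosine_sim() and euclid_dist() are my own for illustration and don’t come from any package.

```r
# Cosine similarity between two numeric vectors.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Euclidean distance between two numeric vectors.
euclid_dist <- function(a, b) {
  sqrt(sum((a - b)^2))
}

# Made-up colors for illustration (not taken from teamcolors).
dark_red   <- c(red = 100, green = 0, blue = 0)
bright_red <- c(red = 200, green = 0, blue = 0)
navy       <- c(red = 0,   green = 0, blue = 100)

# Same direction, different brightness: cosine calls them identical.
sim_reds  <- cosine_sim(dark_red, bright_red)   # 1
dist_reds <- euclid_dist(dark_red, bright_red)  # 100

# Orthogonal directions: cosine similarity is 0.
sim_red_navy  <- cosine_sim(dark_red, navy)     # 0
dist_red_navy <- euclid_dist(dark_red, navy)
```

A dark red and a bright red look clearly different, yet their cosine similarity is a perfect 1. Pure black (all zeros) is even worse: its cosine similarity with anything is 0/0, which R reports as NaN.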
Conclusion
That’s all I’ve got for this topic. I hope that the techniques shown here are general enough that they can be applied to any set of colors to extract some fun (and meaningful) insight.