Visualizing Brooklyn Nine-Nine with R!
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
“Hey there criminal. It’s me, Johnny Law!” – Jake Peralta, NYPD.
Brooklyn Nine-Nine has become one of my favorite sitcoms in recent
years, probably taking over from Parks & Recreation and Community. So in
this blog post, I’m going to web scrape some very simple TV statistics,
clean it up with the tidyverse
, and visualize it with ggplot2
…
“Terry loves ggplot2!”
Let’s get started!
Packages
pacman::p_load(tidyverse, rvest, glue, cowplot, ggbeeswarm, polite, extrafont, knitr, kableExtra) loadfonts() # "Univers" and "Univers LT 93 ExtraBlackEx" fonts
B99 Custom Theme
First, I created a custom Brooklyn Nine-Nine theme that I can put
on every plot. This will save me time from typing in
the same options over and over again! I googled the font type that the
official Brooklyn Nine-Nine media uses, downloaded them, and got it
installed for R using the extrafont
package. For some of the different
colors you’ll see in the plots I sourced them from pasting in the
Brooklyn Nine-Nine logo and other official media images into
imagecolorpicker.com and saving the
hex codes that it gave me. Otherwise, I experimented with different
palettes from perusing Emil
Hvitfeld’s awesome
r-color-palettes
Github repository (Also special thanks to David
Smale for giving me some advice on
using color!).
theme_b99 <- function(){ base_size <- 11 half_line <- base_size / 2 theme_minimal() %+replace% theme(text = element_text(family = "Univers", color = "#F9FEFF", face = "plain", size = 14, hjust = 0.5, vjust = 0.5, angle = 0, lineheight = 0.9, margin = margin(half_line, half_line, half_line, half_line), debug = FALSE), plot.title = element_text(family = "Univers LT 93 ExtraBlackEx", size = 20, color = "#F9FEFF"), plot.background = element_rect(color = NA, fill = "#0053CD"), panel.background = element_rect(color = NA, fill = "#0053CD"), # axis options axis.text = element_text(family = "Univers", color = "#F9FEFF", size = 12), axis.title = element_text(size = 14), # legend options (for ratings plot) legend.title = element_text(family = "Univers", color = "#F9FEFF"), legend.text = element_text(family = "Univers", color = "#F9FEFF", size = 9), legend.position = "bottom", legend.key = element_rect(colour = "black", linetype = "solid", size = 1.5), legend.background = element_rect(color = "black", fill = "#0053CD", linetype = "solid")) }
Now the plots will all look very similar, just like a good police squad:
With that done, I can start making plots!
Episode Ratings
As in my more
recent blog
posts I used the polite
package to web scrape responsibly (note
the bow()
and scrape()
functions).
I map_2()
over the IMDB page for the episodes for each season and I
append the season number to each episode row. After I’m done with the
web scraping I mutate()
in the episode number for each season.
url_df <- tibble( urls = c("https://www.imdb.com/title/tt2467372/episodes?season=1", "https://www.imdb.com/title/tt2467372/episodes?season=2", "https://www.imdb.com/title/tt2467372/episodes?season=3", "https://www.imdb.com/title/tt2467372/episodes?season=4", "https://www.imdb.com/title/tt2467372/episodes?season=5"), season_num = c(1, 2, 3, 4, 5)) # scraping function: brooklyn99_ep_rating <- function(url) { session <- bow(url) url2 <- scrape(session) # Grab episode names sX_ep_name <- url2 %>% html_nodes(".info a") %>% html_text() %>% as_tibble() %>% mutate(value = gsub("\\n", "", x = value)) # Grab episode rating sX_rate <- url2 %>% html_nodes(".ipl-rating-widget > .ipl-rating-star .ipl-rating-star__rating") %>% html_text() %>% as_tibble() %>% mutate(rating = gsub("\\n", "", x = value) %>% as.numeric) %>% select(-value) # Clean episode name df sX_ep_name <- sX_ep_name %>% mutate(title = trimws(value)) %>% filter(!str_detect(title, "Rate"), title != "") %>% select(-value) # combine name + rating ep_rating <- sX_ep_name %>% bind_cols(sX_rate) } ep_rating_df <- map2(.x = url_df$urls, .y = url_df$season_num, ~ brooklyn99_ep_rating(url = .x) %>% mutate(season = .y)) %>% reduce(bind_rows) ep_rating_df <- ep_rating_df %>% group_by(season) %>% mutate(ep_num = row_number()) %>% ungroup() glimpse(ep_rating_df) ## Observations: 112 ## Variables: 4 ## $ title <chr> "Pilot", "The Tagger", "The Slump", "M.E. Time", "The V... ## $ rating <dbl> 7.9, 7.7, 7.7, 7.8, 8.1, 8.5, 8.2, 7.9, 7.8, 8.3, 8.3, ... ## $ season <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1... ## $ ep_num <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
OK, looks good.
Episode Ratings Plot: Heatmap and Boxplot
I used geom_tile()
to create a heat map of episode ratings with the
season number as the rows and the episode number for that season as the
columns. I also used the dichromat
package for the color scheme
“LightBluetoDarkBlue.10”, it meshes pretty well with the Brooklyn Nine-Nine
color theme!
rating_plot <- ep_rating_df %>% ggplot(aes(x = ep_num, y = season)) + geom_tile(aes(fill = rating), size = 1.5, color = "black") + scale_fill_gradientn("Rating", colors = dichromat::colorschemes$LightBluetoDarkBlue.10, breaks = c(7.7, 8, 8.5, 9, 9.5)) + guides(fill = guide_colourbar(frame.colour = "black", barwidth = unit(2, "in"))) + scale_x_continuous(expand = c(0.01, 0.01), breaks = c(1, seq(from = 5, to = 20, by = 5), 23), labels = c(1, seq(from = 5, to = 20, by = 5), 23)) + scale_y_reverse(expand = c(0.01, 0.01), breaks = c(1:7), labels = c(1:7)) + labs(x = "Episode Number", y = "Season", title = "Episode Ratings") + theme_b99() + theme(panel.grid = element_blank()) rating_plot
From what we can see on the graph there doesn’t seem to be any trends along either axis besides the fact that the finales and the subsequent opening episode of the following season are generally well-received. Two stand out episodes from Season 5 can be clearly seen in the plot. These episodes are “HalloVeen” (the fifth Halloween episode of the series) and “The Box” (an excellent episode featuring Sterling K. Brown as the perp).
Box Plot
Another way to visualize this would be with the box plot. This way allows you to make a comparison between seasons, whereas the heat map could be used to pick out any trends over time or sequence.
The debate between violin plot vs. box plot vs. sina plot vs. etc. rages on to this day (some examples I’ve read over the years: 1, 2, 3) and at the end of the day, it may come down to personal preference. Since the data I’m using is quite small (~20 episode ratings for each season), in my case it might be better to use box plots (instead of violin plots) and sprinkle bee swarm points on top with the ggbeeswarm package. I commented out some of the other methods if you wanted to copy-paste this code chunk into your R console to try them out!
I highlighted the outliers for each season in red with the unintentional result being that the color scheme mimics that of the flag of Colorado…
cols <- c("1" = "#6CA9C3", "2" = "#3A3533", "3" = "#000E33", "4" = "#CBCFD2", "5" = "#175E78") box_plot <- ep_rating_df %>% ggplot(aes(x = season, y = rating, group = season)) + #geom_violin(color = "#F9FEFF", fill = "#000E33") + #ggthemes::geom_tufteboxplot(color = "#F9FEFF", fill = "#000E33") + #ggforce::geom_sina(color = "#F9FEFF", fill = "#000E33") + geom_boxplot(color = "#F9FEFF", fill = "#000E33", outlier.color = "red", outlier.size = 5) + geom_beeswarm(color = "#FCF40E", cex = 2, size = 2.25) + scale_y_continuous(limits = c(7.5, 9.6), labels = c(7.5, 8, 8.5, 9, 9.5)) + labs(title = "Episode Ratings by Season", x = "Season", y = "Rating") + theme_b99() + theme(panel.grid.minor = element_blank()) box_plot
Noice, Smort! We get the best of both worlds by combining two different types of visualizing distributions. Let’s check out which episodes those red dots are…
ep_rating_df %>% group_by(season) %>% mutate(n = n(), third_quantile = quantile(rating)[4]) %>% filter(rating > third_quantile + 1.58 * (IQR(rating))) %>% select(-n, -third_quantile) ## # A tibble: 3 x 4 ## # Groups: season [2] ## title rating season ep_num ## <chr> <dbl> <dbl> <int> ## 1 Johnny and Dora 9.1 2 23 ## 2 HalloVeen 9.5 5 4 ## 3 The Box 9.4 5 14
Besides the (re)appearances of Doug Judy, Brooklyn Nine-Nine is known
for its Halloween Episodes. I wanted to use gt
somewhere in this blog
post but that’s little bit overkill so a kable
table will suffice for
now.
ep_rating_df %>% group_by(season) %>% top_n(n = 5, wt = rating) %>% arrange(desc(rating)) %>% mutate(rank = row_number()) %>% arrange(season, desc(rating)) %>% filter(str_detect(title, "Hallo")) %>% rename(Title = title, Rating = rating, Season = season, `Episode Number` = ep_num, Rank = rank) %>% kable(format = "html", caption = "The Halloween Episodes and Rank in Season") %>% kable_styling(full_width = FALSE)
Title | Rating | Season | Episode Number | Rank |
---|---|---|---|---|
Halloween | 8.5 | 1 | 6 | 4 |
Halloween II | 8.7 | 2 | 4 | 2 |
Halloween III | 8.9 | 3 | 5 | 2 |
Halloween IV | 8.7 | 4 | 5 | 2 |
HalloVeen | 9.5 | 5 | 4 | 1 |
It is very clear that the Halloween episodes are highly regarded to those that rate Brooklyn Nine-Nine on IMDB (although I’m sure most fans including myself will agree).
A more recent variant for showing a distribution is the ridge-line plot, courtesy of the ggridges package:
ep_rating_df2 <- ep_rating_df %>% mutate(season = as_factor(as.character(season))) ep_rating_df2 %>% ggplot(aes(x = rating, y = season, height = ..density..)) + ggridges::geom_density_ridges(color = "#F9FEFF", fill = "#000E33") + labs(title = "Episode Ratings by Season", x = "Rating", y = "Season") + scale_x_continuous(limits = c(7.25, 9.8), breaks = c(7.5, 8, 8.5, 9, 9.5), labels = c(7.5, 8, 8.5, 9, 9.5)) + scale_y_discrete(limits = rev(levels(ep_rating_df2$season))) + theme_b99() + theme(panel.grid.minor = element_blank()) ## Picking joint bandwidth of 0.155
Well OK, we’re just looking at distributions… but I wanted to use this GIF in some capacity!
Cast Appearances
The main cast of Brooklyn Nine-Nine are pretty tightly knit and as members of the same precinct it makes sense that they’ll generally appear together. So instead, I wanted to look at which non-main cast members and guests made the most appearances on the show. Special note: Hitchcock and Scully weren’t officially “main cast” until Season 2 but I left them out of the non-main cast list.
After scraping for the full cast list and main cast list, I anti_join()
them
so I am left with the non-main cast and the number of episodes that they
appeared in.
# Entire cast: cast_url <- bow("https://www.imdb.com/title/tt2467372/fullcredits?ref_=tt_cl_sm#cast") cast_info_raw <- scrape(cast_url) %>% html_nodes(".character") %>% html_text() %>% as_tibble() cast_info_clean <- cast_info_raw %>% separate(value, into = c("blank", "name", "episode_num", "dots"), sep = "\n") %>% mutate(episode_num = case_when( episode_num == " / ... " ~ dots, TRUE ~ episode_num), episode_num = episode_num %>% word(., 1, sep = "e") %>% as.numeric, name = str_trim(name, side = "both")) %>% select(-blank, -dots) %>% mutate(name = case_when( name == "Captain Ray Holt" ~ "Ray Holt", name == "Scully" ~ "Norm Scully", name == "Hitchcock" ~ "Michael Hitchcock", name == "Deputy Chief Madeline Wuntch" ~ "Madeline Wuntch", TRUE ~ name)) # Main cast: cast_main_url <- bow("https://en.wikipedia.org/wiki/List_of_Brooklyn_Nine-Nine_characters") cast_main_raw <- scrape(cast_main_url) %>% html_nodes(".wikitable") %>% html_table(fill = TRUE) %>% flatten_df() %>% as_tibble() cast_main_clean <- cast_main_raw %>% slice(-1) %>% select(Character) # anti-join non_main_cast <- anti_join(cast_info_clean, cast_main_clean, by = c("name" = "Character"))
Non-Main Cast Appearances Plot
To shorten the list I’ll just look at the top five cast members. I
created a halfway
variable so that the number labels will appear right
in the middle of each bar. Using the axis_canvas()
, draw_image()
,
and insert_axis_grob()
from the cowplot
package I can insert images
of the characters along the bottom of the plot.
non_main_plot <- non_main_cast %>% arrange(desc(episode_num)) %>% head(5) %>% mutate(halfway = episode_num / 2) %>% ggplot(aes(x = reorder(name, desc(episode_num)), y = episode_num)) + geom_col(fill = "#000E33") + geom_text(aes(y = halfway, label = episode_num, family = "Univers"), color = "#F9FEFF", size = 8) + scale_y_continuous(expand = c(0, 0), limits = c(0, 12.5), breaks = c(2, 4, 6, 8, 10, 12), labels = c(2, 4, 6, 8, 10, 12)) + labs(title = "Most Appearances by Non-Main Cast", x = NULL, y = "Number of Episodes") + theme_b99() + theme(panel.grid.major.x = element_blank()) # images pimage <- axis_canvas(non_main_plot, axis = 'x') + draw_image( "https://vignette.wikia.nocookie.net/tvdatabase/images/d/d5/Adrian_Pimento.jpg", x = 0.5, scale = 1.3, clip = "on") + draw_image( "https://vignette.wikia.nocookie.net/brooklynnine-nine/images/a/ab/Kevin.jpg", x = 1.5, scale = 1.3, clip = "on") + draw_image( "https://vignette.wikia.nocookie.net/brooklynnine-nine/images/b/b3/Wuntch.png", x = 2.5, scale = 1.3, clip = "on") + draw_image( "https://vignette.wikia.nocookie.net/brooklynnine-nine/images/0/0a/Doug_Judy.png", x = 3.5, scale = 1.3, clip = "on") + draw_image( "https://vignette.wikia.nocookie.net/brooklynnine-nine/images/2/23/Vulture.jpg", x = 4.5, scale = 1.3, clip = "on") # insert the image strip into the bar plot and draw ncast_plot <- ggdraw(insert_xaxis_grob(non_main_plot, pimage, position = "bottom")) ncast_plot
Adrian has appeared in the most episodes just beating out Kevin. This really shows how involved Adrian was in the story in Season 4 and 5 especially compared to Kevin who has been popping in and out since the first season. Everybody’s favorite DOUG JUDY rounds off this bar chart next to The Vulture.
Viewer Numbers
Pretty much the same M.O. as what I did to get the episode ratings here. One thing of note was using regex to get rid of the footnotes. I had to be careful to double escape the square brackets there.
url_wiki_df <- tibble( urls = c("https://en.wikipedia.org/wiki/Brooklyn_Nine-Nine_(season_1)", "https://en.wikipedia.org/wiki/Brooklyn_Nine-Nine_(season_2)", "https://en.wikipedia.org/wiki/Brooklyn_Nine-Nine_(season_3)", "https://en.wikipedia.org/wiki/Brooklyn_Nine-Nine_(season_4)", "https://en.wikipedia.org/wiki/Brooklyn_Nine-Nine_(season_5)"), season_num = c(1, 2, 3, 4, 5)) brooklyn99_ep_info <- function(url) { session <- bow(url) episode_raw <- scrape(session) %>% html_nodes(".wikiepisodetable") %>% html_table(fill = TRUE) %>% flatten_df() %>% as_tibble() %>% filter(row_number() %% 2 != 0) episode_table <- episode_raw %>% set_names(c("num_overall", "num_season", "title", "director", "writer", "air_date", "prod_code", "viewers")) %>% mutate(viewers = str_remove_all(viewers, "\\[[0-9]+\\]") %>% as.numeric) } ep_info_df <- map2(.x = url_wiki_df$urls, .y = url_wiki_df$season_num, ~ brooklyn99_ep_info(url = .x) %>% mutate(season = .y)) %>% reduce(rbind) ep_info_df %>% slice(77:83) ## # A tibble: 7 x 9 ## num_overall num_season title director writer air_date prod_code viewers ## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> ## 1 77 9 "\"T~ Dean Ho~ Luke ~ Decembe~ 409 2.31 ## 2 78 10 "\"C~ Jaffar ~ Matt ~ Decembe~ 410 2.15 ## 3 7980 1112 "\"T~ Rebecca~ Carol~ January~ 411412 3.49 ## 4 81 13 "\"T~ Beth Mc~ Carly~ April 1~ 413 1.91 ## 5 82 14 "\"S~ Michael~ Andre~ April 1~ 414 1.91 ## 6 83 15 "\"T~ Linda M~ David~ April 2~ 415 1.88 ## 7 84 16 "\"M~ Maggie ~ Phil ~ May 2, ~ 418 1.72 ## # ... with 1 more variable: season <dbl>
tidylog Demonstration
A little intermission here as I wanted to talk about this cool package I recently found on Twitter called tidylog. To show you an example, I will take a look at which writers wrote the most episodes in Season 5:
library(tidylog) ## ## Attaching package: 'tidylog' ## The following objects are masked from 'package:dplyr': ## ## anti_join, distinct, filter, filter_all, filter_at, filter_if, ## full_join, group_by, group_by_all, group_by_at, group_by_if, ## inner_join, left_join, mutate, mutate_all, mutate_at, ## mutate_if, right_join, select, select_all, select_at, ## select_if, semi_join, transmute, transmute_all, transmute_at, ## transmute_if ## The following object is masked from 'package:stats': ## ## filter ep_info_df %>% select(season, num_season, title, num_overall, writer, viewers) %>% filter(season == 5) %>% separate_rows(writer, sep = "&") %>% mutate(writer = writer %>% trimws) %>% group_by(writer) %>% summarize(n = n()) %>% arrange(desc(n)) %>% head(5) ## select: dropped 3 variables (director, air_date, prod_code) ## filter: removed 89 out of 111 rows (80%) ## mutate: changed 6 values (24%) of 'writer' (0 new NA) ## group_by: 16 groups (writer) ## # A tibble: 5 x 2 ## writer n ## <chr> <int> ## 1 Luke Del Tredici 3 ## 2 Carly Hallam Tosh 2 ## 3 Carol Kolb 2 ## 4 Dan Goor 2 ## 5 David Phillips 2
You can see that it gives you details on what each dplyr
function
changed in your data frame in the order that the operations were
performed. It’s pretty cool and it gets more useful as your dplyr
pipeline grows longer! However, let’s turn tidylog
off for now:
options("tidylog.display" = list())
The episode info data frame we got from Wikipedia had a lot of information so let’s chop it down a bit to get data on the number of viewers per episode.
From what we saw of ep_info_df
earlier, the data looked pretty clean
already except for Episode 79 and 80 (“The Fugitive” episodes)…
There are other multi-part episodes throughout the show but this pair is
the only one that aired on the same day, on New Years Day 2017, so they
got smushed together in the Wikipedia table when we scraped it. There
wasn’t a quick and easy way to regex
them into separate rows so I just
filtered them out and added them back in. Then at the end of the pipe, I
created a first
and last
variable for each season taking note of what
overall episode number the first and last episodes of each season were.
You’ll see why I did this soon.
viewers_df <- ep_info_df %>% select(season, num_season, title, num_overall, viewers) %>% # manually fix "The Fugitive" episodes filter(num_overall != 7980) %>% add_row(season = 4, num_season = 11, title = "The Fugitive: Part 1", num_overall = 79, viewers = 3.49) %>% add_row(season = 4, num_season = 12, title = "The Fugitive: Part 2", num_overall = 80, viewers = 3.49) %>% mutate(num_overall = as.numeric(num_overall), num_season = as.numeric(num_season), season = as_factor(as.character(season))) %>% arrange(num_overall) %>% group_by(season) %>% mutate(first = first(num_overall), last = last(num_overall)) %>% ungroup()
Viewers Plot
For this plot I used geom_rect()
to give a colored background for each
season. I set the span of the Xs as the first
and last
episode
number of each season for the width and using Inf
and -Inf
for the
Ys to cover the entire height of the plot. For the season labels I
calculated the midpoint along the x-axis for each season and used that
as the x values (not shown in the code). I could have used facets but to
the best of my knowledge I wouldn’t have been able to place the labels
on top across multiple facets anyways. It may be possible by placing
text grobs on top of the facetted plot but I thought that would take
more time compared to what I did with geom_rect()
. Otherwise I used a
lot of annotate()
code to add in the episode details and season
titles.
# Color hex codes for season blocks: cols <- c("1" = "#6CA9C3", "2" = "#3A3533", "3" = "#000E33", "4" = "#CBCFD2", "5" = "#175E78") viewer_plot <- viewers_df %>% ggplot(aes(x = num_overall, y = viewers, group = season)) + geom_rect(aes(xmin = first, xmax = last, fill = season), alpha = 0.2, ymin = -Inf, ymax = Inf) + geom_line(color = "white", size = 1.1) + scale_fill_manual("Season", values = cols, guide = FALSE) + scale_y_continuous(limits = c(0, 20), breaks = c(seq(0, 15, by = 2.5)), expand = c(0, 0)) + scale_x_continuous(breaks = c(seq(0, 112, by = 5)), labels = c(seq(0, 112, by = 5)), expand = c(0, 0)) + labs(x = "Episode Number (Overall)", y = "Viewers (Millions)", title = "Viewers by Episode") + theme_b99() + theme(panel.grid = element_blank()) + # Line segments annotate(geom = "segment", x = 15, xend = 15, y = 15, yend = 16., color = "black", size = 1.2) + annotate(geom = "segment", x = 34, xend = 34, y = 6, yend = 7, color = "black", size = 1.2) + annotate(geom = "segment", x = 79.5, xend = 79.5, y = 3.5, yend = 5, color = "black", size = 1.2) + # Episode details annotate(geom = "label", x = 7, y = 16.5, hjust = 0, size = 4.5, family = "Univers", label = "Operation: Broken Feather (Aired after Super Bowl XLVIII)") + annotate(geom = "label", x = 25, y = 7.5, hjust = 0, size = 4.5, family = "Univers", label = "Beach House (Sunday before First Workday of 2015)") + annotate(geom = "label", x = 67, y = 5, hjust = 0, size = 4.5, family = "Univers", label = "The Fugitive: Part 1 & 2 (New Years Day 2017)") + # Season titles annotate(geom = "label", x = 11.5, y = 19, size = 7, family = "Univers", label = "Season One") + annotate(geom = "label", x = 34, y = 19, size = 7, family = "Univers", label = "Season Two") + annotate(geom = "label", x = 57, y = 19, size = 7, family = "Univers", label = "Season Three") + annotate(geom = "label", x = 79.5, y = 19, size = 7, family = "Univers", label = "Season Four") + annotate(geom = "label", x = 102, y = 19, size = 7, family = "Univers", label = "Season Five") viewer_plot
I added in details on why certain episodes had such high viewership compared to others from both my memory and a little digging but I wasn’t really sure about those three peaks in Season 3. One of them was a Halloween episode (which is up there with the Pontiac Bandit episodes for the best recurring episodes), but in terms of the actual airing dates I can’t quite remember anything significant happening on those days that would explain the high viewer numbers (relative to the other episodes in the season).
Throughout it’s run on FOX, the show averaged around 2~3.5 million viewers with numerous peaks and troughs until Season 5 where it only averaged around 1.8 million viewers but kept those numbers very consistently across the entire season.
Creating the Title Card
The last thing I want to do to wrap up this blog post is to create an infographic containing the plots I made above. Before I combine them all I want to make a title card that looks similar to the official Brooklyn Nine-Nine title card you see at the beginning of the episode. Here’s what it looks like:
Now I could’ve just added the picture into my graph but I
thought why not try making it myself? So, I used geom_polygon()
to
construct the two diagonal colored bars and then placed the text on top
of them with annotate()
.
rect <- data.frame(id = as.factor(c(1, 1, 1, 1, 2, 2, 2, 2)), x = c(16, 16, 0, 0, 16, 16, 6.6, 6.5), y = c(5.1, 2, 0.4, 3.5, 2.5, 0.8, 0, 1.6)) header <- ggplot(data = rect, aes(x, y, group = id, fill = id)) + geom_polygon() + scale_fill_manual(values = c("#0053CD", "#000E33"), guide = FALSE) + annotate(geom = "text", y = 2, x = 1.8, hjust = 0, family = "Univers LT 93 ExtraBlackEx", color = "#fcf7e8", size = 40, angle = 5, label = glue(" BROOKLYN")) + annotate(geom = "text", y = 0.9, x = 7.2, hjust = 0, family = "Univers", color = "#fcf7e8", size = 28, angle = 3.5, label = glue(" NINE-NINE")) + scale_x_continuous(expand = c(0, 0), limits = c(0, 16)) + scale_y_continuous(expand = c(0, 0), limits = c(0, 5.5)) + theme_void() + theme(plot.background = element_rect(color = "#F9FEFF", fill = "#F9FEFF", size = 8)) header
After going through this process (read: very time-consuming ordeal) I don’t recommend making stuff like this manually as it can be quite frustrating, especially if you have to get it to fit alongside other plots and grobs!
Finally, let me just add a footer:
foot <- ggplot() + annotate(geom = "text", x = 12.5, y = 0.5, size = 4, family = "Univers", color = 'white', hjust = 0, label= glue(" Source: Wikipedia, Nielsen Ratings, IMDB By @R_by_Ryo")) + lims(x = c(0, 16), y = c(0, 1)) + theme_b99() + theme(panel.grid = element_blank(), text = element_blank()) foot
Arrange Plots with cowplot
With all the components done, now I just need to arrange all of the plots, the title card, and the footer together to create the Brooklyn Nine-Nine infographic!
There are several ways to do this like using grid.arrange()
from the
grid
package or the newer patchwork
package but I went for
cowplot::plot_grid()
this time.
the_nine_nine_plot <- plot_grid( plot_grid(header), plot_grid(viewer_plot), plot_grid(box_plot, ncast_plot, rel_widths = c(3, 4)), plot_grid(foot), ncol = 1, rel_heights = c(1, 1, 1, 0.25)) the_nine_nine_plot
Conclusion
In this blog post I went over creating your own custom ggplot2
theme,
web scraping, cleaning and tidying data frames with the tidyverse
, and
creating a variety of plots to show simple statistics on Brooklyn
Nine-Nine. You can check out all the code in this GitHub
Repo.
The plots were pretty simple so the main problems I had were in sizing
and scaling stuff properly. I feel like some of it is still not perfect
but it’s good enough for now. This is an area I feel like I need to
improve on and find ways to automatically scale things using magick
or
custom functions. This was also my first time using a different color
scheme for my plots, usually I just slap theme_minimal()
to most
things and call it a day. It took me a while to get the color
coordination right and I think I got it to a level where it isn’t
nauseating to look at!
To conclude I just wanted to list out my top five “Cold Opens” (in no particular order):
Agree? Disagree? Let me know in the comments below!
I hope you had as much fun reading this as I did making it. If you didn’t, then well…
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.