Rafa 21 Grand Slams and gganimate
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve been a Nadal fan for a long time – right back to the days of the pirate-pants so yeah, really a long time. In all this time, Rafa has never been ahead in the grand slam race vs his biggest rivals… but that finally changed after the 2022 Australian Open! The win there was unexpected and came out of nowhere. The final against Medvedev has to go down as one of the best comebacks ever.
It’s already been as.Date("2022-02-13") - as.Date("2022-01-30")
(14 days) since he won that record 21st grand slam so I thought it has to be about time to do something to mark the achievement. Something that’s been on my list of things to learn is gganimate which is a very cool R package so I thought I’d take the opportunity here. My goal is to create an animated barplot, showing Rafa on top at the very end.
Getting the data
I started by using this data on grand slam wins from this blog post by Emily Kuehler and filtering for just the data on the big 3 male players: Nadal, Djokovic and Federer. Since the grand slam data there does not go all the way up to the 2022 Australian Open, I had to manually add that in by looking up the required information on Wikipedia and binding that to the end.
library(tidyverse) library(readxl) gs_df <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-09/grand_slams.csv", show_col_types = FALSE) gs_update <- tibble::tribble( ~year, ~grand_slam, ~name, ~rolling_win_count, ~tournament_date, 2019, "French Open", "Rafael Nadal", 18, "26/05/2019", 2019, "Wimbledon", "Novak Djokovic", 16, "01/07/2019", 2019, "US Open", "Rafael Nadal", 19, "26/08/2019", 2020, "Australian Open", "Novak Djokovic", 17, "20/01/2020", 2020, "French Open", "Rafael Nadal", 20, "27/09/2020", 2021, "Australian Open", "Novak Djokovic", 18, "08/02/2021", 2021, "French Open", "Novak Djokovic", 19, "30/05/2021", 2021, "Wimbledon", "Novak Djokovic", 20, "28/07/2021", 2022, "Australian Open", "Rafael Nadal", 21, "17/01/2022" ) %>% mutate(tournament_date = as.Date(tournament_date, "%d/%m/%Y")) gs_df2 <- gs_df %>% filter(name %in% c("Rafael Nadal", "Novak Djokovic", "Roger Federer")) %>% mutate(grand_slam = str_replace_all(grand_slam, "_", " "), grand_slam = str_to_title(grand_slam), grand_slam = str_replace_all(grand_slam, "Us", "US")) %>% select(-gender) %>% bind_rows(gs_update) gs_df2 # A tibble: 61 x 5 # year grand_slam name rolling_win_count tournament_date # <dbl> <chr> <chr> <dbl> <date> # 1 2003 Wimbledon Roger Federer 1 2003-07-14 # 2 2004 Australian Open Roger Federer 2 2004-01-10 # 3 2004 Wimbledon Roger Federer 3 2004-07-14 # 4 2004 US Open Roger Federer 4 2004-09-09 # 5 2005 French Open Rafael Nadal 1 2005-06-09 # 6 2005 Wimbledon Roger Federer 5 2005-07-14 # 7 2005 US Open Roger Federer 6 2005-09-09 # 8 2006 Australian Open Roger Federer 7 2006-01-10 # 9 2006 French Open Rafael Nadal 2 2006-06-09 # 10 2006 Wimbledon Roger Federer 8 2006-07-14 # ... with 51 more rows
Data processing
I had to do a bit of general data wrangling (isn’t this always the case unfortunately?) to set things up for gganimate. This is all fairly standard stuff so I’ll just show the code below but one thing to note is that for the rank order (current_rank
) of the players at each time point, I sorted ascending on rolling_win_count2
rather than descending (as would seem more logical to get the ranking by most slams) because when you use ggplot2::coord_flip()
, it puts the highest value (lowest rank) at the top of the graph – so essentially I set it up so that rank 3 is the best and rank 1 is the worst.
# Expand out for all combinations gs_df3 <- gs_df2 %>% arrange(tournament_date) %>% mutate(year = fct_inorder(factor(year)), grand_slam = factor(grand_slam, levels = c("Australian Open", "French Open", "Wimbledon", "US Open")), name = factor(name, levels = c("Rafael Nadal", "Novak Djokovic", "Roger Federer"))) %>% complete(year, grand_slam, name) %>% replace_na(list(rolling_win_count = 0)) # Drop tournaments before first slam win or not yet played gs_df4 <- gs_df3 %>% filter(!(year == 2003 & grand_slam %in% c("Australian Open", "French Open")), !(year == 2022 & grand_slam %in% c("French Open", "Wimbledon", "US Open"))) %>% mutate(yr_slam = paste(year, grand_slam), .before = year) # Recalculate rolling win count gs_df5 <- gs_df4 %>% mutate(win = as.numeric(rolling_win_count > 0)) %>% group_by(name) %>% mutate(rolling_win_count2 = cumsum(win)) %>% ungroup() # Set the rank for each time point gs_df6 <- gs_df5 %>% arrange(year, grand_slam, rolling_win_count2, desc(name)) %>% group_by(yr_slam) %>% mutate(current_rank = row_number()) %>% ungroup() select(gs_df6, yr_slam, name, rolling_win_count2, current_rank) # A tibble: 225 x 4 # yr_slam name rolling_win_count2 current_rank # <chr> <fct> <dbl> <int> # 1 2003 Wimbledon Novak Djokovic 0 1 # 2 2003 Wimbledon Rafael Nadal 0 2 # 3 2003 Wimbledon Roger Federer 1 3 # 4 2003 US Open Novak Djokovic 0 1 # 5 2003 US Open Rafael Nadal 0 2 # 6 2003 US Open Roger Federer 1 3 # 7 2004 Australian Open Novak Djokovic 0 1 # 8 2004 Australian Open Rafael Nadal 0 2 # 9 2004 Australian Open Roger Federer 2 3 # 10 2004 French Open Novak Djokovic 0 1 # ... with 215 more rows
ggimage
Next I prepared some cartoon faces for the 3 players to go on the ends of the bars and stored these on Github so they can be loaded into R with the help of the ggimage package. I won’t go into much detail on the image processing side but the online tools I used to help with this are all in the references section of this blog post.
library(ggimage) img_rafa <- "https://raw.githubusercontent.com/alan-y/img/master/rafa2.png" img_novak <- "https://raw.githubusercontent.com/alan-y/img/master/novak2.png" img_roger <- "https://raw.githubusercontent.com/alan-y/img/master/roger2.png" gs_df7 <- gs_df6 %>% mutate(img_player = case_when( name == "Rafael Nadal" ~ img_rafa, name == "Novak Djokovic" ~ img_novak, name == "Roger Federer" ~ img_roger, )) select(gs_df7, name, img_player) # A tibble: 225 x 2 # name img_player # <fct> <chr> # 1 Novak Djokovic https://raw.githubusercontent.com/alan-y/img/master/novak2.png # 2 Rafael Nadal https://raw.githubusercontent.com/alan-y/img/master/rafa2.png # 3 Roger Federer https://raw.githubusercontent.com/alan-y/img/master/roger2.png # 4 Novak Djokovic https://raw.githubusercontent.com/alan-y/img/master/novak2.png # 5 Rafael Nadal https://raw.githubusercontent.com/alan-y/img/master/rafa2.png # 6 Roger Federer https://raw.githubusercontent.com/alan-y/img/master/roger2.png # 7 Novak Djokovic https://raw.githubusercontent.com/alan-y/img/master/novak2.png # 8 Rafael Nadal https://raw.githubusercontent.com/alan-y/img/master/rafa2.png # 9 Roger Federer https://raw.githubusercontent.com/alan-y/img/master/roger2.png # 10 Novak Djokovic https://raw.githubusercontent.com/alan-y/img/master/novak2.png # ... with 215 more rows
gganimate
Firstly I’ll mention that for the gganimate package to work well, you’ll also need to install the gifski package. To get things ready for the animated plot in plot_df
, I just had to make a couple of minor manipulations. Then everything up to the use of transition_states()
in the code below is standard ggplot2 code except maybe the use of geom_image()
to add the cartoon faces to the end of the bars (note that I subtract 0.5 from rolling_win_count
as the function doesn’t seem to have a nudge_y
argument even though it has one for nudge_x
) and the use of {closest_state}
in the subtitle – this tracks the variable that the animation transitions over which for me, is yr_slam
, i.e. the combination of year and grand slam name. The fill colours are from scale_fill_hue()
but manually picked so that the fill colour of each player’s bar matches their favourite surface.
For transition_states()
, the transition_length()
is the relative length of the transition and state_length()
is the relative length of the pause at the states (I stole this from the help page); I set wrap = FALSE
as I don’t want the last state to transition into the first when looping the animation. I am not sure how much difference the ease_aes("quadratic-in-out")
makes here to be honest but that’s what I used. In general I know this function is for messing around with the effects applied to how frames/states transition into one another. If somebody can give me a good layman’s explanation of these functions, I’d be grateful if you can do so in the comments.
In the animate()
function, I set nframes = 500
. Some things to note here are that the default is only 50 frames so if you have more than 50 states (and I do as I have more than 50 year-slam combinations) then you need to set this to a larger number but this number should be suitably larger so the animation looks smoother. I set end_pause
to 30 frames so that it pauses at the end for a little bit. Finally I applied very specific width
and height
as I wanted to add something to the end of the animation which happened to have these dimensions – that’s a surprise which you will see!
library(gganimate) plot_df <- gs_df7 %>% select(-rolling_win_count) %>% rename(rolling_win_count = rolling_win_count2) %>% mutate(yr_slam = fct_inorder(factor(yr_slam)), name_count = paste(name, rolling_win_count)) # Set up ggplot2 theme theme_set(theme_minimal()) theme_update(plot.title = element_text(face = "bold", size = 18), plot.subtitle = element_text(size = 14), panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank(), panel.grid.major.x = element_line(color = "grey75"), panel.grid.minor.x = element_line(color = "grey75"), legend.position = "none", axis.ticks = element_blank(), axis.text.y = element_blank()) barplot_slams <- ggplot(plot_df, aes(x = current_rank, y = rolling_win_count, fill = name)) + geom_bar(stat = "identity", width = 0.3, colour = "black") + geom_text(aes(label = name_count), nudge_x = -0.25, nudge_y = -0.75, size = 3, fontface = "bold", hjust = 0) + geom_image(aes(image = img_player, y = rolling_win_count - 0.5), size = 0.09) + scale_fill_manual(values = c("#FF7969", "#569EFF", "#00B73A")) + scale_y_continuous(limits = c(-0.75, 25), breaks = seq(0, 25, by = 5)) + coord_flip() + labs(title = "Men's Tennis Grand Slam Singles Championships", subtitle = "{closest_state}", x = NULL, y = NULL) + transition_states(yr_slam, transition_length = 3, state_length = 1, wrap = FALSE) + ease_aes("quadratic-in-out") animate(barplot_slams, nframes = 500, end_pause = 30, fps = 20, width = 469, height = 334, renderer = gifski_renderer("barplot_slams.gif"))
So without further ado, here is the final result for your enjoyment.
References
https://www.archyworldys.com/before-talking-about-greatest-of-all-time-what-you-need-to-know-about-big-3-tennis
https://www.cutout.pro
https://www.eurosport.com/tennis/watch-historic-moment-rafael-nadal-wins-australian-open-and-claims-historic-21st-grand-slam-singles-_vid1618912/video.shtml
https://www.downloadhelper.net
https://online-video-cutter.com
https://ezgif.com
https://gif.ski
https://gifyu.com
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.