Tracking Joey Wendle’s rookie season with gganimate

[This article was first published on Rstats on goonR blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

This past weekend I got the chance to go to the Tampa Bay Rays vs Cleveland Indians game. This game was a ton of fun, made even more exciting for me – and by the end of the game, the people in my section – because my brother-in-law’s brother (does that make him my brother-in-law too? No one ever knows for sure…), Joey Wendle, plays for the Rays! In case you haven’t been paying attention, Joey has been having a MASSIVE rookie season. Just take a look at some of the tweets from the Ray’s organization in the last few days.

As you can see, the man has been getting it done at the plate and in the field. This has led some to call for his inclusion in the AL Rookie of the Year conversation.

I have been looking for a good excuse to play around with Thomas Pedersen’s gganimate package, and what better way then by taking a look at Joey’s rookie season.

Before we start, let’s load the R packages that we will use.

library(tidyverse)
# devtools::install_github("BillPetti/baseballr")
library(baseballr)
# devtools::install_github("thomasp85/gganimate")
library(gganimate)
library(lubridate)
library(fuzzyjoin) # for position graph

Getting the data

There are several sources of MLB data available on the internet, and the baseballr package by Bill Petti makes getting this data super easy. The sources of data that I will use in this post are from the Baseball Savant website for batting data and Fan Graphs for fielding position. These sites both provide game by game data for any player in the league for any season. The Baseball Savant site provides pitch by pitch data about each of Joey’s at-bats and the Fan Graphs site provides game by game stats.

baseballr provides different functions for the different data sources, but first we will have to get Joey’s player id from these sites. We can use the playerid_lookup function to find it.

playerid_lookup("Wendle", "Joey") %>% glimpse()
## Observations: 1
## Variables: 11
## $ first_name       <chr> "Joey"
## $ last_name        <chr> "Wendle"
## $ given_name       <chr> "Joseph Patrick"
## $ name_suffix      <chr> NA
## $ nick_name        <chr> NA
## $ birth_year       <int> 1990
## $ mlb_played_first <int> 2016
## $ mlbam_id         <int> 621563
## $ retrosheet_id    <chr> "wendj002"
## $ bbref_id         <chr> "wendljo01"
## $ fangraphs_id     <int> 13853

Now we can look up both the Fan Graphs and Baseball Savant data. From the above we can see that his Fan Graphs id is “13853”.

joey_fg <- baseballr::batter_game_logs_fg("13853", year = 2018)
joey_fg %>% as_tibble()
## # A tibble: 119 x 23
##    Date  Team  Opp   BO    Pos   PA    H     X2B   X3B   HR    R     RBI  
##    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
##  1 2018~ TBR   @TOR  4     2B    4     0     0     0     0     0     0    
##  2 2018~ TBR   @TOR  5     2B    4     0     0     0     0     0     0    
##  3 2018~ TBR   @TOR  5     SS-3B 4     3     1     0     0     1     0    
##  4 2018~ TBR   @CLE  1     3B    5     2     1     0     0     1     1    
##  5 2018~ TBR   @CLE  1     2B    5     2     1     0     0     1     0    
##  6 2018~ TBR   @CLE  1     2B    4     1     0     0     0     0     0    
##  7 2018~ TBR   @ATL  5     2B    5     2     0     0     0     2     2    
##  8 2018~ TBR   @ATL  1     2B    5     2     0     0     0     1     0    
##  9 2018~ TBR   BOS   1     2B    5     2     1     0     0     1     0    
## 10 2018~ TBR   BOS   1     2B    5     2     0     0     0     0     1    
## # ... with 109 more rows, and 11 more variables: SB <chr>, CS <chr>,
## #   BB_perc <dbl>, K_perc <dbl>, ISO <chr>, BABIP <chr>, AVG <chr>,
## #   OBP <chr>, SLG <chr>, wOBA <chr>, wRC_plus <chr>

We can do the same with the Baseball Savant data using the corresponding player id – “621563”

joey_bsvnt <- scrape_statcast_savant_batter(start_date = "2018-03-29", end_date = Sys.Date(), batterid = "621563")
joey_bsvnt %>% as_tibble()
## # A tibble: 1,687 x 90
##    pitch_type game_date  release_speed release_pos_x release_pos_z
##    <chr>      <date>             <dbl>         <dbl>         <dbl>
##  1 SI         2018-09-05          92.5         -2.44          5.48
##  2 SI         2018-09-05          93.2         -2.39          5.70
##  3 CH         2018-09-05          88.4         -2.03          5.53
##  4 CU         2018-09-05          78.5         -1.87          5.67
##  5 CU         2018-09-05          80.6         -1.98          5.63
##  6 CU         2018-09-05          79.3         -1.92          5.65
##  7 CU         2018-09-05          78.1         -2.02          5.69
##  8 FT         2018-09-05          92.3         -1.90          5.69
##  9 CH         2018-09-05          89.3         -1.92          5.57
## 10 FT         2018-09-05          93.9         -1.88          5.60
## # ... with 1,677 more rows, and 85 more variables: player_name <chr>,
## #   batter <int>, pitcher <int>, events <chr>, description <chr>,
## #   spin_dir <chr>, spin_rate_deprecated <chr>,
## #   break_angle_deprecated <chr>, break_length_deprecated <chr>,
## #   zone <int>, des <chr>, game_type <chr>, stand <chr>, p_throws <chr>,
## #   home_team <chr>, away_team <chr>, type <chr>, hit_location <int>,
## #   bb_type <chr>, balls <int>, strikes <int>, game_year <int>,
## #   pfx_x <dbl>, pfx_z <dbl>, plate_x <dbl>, plate_z <dbl>, on_3b <dbl>,
## #   on_2b <dbl>, on_1b <dbl>, outs_when_up <int>, inning <int>,
## #   inning_topbot <chr>, hc_x <dbl>, hc_y <dbl>, tfs_deprecated <chr>,
## #   tfs_zulu_deprecated <chr>, pos2_person_id <int>, umpire <chr>,
## #   sv_id <chr>, vx0 <dbl>, vy0 <dbl>, vz0 <dbl>, ax <dbl>, ay <dbl>,
## #   az <dbl>, sz_top <dbl>, sz_bot <dbl>, hit_distance_sc <int>,
## #   launch_speed <dbl>, launch_angle <dbl>, effective_speed <dbl>,
## #   release_spin_rate <int>, release_extension <dbl>, game_pk <int>,
## #   pos1_person_id <int>, pos2_person_id_1 <int>, pos3_person_id <int>,
## #   pos4_person_id <int>, pos5_person_id <int>, pos6_person_id <int>,
## #   pos7_person_id <int>, pos8_person_id <int>, pos9_person_id <int>,
## #   release_pos_y <dbl>, estimated_ba_using_speedangle <dbl>,
## #   estimated_woba_using_speedangle <dbl>, woba_value <dbl>,
## #   woba_denom <int>, babip_value <int>, iso_value <int>,
## #   launch_speed_angle <int>, at_bat_number <int>, pitch_number <int>,
## #   pitch_name <chr>, home_score <int>, away_score <int>, bat_score <int>,
## #   fld_score <int>, post_away_score <int>, post_home_score <int>,
## #   post_bat_score <int>, post_fld_score <int>,
## #   if_fielding_alignment <chr>, of_fielding_alignment <chr>, barrel <dbl>

While the baseball savant data has a ton of information, it does not have stats per game or cumulative stats for the season. However, using the data that we have, these things can be calculated easily! To do this, we will define a custom function to calculate a few standard sabermetrics (i.e. batting average, slugging, on base percentage, etc.).

savant_daily <- function(data){
  output <- data %>% 
    filter(!is.na(events), events != "caught_stealing_2b") %>% 
    mutate(
      is_ab = if_else(
        events %in% c("strikeout", "field_out", "single", "double", "force_out", 
                      "triple", "home_run", "double_play", "field_error", "grounded_into_double_play",
                      "strikeout_double_play", "fielders_choice_out"),
        TRUE,
        FALSE
      ),
      is_hit = if_else(
        events %in% c("single", "double", "triple", "home_run"), TRUE, FALSE
      ),
      bases = case_when(
        events == "single" ~ 1,
        events == "double" ~ 2,
        events == "triple" ~ 3,
        events == "home_run" ~ 4,
        TRUE ~ 0
      ),
      event_custom = case_when(
        events %in% c("single", "double", "triple", "home_run") ~ events, 
        str_detect(events, "sac") ~ "sacrifice",
        events %in% c("walk", "hit_by_pitch") ~ NA_character_,
        events == "field_error" ~ "error",
        TRUE ~ "out"
      )
    ) %>% 
    group_by(game_date) %>% 
    summarize(
      pa = length(unique(at_bat_number)),
      ab = sum(is_ab),
      hits = sum(is_hit),
      doubles = sum(events == "double"),
      triples = sum(events == "triples"),
      home_runs = sum(events == "home_run"),
      bb = sum(events == "walk"),
      hbp = sum(events == "hit_by_pitch"),
      so = sum(events %in% c("strikeout", "strikeout_double_play")),
      bases = sum(bases)
    ) %>% 
    arrange(game_date) %>% 
    mutate(
      ba = round(hits/ab, 3),
      obp = round((hits + bb + hbp)/(ab + bb + hbp), 3),
      slg = round(bases/ab, 3),
      ops = obp + slg,
      hits_to_date = cumsum(hits),
      bb_to_date = cumsum(bb),
      hbp_to_date = cumsum(hbp),
      ab_to_date = cumsum(ab),
      bases_to_date = cumsum(bases),
      ba_to_date = round(hits_to_date/ab_to_date, 3),
      obp_to_date = round(
        (hits_to_date + bb_to_date + hbp_to_date)/(ab_to_date + bb_to_date + hbp_to_date), 3
      ),
      slg_to_date = round(bases_to_date/ab_to_date, 3),
      ops_to_date = obp_to_date + slg_to_date
    )
  
  return(output)
}
joey_bsvnt_daily <- savant_daily(joey_bsvnt)

joey_bsvnt_daily %>% glimpse()
## Observations: 117
## Variables: 24
## $ game_date     <date> 2018-03-29, 2018-03-30, 2018-03-31, 2018-04-01,...
## $ pa            <int> 1, 1, 4, 4, 3, 1, 3, 2, 5, 6, 4, 4, 4, 4, 4, 2, ...
## $ ab            <int> 1, 1, 3, 4, 3, 1, 3, 2, 5, 3, 3, 4, 3, 2, 3, 2, ...
## $ hits          <int> 0, 0, 0, 2, 0, 1, 2, 1, 2, 1, 0, 0, 1, 0, 0, 1, ...
## $ doubles       <int> 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, ...
## $ triples       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ home_runs     <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ...
## $ bb            <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 0, 1, 1, 1, 0, ...
## $ hbp           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...
## $ so            <int> 0, 0, 2, 0, 0, 0, 1, 0, 1, 2, 1, 2, 0, 1, 0, 1, ...
## $ bases         <dbl> 0, 0, 0, 3, 0, 1, 3, 3, 5, 1, 0, 0, 2, 0, 0, 1, ...
## $ ba            <dbl> 0.000, 0.000, 0.000, 0.500, 0.000, 1.000, 0.667,...
## $ obp           <dbl> 0.000, 0.000, 0.000, 0.500, 0.000, 1.000, 0.667,...
## $ slg           <dbl> 0.000, 0.000, 0.000, 0.750, 0.000, 1.000, 1.000,...
## $ ops           <dbl> 0.000, 0.000, 0.000, 1.250, 0.000, 2.000, 1.667,...
## $ hits_to_date  <int> 0, 0, 0, 2, 2, 3, 5, 6, 8, 9, 9, 9, 10, 10, 10, ...
## $ bb_to_date    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 4, 4, 5, 6, 7, 7, ...
## $ hbp_to_date   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, ...
## $ ab_to_date    <int> 1, 2, 5, 9, 12, 13, 16, 18, 23, 26, 29, 33, 36, ...
## $ bases_to_date <dbl> 0, 0, 0, 3, 3, 4, 7, 10, 15, 16, 16, 16, 18, 18,...
## $ ba_to_date    <dbl> 0.000, 0.000, 0.000, 0.222, 0.167, 0.231, 0.312,...
## $ obp_to_date   <dbl> 0.000, 0.000, 0.000, 0.222, 0.167, 0.231, 0.312,...
## $ slg_to_date   <dbl> 0.000, 0.000, 0.000, 0.333, 0.250, 0.308, 0.438,...
## $ ops_to_date   <dbl> 0.000, 0.000, 0.000, 0.555, 0.417, 0.539, 0.750,...

Fielding Positions

Joey started the season sharing second base duties and was not used in many other positions. However, as the season has progressed, the Rays have given him opportunities at several other positions and Joey has stepped up every time. I mean, just look at those web gems in the tweet at the beginning of this post! Seriously, go watch them again, I will wait…

As a first stab at gganimate, let’s take a look at where Joey has played each game with the help of baseballr’s ggspraychart function. First, we can define coordinates (relative to ggspraycharts field definition) for each of the positions that he has played.

position_loc <- tribble(
  ~position, ~x, ~y, 
  "2B", 145, -145,
  "3B", 100, -160,
  "SS", 111, -144,
  "LF", 80, -100,
  "RF", 176, -100
)

Since Joey has played multiple positions in single games, a simple left join is not entirely appropriate. Using the fuzzyjoin package, we can match the position coordinates with the positions that are defined in the joey_fg data set by regex.

joey_position <- joey_fg %>% 
  fuzzyjoin::regex_left_join(position_loc, by = c("Pos" = "position")) %>% 
  mutate(Date = lubridate::ymd(Date))

Now, using the ggspraychart and the transition_time function, we can see how his position has changed with each day.

position_chart <- joey_position %>% 
  filter(!is.na(position)) %>% 
  ggspraychart(x_value = "x", y_value = "y", fill_value = "position") + 
  scale_fill_manual(values = c("2B" = "red", "3B" = "blue",
                               "SS" = "green", "LF" = "yellow", 
                               "RF" = "black"), na.translate = FALSE) + 
  labs(title = "Joey Wendle 2018 position by game:{frame_time}",
       caption = "Data source: fangraphs.com\nBuilt with the baseballr package\n") + 
  transition_time(Date) + 
  theme(
    legend.title = element_blank(),
    plot.title = element_text(size = 14),
    plot.caption = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 12, face = "bold")
  )

animate(position_chart, width = 800, height = 800)

As this animation shows, he has been moved around the field much more towards the end of the year!

Batting

Now let’s use the ggspraychart to see how Joey’s hits have accumulated over the course of the season and where he tends to hit the ball. For this, we will use the raw baseball savant data set joey_bsvnt and use the function transition_reveal. This function allows for data to be added over a range rather than only showing the data for a specific date.

joey_spray <- joey_bsvnt %>% 
  filter(events %in% c("single", "double", "triple", "home_run")) %>% 
  mutate(events = factor(events, levels = c("single", "double", "triple", "home_run"),
                         labels = c("Single", "Double", "Triple", "Home Run"))) %>% 
  ggspraychart(fill_value = "events", 
               fill_palette = c("Single"="#A2C8EC", "Double"="#006BA4", 
                                "Triple"="#FF940E", "Home Run"="#C85200")) + 
  labs(title = "Joey Wendle 2018 hits to date: {frame_time}",
       caption = "Data source: baseballsavant.com\nBuilt with the baseballr package\n") + 
  transition_reveal(game_date, game_date) + 
  theme(
    legend.title = element_blank(),
    plot.title = element_text(size = 14),
    plot.caption = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 12, face = "bold")
  )


animate(joey_spray, width = 800, height = 800)

From this animation, Joey seems to hit most of his home runs to right field, but other than that he seems to spread his hits all around the field.

Let’s take a look at how Joey’s batting average has changed over time. We can use the daily statistics we calculated earlier to see how it changes over time using the transition_reveal function.

joey_rbi <- joey_bsvnt_daily %>% 
  ggplot(aes(game_date, ba_to_date)) + 
  geom_step(size = 1.5) + 
  theme_bw() + 
  labs(
    y = "Batting Average",
    title = "Joey Wendle's batting average during the 2018 season",
    caption = "Data source: baseballsavant.com\n"
  ) + 
  scale_x_date(date_breaks = "2 weeks", date_labels = "%B %d, %Y") + 
  scale_y_continuous(breaks = seq(0, 0.35, 0.05)) + 
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.caption = element_text(size = 10, face = "bold"),
    axis.title.x = element_blank(),
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, face = "bold"),
    axis.text.y = element_text(face = "bold")
  ) + 
  transition_reveal(id = game_date, along = game_date) 


animate(joey_rbi, width = 800, height = 600)

As we can see, Joey has stayed fairly consistent between 0.275 and 0.300 and has only been improving since the all-star break. Using the same functionality, we can look at multiple stats all at once by simply faceting the graphs. Here we can see how his batting average, slugging percentage and on-base percentage have changed over the course of the season.

multi_stats <- joey_bsvnt_daily %>% 
  select(game_date, ba_to_date, slg_to_date, obp_to_date) %>% 
  gather(key = "metric", value = "value", -game_date) %>% 
  mutate(metric = case_when(
    metric == "ba_to_date" ~ "Batting Average",
    metric == "slg_to_date" ~ "Slugging",
    metric == "obp_to_date" ~ "On-base Percentage"
  )) %>% 
  ggplot(aes(game_date, value)) + 
  geom_step(size = 1.5) + 
  facet_wrap(vars(metric), scales = "free_y", ncol = 1) + 
  theme_bw() + 
  scale_x_date(date_breaks = "2 week", date_labels = "%B %d, %Y") + 
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.caption = element_text(size = 10, face = "bold"),
    axis.title = element_blank(),
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, face = "bold"),
    axis.text.y = element_text(face = "bold")
  ) + 
  labs(
   title = "Joey Wendle's batting average, slugging, and on-base percentage, respectively",
   caption = "Data source: baseballsavant.com\n"
  ) + 
  transition_reveal(game_date, game_date)

animate(multi_stats, width = 800, height = 1000)

Conclusion

Joey is having a breakout year and the gganimate package gives us a cool way of viewing his progress over the course of the season. I hope you enjoyed the examples and I hope you root for Joey to finish off the year strong!

Please let me know what you think!

To leave a comment for the author, please follow the link and comment on their blog: Rstats on goonR blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)