
fitzRoy – 0.1.5 release

[This article was first published on Analysis of AFL, and kindly contributed to R-bloggers].

For those of you who’ve been following me on Twitter, you’ll know that I’ve been working on an R package for AFL called fitzRoy with Rob from Analysis of AFL. Today we released a new version with a much-requested feature, so I figured a blog post was in order. You’ll have to reinstall fitzRoy to get the latest functions. We still aren’t on CRAN, but you can use devtools to install it from GitHub.

# install.packages("devtools") # uncomment if you haven't installed devtools before
library(tidyverse)
devtools::install_github("jimmyday12/fitzRoy")

AFL Tables player stats

Our initial version of fitzRoy included some data from a data dump we got from Paul at AFLtables. This data was great, as it had all of the afltables stats on a player-by-player basis for all time. While this was ok for historical analysis, it stopped at round 3, 2017, and it was a one-off dump, meaning we couldn’t keep it up to date.

As such, we’ve written a new function to replace this internal data. It’s called get_afltables_stats. It takes two arguments, start_date and end_date. These are pretty self-explanatory: the function will return stats from all matches between start_date and end_date. The format of these inputs needs to be either dmy or ymd. Both arguments are optional: start_date will default to the first AFL game and end_date will default to the system date. As an example, we could just grab data from this year.

library(fitzRoy)
## Warning: package 'fitzRoy' was built under R version 3.5.1
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.5.1
## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.7
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.5.1
## Warning: package 'tidyr' was built under R version 3.5.1
## Warning: package 'purrr' was built under R version 3.5.1
## Warning: package 'dplyr' was built under R version 3.5.1
## Warning: package 'stringr' was built under R version 3.5.1
## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
dat <- get_afltables_stats("2018-01-01")
## Returning data from 2018-01-01 to 2018-11-24
## Downloading data
## 
## Finished downloading data. Processing XMLs
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)
## Warning: 396 parsing failures.
## # A tibble: 5 x 5
##     row col   expected   actual file
##   <int> <chr> <chr>      <chr>  <chr>
## 1  8713 Round an integer QF     'https://afltables.com/afl/stats/2018_sta~
## 2  8714 Round an integer QF     'https://afltables.com/afl/stats/2018_sta~
## 3  8715 Round an integer QF     'https://afltables.com/afl/stats/2018_sta~
## 4  8716 Round an integer QF     'https://afltables.com/afl/stats/2018_sta~
## 5  8717 Round an integer QF     'https://afltables.com/afl/stats/2018_sta~
## ...
## See problems(...) for more details.
## Warning: Unknown columns: `Substitute`
## Finished getting afltables data
tail(dat)
## # A tibble: 6 x 59
##   Season Round Date       Local.start.time Venue Attendance Home.team  HQ1G
##    <dbl> <chr> <date>                <int> <chr>      <int> <chr>     <int>
## 1   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## 2   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## 3   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## 4   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## 5   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## 6   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## # ... with 51 more variables: HQ1B <int>, HQ2G <int>, HQ2B <int>,
## #   HQ3G <int>, HQ3B <int>, HQ4G <int>, HQ4B <int>, Home.score <int>,
## #   Away.team <chr>, AQ1G <int>, AQ1B <int>, AQ2G <int>, AQ2B <int>,
## #   AQ3G <int>, AQ3B <int>, AQ4G <int>, AQ4B <int>, Away.score <int>,
## #   First.name <chr>, Surname <chr>, ID <int>, Jumper.No. <dbl>,
## #   Playing.for <chr>, Kicks <dbl>, Marks <dbl>, Handballs <dbl>,
## #   Goals <dbl>, Behinds <dbl>, Hit.Outs <dbl>, Tackles <dbl>,
## #   Rebounds <dbl>, Inside.50s <dbl>, Clearances <dbl>, Clangers <dbl>,
## #   Frees.For <dbl>, Frees.Against <dbl>, Brownlow.Votes <dbl>,
## #   Contested.Possessions <dbl>, Uncontested.Possessions <dbl>,
## #   Contested.Marks <dbl>, Marks.Inside.50 <dbl>, One.Percenters <dbl>,
## #   Bounces <dbl>, Goal.Assists <dbl>, Time.on.Ground.. <int>,
## #   Substitute <int>, Umpire.1 <chr>, Umpire.2 <chr>, Umpire.3 <chr>,
## #   Umpire.4 <chr>, group_id <int>
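To make the "dmy or ymd" requirement for start_date and end_date concrete, here is a minimal sketch in base R of how either string form maps to the same Date. The helper parse_match_date is hypothetical and written only for illustration; it is not part of fitzRoy and is not a description of how the package parses dates internally.

```r
# Hypothetical helper (NOT part of fitzRoy): turn a "ymd" or "dmy" string
# into a Date using base R only. Accepts "-" or "/" as the separator.
parse_match_date <- function(x) {
  if (grepl("^[0-9]{4}[-/][0-9]{1,2}[-/][0-9]{1,2}$", x)) {
    fmt <- "%Y-%m-%d"   # year first, e.g. "2018-03-22"
  } else if (grepl("^[0-9]{1,2}[-/][0-9]{1,2}[-/][0-9]{4}$", x)) {
    fmt <- "%d-%m-%Y"   # day first, e.g. "22-03-2018"
  } else {
    stop("Expected a ymd or dmy date, got: ", x)
  }
  # Normalise the separator so one format string covers both styles.
  as.Date(gsub("/", "-", x), format = fmt)
}

start_ymd <- parse_match_date("2018-03-22")
start_dmy <- parse_match_date("22/03/2018")

# Both spellings describe the same day.
identical(start_ymd, start_dmy)

# An open-ended range ends today, mirroring the end_date default described above.
end_date <- Sys.Date()
start_ymd <= end_date
```

The explicit pattern check is there because a bare as.Date() call with the wrong format string can silently return a wrong date rather than NA, so guessing the order up front is the safer choice in a sketch like this.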

Note that each row is a ‘player match’, so the first few columns are just repeated team-level data. It is probably more interesting to look at specific columns relating to player stats.

dat %>% 
  select(Date, First.name, Surname, Playing.for, Contested.Possessions, 
         Uncontested.Possessions, One.Percenters, Time.on.Ground.., 
         Brownlow.Votes)
## # A tibble: 9,108 x 9
##    Date       First.name Surname Playing.for Contested.Posse~
##    <date>     <chr>      <chr>   <chr>                  <dbl>
##  1 2018-03-22 David      Astbury Richmond                   9
##  2 2018-03-22 Shai       Bolton  Richmond                   3
##  3 2018-03-22 Dan        Butler  Richmond                   7
##  4 2018-03-22 Josh       Caddy   Richmond                  11
##  5 2018-03-22 Jason      Castag~ Richmond                   7
##  6 2018-03-22 Reece      Conca   Richmond                   6
##  7 2018-03-22 Trent      Cotchin Richmond                  13
##  8 2018-03-22 Shane      Edwards Richmond                   9
##  9 2018-03-22 Brandon    Ellis   Richmond                   3
## 10 2018-03-22 Corey      Ellis   Richmond                   7
## # ... with 9,098 more rows, and 4 more variables:
## #   Uncontested.Possessions <dbl>, One.Percenters <dbl>,
## #   Time.on.Ground.. <int>, Brownlow.Votes <dbl>
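Because each row is one player in one match, summaries are a short dplyr pipeline away. The sketch below is illustrative only: it rebuilds five of the rows printed above as a small tibble (the name player_rows is made up here) so that it runs without downloading anything. The same verbs should apply unchanged to the full dat object.

```r
library(dplyr)

# Five rows copied from the printed output above (Richmond, 2018-03-22).
# player_rows is an illustrative stand-in for the full `dat` tibble.
player_rows <- tibble(
  Date = as.Date("2018-03-22"),
  First.name = c("David", "Shai", "Dan", "Josh", "Trent"),
  Surname = c("Astbury", "Bolton", "Butler", "Caddy", "Cotchin"),
  Playing.for = "Richmond",
  Contested.Possessions = c(9, 3, 7, 11, 13)
)

# Rank players by contested possessions, highest first.
top_players <- player_rows %>%
  arrange(desc(Contested.Possessions)) %>%
  select(First.name, Surname, Contested.Possessions)

# Collapse the player-match rows to one row per team per match.
# With only five sample rows this is a partial total, not the real team figure.
team_totals <- player_rows %>%
  group_by(Date, Playing.for) %>%
  summarise(
    Players = n(),
    Team.Contested.Possessions = sum(Contested.Possessions)
  ) %>%
  ungroup()

top_players
team_totals
```

Grouping by Date and Playing.for is an assumption that a team plays at most one match per day; on the full data a match identifier column would be a more robust grouping key if one is available.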

That’s about it. The rest of the changes are just bug fixes, which you can see on the NEWS page of the package’s website. Hit us up on Twitter at plusSixOneBlog or anoafl, or over on GitHub, if you have any feedback or issues! Enjoy.

