Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
For those of you who’ve been following me on Twitter, you’ll know that I’ve been working on an R package for AFL called fitzRoy with Rob from Analysis of AFL. Today we released a new version which has a much-requested feature, so I figured a blog post was in order. You’ll have to reinstall fitzRoy to get the latest functions. We still aren’t on CRAN, but you can use devtools to install it from GitHub.
# install.packages("devtools") # uncomment if you haven't installed devtools before
library(tidyverse)
devtools::install_github("jimmyday12/fitzRoy")
AFL Tables player stats
Our initial version of fitzRoy included some data from a data dump we got from Paul at AFLtables. This data was great as it had all of the afltables stats on a player-by-player basis for all time. While this was OK for historical analysis, it stopped at round 3, 2017, and it was a one-off dump, meaning we couldn’t keep it up to date. As such, we’ve written a new function to replace this internal data. It’s called get_afltables_stats. It takes two arguments, start_date and end_date. These are pretty self-explanatory: the function will return stats from all matches between start_date and end_date. The format of these inputs needs to be either dmy or ymd. Both arguments are optional: start_date will default to the first AFL game and end_date will default to the system date. As an example, we could just grab data from this year.
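To make the two accepted date formats concrete, here is a minimal base R sketch. The helper normalise_date() is hypothetical and is not part of fitzRoy; it only illustrates what "dmy or ymd" means, and it assumes a single string with "-" or "/" as the separator. How fitzRoy parses its inputs internally may well differ.

```r
# Hypothetical helper (NOT part of fitzRoy): coerce one "ymd" or "dmy"
# string to a Date. Assumes "-" or "/" separators and a 4-digit year.
normalise_date <- function(x) {
  x <- gsub("/", "-", x, fixed = TRUE)
  if (grepl("^[0-9]{4}-", x)) {
    fmt <- "%Y-%m-%d"  # year first, e.g. "2018-01-01"
  } else {
    fmt <- "%d-%m-%Y"  # day first, e.g. "01-01-2018"
  }
  as.Date(x, format = fmt)
}

start <- normalise_date("2018-01-01")  # ymd
end   <- normalise_date("30/06/2018")  # dmy

# Because both arguments are optional, these call shapes follow from the
# description above (left commented out: they download data from AFL Tables).
# dat <- get_afltables_stats(start_date = "2018-01-01", end_date = "2018-06-30")
# dat <- get_afltables_stats(start_date = "01-01-2018")
# dat <- get_afltables_stats()
```

Both strings end up as ordinary Date objects, so they can be compared or passed on in whichever format is more convenient.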
library(fitzRoy)
## Warning: package 'fitzRoy' was built under R version 3.5.1
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.5.1
## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.7
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.5.1
## Warning: package 'tidyr' was built under R version 3.5.1
## Warning: package 'purrr' was built under R version 3.5.1
## Warning: package 'dplyr' was built under R version 3.5.1
## Warning: package 'stringr' was built under R version 3.5.1
## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
dat <- get_afltables_stats("2018-01-01")
## Returning data from 2018-01-01 to 2018-11-24
## Downloading data
##
## Finished downloading data. Processing XMLs
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)
## Warning: 396 parsing failures.
##  row   col   expected actual file
## 8713 Round an integer     QF 'https://afltables.com/afl/stats/2018_sta~
## 8714 Round an integer     QF 'https://afltables.com/afl/stats/2018_sta~
## 8715 Round an integer     QF 'https://afltables.com/afl/stats/2018_sta~
## 8716 Round an integer     QF 'https://afltables.com/afl/stats/2018_sta~
## 8717 Round an integer     QF 'https://afltables.com/afl/stats/2018_sta~
## See problems(...) for more details.
## Warning: Unknown columns: `Substitute`
## Finished getting afltables data
tail(dat)
## # A tibble: 6 x 59
##   Season Round Date       Local.start.time Venue Attendance Home.team  HQ1G
##    <dbl> <chr> <date>                <int> <chr>      <int> <chr>     <int>
## 1   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## 2   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## 3   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## 4   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## 5   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## 6   2018 Gran~ 2018-09-29             1430 "M.C~     100022 West Coa~     2
## # ... with 51 more variables: HQ1B <int>, HQ2G <int>, HQ2B <int>,
## #   HQ3G <int>, HQ3B <int>, HQ4G <int>, HQ4B <int>, Home.score <int>,
## #   Away.team <chr>, AQ1G <int>, AQ1B <int>, AQ2G <int>, AQ2B <int>,
## #   AQ3G <int>, AQ3B <int>, AQ4G <int>, AQ4B <int>, Away.score <int>,
## #   First.name <chr>, Surname <chr>, ID <int>, Jumper.No. <dbl>,
## #   Playing.for <chr>, Kicks <dbl>, Marks <dbl>, Handballs <dbl>,
## #   Goals <dbl>, Behinds <dbl>, Hit.Outs <dbl>, Tackles <dbl>,
## #   Rebounds <dbl>, Inside.50s <dbl>, Clearances <dbl>, Clangers <dbl>,
## #   Frees.For <dbl>, Frees.Against <dbl>, Brownlow.Votes <dbl>,
## #   Contested.Possessions <dbl>, Uncontested.Possessions <dbl>,
## #   Contested.Marks <dbl>, Marks.Inside.50 <dbl>, One.Percenters <dbl>,
## #   Bounces <dbl>, Goal.Assists <dbl>, Time.on.Ground.. <int>,
## #   Substitute <int>, Umpire.1 <chr>, Umpire.2 <chr>, Umpire.3 <chr>,
## #   Umpire.4 <chr>, group_id <int>
Note that each row is a ‘player match’, so the first few columns are just repeated team-level data. It is probably more interesting to look at specific columns relating to player stats.
dat %>%
  select(Date, First.name, Surname, Playing.for, Contested.Possessions,
         Uncontested.Possessions, One.Percenters, Time.on.Ground..,
         Brownlow.Votes)
## # A tibble: 9,108 x 9
##    Date       First.name Surname  Playing.for Contested.Posse~
##    <date>     <chr>      <chr>    <chr>                  <dbl>
##  1 2018-03-22 David      Astbury  Richmond                   9
##  2 2018-03-22 Shai       Bolton   Richmond                   3
##  3 2018-03-22 Dan        Butler   Richmond                   7
##  4 2018-03-22 Josh       Caddy    Richmond                  11
##  5 2018-03-22 Jason      Castag~  Richmond                   7
##  6 2018-03-22 Reece      Conca    Richmond                   6
##  7 2018-03-22 Trent      Cotchin  Richmond                  13
##  8 2018-03-22 Shane      Edwards  Richmond                   9
##  9 2018-03-22 Brandon    Ellis    Richmond                   3
## 10 2018-03-22 Corey      Ellis    Richmond                   7
## # ... with 9,098 more rows, and 4 more variables:
## #   Uncontested.Possessions <dbl>, One.Percenters <dbl>,
## #   Time.on.Ground.. <int>, Brownlow.Votes <dbl>
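Since each row is a ‘player match’, you may also want to collapse the data back to one row per match, or roll player stats up to team totals. Here is a rough sketch of both using dplyr. To keep it runnable without downloading anything, it uses a tiny made-up data frame called toy in place of dat: the column names come from the output above, but the teams, players and numbers are invented purely for illustration.

```r
library(dplyr)

# Toy stand-in for `dat`: four player rows from a single match.
# All values are invented; only the column names mirror the real data.
toy <- data.frame(
  Date        = as.Date(rep("2018-01-01", 4)),
  Home.team   = "Team A",
  Away.team   = "Team B",
  Home.score  = 90L,
  Away.score  = 80L,
  Surname     = c("Player1", "Player2", "Player3", "Player4"),
  Playing.for = c("Team A", "Team A", "Team B", "Team B"),
  Kicks       = c(10, 12, 8, 15),
  stringsAsFactors = FALSE
)

# Team-level columns repeat on every player row, so distinct() gives
# one row per match.
matches <- toy %>%
  distinct(Date, Home.team, Away.team, Home.score, Away.score)

# Sum a player stat up to team level for each match.
team_kicks <- toy %>%
  group_by(Date, Playing.for) %>%
  summarise(Kicks = sum(Kicks)) %>%
  ungroup()
```

The same two pipelines should work on dat itself, although with real data you would probably want to include Season and Round in the grouping as well.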
That’s about it. The rest of the changes are just bug fixes, which you can see on the NEWS page of the package’s website. Hit us up on Twitter at plusSixOneBlog or anoafl, or over on GitHub, if you have any feedback or issues! Enjoy.