Best Fantasy Player of All Time

[This article was first published on Analysis of AFL, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

One of the things I have always wondered about AFL fantasy is just who is the best fantasy player of all time? Not the fan who wins the most but who is the best player.

So one possible idea would be to work out the fantasy scores of players going back for all the time that is possible (YAY fitzRoy!). From there, lets look at the z-score of players in the individual year look at a plot for all years we have fantasy available and do some exploration.

Step One – Work out Fantasy Scores

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0       ✔ purrr   0.3.2  
## ✔ tibble  2.1.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.3       ✔ stringr 1.4.0  
## ✔ readr   1.3.1       ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
df<-fitzRoy::get_afltables_stats(start_date="1987-01-01", end_date = "2018-10-10")%>%
  mutate(AF=3*Kicks + 2*Handballs + 3*Marks +
           4*Tackles+ Frees.For -
           3*Frees.Against +Hit.Outs+
           6*Goals +Behinds)
## Returning data from 1987-01-01 to 2018-10-10
## Downloading data
## 
## Finished downloading data. Processing XMLs
## Warning: Detecting old grouped_df format, replacing `vars` attribute by
## `groups`
## Finished getting afltables data

Once we have worked out the fantasy scores we need to come up with our filters and our groups with of course sensible checks and balances.

We know that tackles first started getting recorded in 1987, so lets filter our dataset by Season> 1986

df%>%
  
  filter(Season>1986)
## # A tibble: 254,507 x 60
##    Season Round Date       Local.start.time Venue Attendance Home.team
##     <dbl> <chr> <date>                <int> <chr>      <int> <chr>    
##  1   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  2   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  3   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  4   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  5   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  6   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  7   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  8   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  9   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
## 10   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
## # … with 254,497 more rows, and 53 more variables: HQ1G <int>, HQ1B <int>,
## #   HQ2G <int>, HQ2B <int>, HQ3G <int>, HQ3B <int>, HQ4G <int>,
## #   HQ4B <int>, Home.score <int>, Away.team <chr>, AQ1G <int>, AQ1B <int>,
## #   AQ2G <int>, AQ2B <int>, AQ3G <int>, AQ3B <int>, AQ4G <int>,
## #   AQ4B <int>, Away.score <int>, First.name <chr>, Surname <chr>,
## #   ID <dbl>, Jumper.No. <dbl>, Playing.for <chr>, Kicks <dbl>,
## #   Marks <dbl>, Handballs <dbl>, Goals <dbl>, Behinds <dbl>,
## #   Hit.Outs <dbl>, Tackles <dbl>, Rebounds <dbl>, Inside.50s <dbl>,
## #   Clearances <dbl>, Clangers <dbl>, Frees.For <dbl>,
## #   Frees.Against <dbl>, Brownlow.Votes <dbl>,
## #   Contested.Possessions <dbl>, Uncontested.Possessions <dbl>,
## #   Contested.Marks <dbl>, Marks.Inside.50 <dbl>, One.Percenters <dbl>,
## #   Bounces <dbl>, Goal.Assists <dbl>, Time.on.Ground.. <int>,
## #   Substitute <int>, Umpire.1 <chr>, Umpire.2 <chr>, Umpire.3 <chr>,
## #   Umpire.4 <chr>, group_id <int>, AF <dbl>

Our next stage, we are going to add a count of the games each player played in during the Season, we make this decision because we want to have a look at players who had the best season, not just the best games and 10 seems like a reasonable cut off

df%>%
  filter(Season>1986)%>%
  group_by(Season, ID)%>%
  mutate(countgames=n())
## # A tibble: 254,507 x 61
## # Groups:   Season, ID [18,678]
##    Season Round Date       Local.start.time Venue Attendance Home.team
##     <dbl> <chr> <date>                <int> <chr>      <int> <chr>    
##  1   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  2   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  3   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  4   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  5   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  6   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  7   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  8   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  9   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
## 10   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
## # … with 254,497 more rows, and 54 more variables: HQ1G <int>, HQ1B <int>,
## #   HQ2G <int>, HQ2B <int>, HQ3G <int>, HQ3B <int>, HQ4G <int>,
## #   HQ4B <int>, Home.score <int>, Away.team <chr>, AQ1G <int>, AQ1B <int>,
## #   AQ2G <int>, AQ2B <int>, AQ3G <int>, AQ3B <int>, AQ4G <int>,
## #   AQ4B <int>, Away.score <int>, First.name <chr>, Surname <chr>,
## #   ID <dbl>, Jumper.No. <dbl>, Playing.for <chr>, Kicks <dbl>,
## #   Marks <dbl>, Handballs <dbl>, Goals <dbl>, Behinds <dbl>,
## #   Hit.Outs <dbl>, Tackles <dbl>, Rebounds <dbl>, Inside.50s <dbl>,
## #   Clearances <dbl>, Clangers <dbl>, Frees.For <dbl>,
## #   Frees.Against <dbl>, Brownlow.Votes <dbl>,
## #   Contested.Possessions <dbl>, Uncontested.Possessions <dbl>,
## #   Contested.Marks <dbl>, Marks.Inside.50 <dbl>, One.Percenters <dbl>,
## #   Bounces <dbl>, Goal.Assists <dbl>, Time.on.Ground.. <int>,
## #   Substitute <int>, Umpire.1 <chr>, Umpire.2 <chr>, Umpire.3 <chr>,
## #   Umpire.4 <chr>, group_id <int>, AF <dbl>, countgames <int>

Our next stage is that we want to work out the z-scores of the fantasy players per game by by season

df%>%
  filter(Season>1986)%>%
  group_by(Season, ID)%>%
  mutate(countgames=n())%>%
  group_by(Season)%>%
  mutate(Z_score=(AF-mean(AF))/sd(AF))
## # A tibble: 254,507 x 62
## # Groups:   Season [32]
##    Season Round Date       Local.start.time Venue Attendance Home.team
##     <dbl> <chr> <date>                <int> <chr>      <int> <chr>    
##  1   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  2   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  3   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  4   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  5   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  6   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  7   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  8   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
##  9   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
## 10   1991 1     1991-03-22             1940 Foot…      44902 Adelaide 
## # … with 254,497 more rows, and 55 more variables: HQ1G <int>, HQ1B <int>,
## #   HQ2G <int>, HQ2B <int>, HQ3G <int>, HQ3B <int>, HQ4G <int>,
## #   HQ4B <int>, Home.score <int>, Away.team <chr>, AQ1G <int>, AQ1B <int>,
## #   AQ2G <int>, AQ2B <int>, AQ3G <int>, AQ3B <int>, AQ4G <int>,
## #   AQ4B <int>, Away.score <int>, First.name <chr>, Surname <chr>,
## #   ID <dbl>, Jumper.No. <dbl>, Playing.for <chr>, Kicks <dbl>,
## #   Marks <dbl>, Handballs <dbl>, Goals <dbl>, Behinds <dbl>,
## #   Hit.Outs <dbl>, Tackles <dbl>, Rebounds <dbl>, Inside.50s <dbl>,
## #   Clearances <dbl>, Clangers <dbl>, Frees.For <dbl>,
## #   Frees.Against <dbl>, Brownlow.Votes <dbl>,
## #   Contested.Possessions <dbl>, Uncontested.Possessions <dbl>,
## #   Contested.Marks <dbl>, Marks.Inside.50 <dbl>, One.Percenters <dbl>,
## #   Bounces <dbl>, Goal.Assists <dbl>, Time.on.Ground.. <int>,
## #   Substitute <int>, Umpire.1 <chr>, Umpire.2 <chr>, Umpire.3 <chr>,
## #   Umpire.4 <chr>, group_id <int>, AF <dbl>, countgames <int>,
## #   Z_score <dbl>

After finding the Z_score, we can do a check. What we want to see is that for each season our Z_score column has a mean of 0 and a variance of 1. We can also produce the summaries of the other variables by Season to check our other created columns like countgames make sense

df%>%
  filter(Season>1986)%>%
  group_by(Season, ID)%>%
  mutate(countgames=n())%>%
  group_by(Season)%>%
  mutate(Z_score=(AF-mean(AF))/sd(AF))%>%
  split(.$Season)%>%map(summary)

Then the last thing is to filter out the players who didn’t play more than 10 games in a season and to find each players meanZ_score by year.

df%>%
  filter(Season>1986 & Time.on.Ground..> 60)%>%
  group_by(Season, ID)%>%
  mutate(countgames=n())%>%
  group_by(Season)%>%
  mutate(Z_score=(AF-mean(AF))/sd(AF))%>%
  group_by(Season, ID, First.name, Surname)%>%
filter(countgames>10)%>%
  summarise(meanZ_score=mean(Z_score))%>%
  arrange(desc(meanZ_score))
## # A tibble: 5,792 x 5
## # Groups:   Season, ID, First.name [5,792]
##    Season    ID First.name Surname  meanZ_score
##     <dbl> <dbl> <chr>      <chr>          <dbl>
##  1   2014 11787 Tom        Rockliff        2.32
##  2   2012  1460 Dane       Swan            2.25
##  3   2018 12196 Tom        Mitchell        2.14
##  4   2016 11787 Tom        Rockliff        2.13
##  5   2018 12166 Jack       Macrae          2.10
##  6   2017 12196 Tom        Mitchell        2.02
##  7   2012  1105 Gary       Ablett          2.01
##  8   2010  1460 Dane       Swan            1.99
##  9   2014  1105 Gary       Ablett          1.87
## 10   2018 12217 Brodie     Grundy          1.79
## # … with 5,782 more rows

But this would probably looks best as a graph as noted earlier.

df%>%
  filter(Season>1986)%>%
  group_by(Season, ID)%>%
  mutate(countgames=n())%>%
  group_by(Season)%>%
  mutate(Z_score=(AF-mean(AF))/sd(AF))%>%
  group_by(Season, ID, First.name, Surname)%>%
filter(countgames>10)%>%
  summarise(meanZ_score=mean(Z_score))%>%
  ggplot(aes(x=meanZ_score, y=Season))+geom_point()

Then you probably would want to label our top 10 list so lets do that.

df%>%
  filter(Season>1986)%>%
  group_by(Season, ID)%>%
  mutate(countgames=n())%>%
  group_by(Season)%>%
  mutate(Z_score=(AF-mean(AF))/sd(AF))%>%
  group_by(Season, ID, First.name, Surname)%>%
filter(countgames>10)%>%
  summarise(meanZ_score=mean(Z_score))%>%
  ggplot(aes(x=meanZ_score, y=Season))+
  geom_point()+
  geom_text(aes(label=ifelse(meanZ_score>1.9, as.character(Surname),"")), vjust=2, size=2, colour="blue")+
  ggtitle("AFL Fantasy Z-Score by Season")

So what we can see here is that Nathan Buckley in his day, had some of the best fantasy seasons of AFL that we have statistics for. In fact, he has 3 out of the top 10 all time fantasy seasons.

It’s just a shame that fantasy wasn’t around back when he was around.

What about top 10 finishes?

df%>%
  filter(Season>1986)%>%
  group_by(Season, ID)%>%
  mutate(countgames=n())%>%
  group_by(Season)%>%
  mutate(Z_score=(AF-mean(AF))/sd(AF))%>%
  group_by(Season, ID, First.name, Surname)%>%
filter(countgames>10 & AF>30)%>%
  summarise(meanZ_score=mean(Z_score))%>%
   group_by(Season) %>%
  arrange(desc(meanZ_score))%>% 
  mutate(id = row_number())%>%
  filter(id<11)%>%
  group_by(ID, First.name, Surname)%>%summarise(top10=n())%>%
  arrange(desc(top10))
## # A tibble: 150 x 4
## # Groups:   ID, First.name [150]
##       ID First.name Surname    top10
##    <dbl> <chr>      <chr>      <int>
##  1   217 Nathan     Buckley       11
##  2  1105 Gary       Ablett         9
##  3  1460 Dane       Swan           9
##  4   766 Wayne      Carey          7
##  5   990 Tony       Lockett        7
##  6   157 Barry      Mitchell       6
##  7   927 Stewart    Loewe          6
##  8  4182 Scott      Pendlebury     6
##  9   142 Greg       Williams       5
## 10   312 James      Hird           5
## # … with 140 more rows

Yes there you have it, not only does he have some of the best seasons of fantasy relative to the other players in his season group. But he also has the most number of top10 fantasy finishes in “possible” fantasy history( if we started when tackles first started getting recorded).

To leave a comment for the author, please follow the link and comment on their blog: Analysis of AFL.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)