Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Selby Lee Steere has won AFL fantasy two years in a row. He is the coach of Moreiras Magic and through selling his fantasy guide donates a lot of money to the Starlight foundation. I was lucky enough to do an interview with him talking about his strategies in winning fantasy.
AFL fantasy scores as opposed to supercoach scores are able to be fully derived from the box-score statistics. Why this is good it means if we wanted to, and I guess we do for the purposes of this post lets do a few things
- Lets get the fantasy scores for all players as far back as we can go
- Now that we have fantasy scores going back years and years, lets compare players. In the podcast, it was pointed out that maybe Brayshaw will see an increase in useage because Neale has left, are their stats similar in their first years? Can they be similar in their second.
- Maybe like Kane Cornes you believe that Powell-Pepper can be the next Dusty Martin, so lets compare the pair.
- Another one of Selby fantastic tips was to think about points per minute, we can actually calculate this too!
Step one Get the data!
library(tidyverse) ## ── Attaching packages ───────────────────────────────────────────────────────── tidyverse 1.2.1 ── ## ✔ ggplot2 3.1.0 ✔ purrr 0.3.0 ## ✔ tibble 2.0.1 ✔ dplyr 0.8.0.1 ## ✔ tidyr 0.8.3 ✔ stringr 1.4.0 ## ✔ readr 1.3.1 ✔ forcats 0.4.0 ## ── Conflicts ──────────────────────────────────────────────────────────── tidyverse_conflicts() ── ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag() dataframe<-fitzRoy::get_afltables_stats(start_date = "1897-01-01",end_date = "2018-10-10") ## Returning data from 1897-01-01 to 2018-10-10 ## Downloading data ## ## Finished downloading data. Processing XMLs ## Warning: Detecting old grouped_df format, replacing `vars` attribute by ## `groups` ## Finished getting afltables data
Step two data checks
One of the things about AFL data is that statistics are being collected from different points in time, no one was recording the meters gained back in the 1960s.
A quick way to check when a statistic is first being collected would be to do a search or hopefully there is a data dictionary.
But in this case, lets use R and the tidyverse to do it.
library(tidyverse) dataframe%>% group_by(Season)%>% summarise(meantime=mean(Time.on.Ground..))%>% filter(meantime>0) ## # A tibble: 16 x 2 ## Season meantime ## <dbl> <dbl> ## 1 2003 81.8 ## 2 2004 81.8 ## 3 2005 81.8 ## 4 2006 81.8 ## 5 2007 81.8 ## 6 2008 81.8 ## 7 2009 81.8 ## 8 2010 81.8 ## 9 2011 81.8 ## 10 2012 81.8 ## 11 2013 81.8 ## 12 2014 81.8 ## 13 2015 81.8 ## 14 2016 81.8 ## 15 2017 81.8 ## 16 2018 81.8
So what we can see from this, is that time on ground was first being collected in 2003. So if we were to work out points per minute we could only start in 2003.
Come up with a column of fantasy scores
We know that AFL fantasy scores are worked out as follows:
df<-data.frame(stringsAsFactors=FALSE, Match.Stat = c("Kick", "Handball", "Mark", "Tackle", "Free Kick For", "Free Kick Against", "Hitout", "Goal", "Behind"), Fantasy.Points = c("3 Points", "2 Points", "3 Points", "4 Points", "1 Point", "-3 Points", "1 Point", "6 Points", "1 Point") ) df ## Match.Stat Fantasy.Points ## 1 Kick 3 Points ## 2 Handball 2 Points ## 3 Mark 3 Points ## 4 Tackle 4 Points ## 5 Free Kick For 1 Point ## 6 Free Kick Against -3 Points ## 7 Hitout 1 Point ## 8 Goal 6 Points ## 9 Behind 1 Point
Now all we have to do is create a new column using the above as a formula so we can work out the fantasy scores going backwards, we can use mutate to do this.
dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% group_by(Season)%>% summarise(meanfantasy=mean(fantasy_score))%>% ggplot(aes(x=Season, y=meanfantasy))+geom_line()
To find out when the statistics are first being collected we might want to do this graphically. What I am thinking what if we just had a heap of lines with each line representing one of the statistics. When it comes to plotting multiple lines on a graph its a good idea to practice the use of gather from the tidyverse
dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% group_by(Season)%>% summarise(meanfantasy=mean(fantasy_score), meankicks=mean(Kicks), meanhandballs=mean(Handballs), meanmarks=mean(Marks), meantackles=mean(Tackles), meanfreesfor=mean(Frees.For), meansfreesagainst=mean(Frees.Against), meanhitouts=mean(Hit.Outs), meangoals=mean(Goals), meanbehinds=mean(Behinds))%>% gather("variable", "value",-Season) %>% ggplot(aes(x=Season, y=value, group=variable, colour=variable))+geom_line()
Looking at this graph, it would seem as though if we used 2000 as a cut off, that would be fine as all our data is being collected then. What we don’t want there to be an example of, is a statistic like meters gained that was only started to be tracked in 2007.
Next what we want to do now that we have the fantasy scores and a rough timeline of when it makes sense that all our data is being collected lets see the average fantasy scores by player and year.
dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% group_by(Season, First.name, Surname, ID)%>% summarise(meanFS=mean(fantasy_score))%>% filter(Season>2000) ## # A tibble: 10,933 x 5 ## # Groups: Season, First.name, Surname [10,879] ## Season First.name Surname ID meanFS ## <dbl> <chr> <chr> <dbl> <dbl> ## 1 2001 Aaron Fiora 911 34.6 ## 2 2001 Aaron Hamill 171 71.4 ## 3 2001 Aaron Henneman 362 32.9 ## 4 2001 Aaron Lord 574 48.7 ## 5 2001 Aaron Shattock 1093 31.3 ## 6 2001 Adam Contessa 457 49.9 ## 7 2001 Adam Goodes 1012 71.5 ## 8 2001 Adam Houlihan 593 54.7 ## 9 2001 Adam Hunter 1072 23.3 ## 10 2001 Adam Kingsley 825 66.9 ## # … with 10,923 more rows
But! We know that with AFL fantasy we don’t want to include finals that’s because finals aren’t included in the competition so lets exclude finals.
dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% filter(!(Round %in% c("EF", "PF", "SF","GF","QF")))%>% group_by(Season, First.name, Surname, ID)%>% summarise(meanFS=mean(fantasy_score))%>% filter(Season>2000)%>%arrange(desc(meanFS)) ## # A tibble: 10,926 x 5 ## # Groups: Season, First.name, Surname [10,872] ## Season First.name Surname ID meanFS ## <dbl> <chr> <chr> <dbl> <dbl> ## 1 2014 Tom Rockliff 11787 135. ## 2 2012 Dane Swan 1460 134. ## 3 2018 Tom Mitchell 12196 129. ## 4 2017 Tom Mitchell 12196 127. ## 5 2012 Gary Ablett 1105 125. ## 6 2010 Dane Swan 1460 123. ## 7 2018 Jack Macrae 12166 123. ## 8 2011 Dane Swan 1460 121. ## 9 2017 Patrick Dangerfield 11700 121. ## 10 2018 Brodie Grundy 12217 120 ## # … with 10,916 more rows
So what did we originally want to do, we wanted to see how Neale did in his second year to get a sense of his improvement and is that comparable to Brayshaw?
To do this we want to find the player IDS, this makes it easier to filter out the players, secondly we want to add the count of games played in season. I think the second point is important because we don’t want to be misled by a small number of games played.
dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% filter(!(Round %in% c("EF", "PF", "SF","GF","QF")))%>% group_by(Season, First.name, Surname, ID)%>% summarise(meanFS=mean(fantasy_score), count=n())%>% filter(Season>2000)%>%arrange(desc(Season))%>% filter(ID %in%c("12055","12589")) ## # A tibble: 8 x 6 ## # Groups: Season, First.name, Surname [8] ## Season First.name Surname ID meanFS count ## <dbl> <chr> <chr> <dbl> <dbl> <int> ## 1 2018 Andrew Brayshaw 12589 66.8 17 ## 2 2018 Lachie Neale 12055 100. 22 ## 3 2017 Lachie Neale 12055 100. 21 ## 4 2016 Lachie Neale 12055 111. 22 ## 5 2015 Lachie Neale 12055 102. 22 ## 6 2014 Lachie Neale 12055 82.3 21 ## 7 2013 Lachie Neale 12055 77.1 9 ## 8 2012 Lachie Neale 12055 42.1 11
So looking at the Neale vs Brayshaw comparision, Neale actually averaged less fantasy points in his first AFL season, but he picked it up to 77.1 in his second year and a slight imporvement to 82.3 in his third.
But as Selby pointed out, he’s looking at points per minute as his metric. So lets add that in.
dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% filter(!(Round %in% c("EF", "PF", "SF","GF","QF")))%>% group_by(Season, First.name, Surname, ID)%>% summarise(meanFS=mean(fantasy_score), meantimeonground=mean(Time.on.Ground..), count=n())%>% filter(Season>2000)%>%arrange(desc(Season))%>% filter(ID %in%c("12055","12589")) ## # A tibble: 8 x 7 ## # Groups: Season, First.name, Surname [8] ## Season First.name Surname ID meanFS meantimeonground count ## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <int> ## 1 2018 Andrew Brayshaw 12589 66.8 66.7 17 ## 2 2018 Lachie Neale 12055 100. 80.2 22 ## 3 2017 Lachie Neale 12055 100. 77.6 21 ## 4 2016 Lachie Neale 12055 111. 81.6 22 ## 5 2015 Lachie Neale 12055 102. 77.0 22 ## 6 2014 Lachie Neale 12055 82.3 68.8 21 ## 7 2013 Lachie Neale 12055 77.1 73.9 9 ## 8 2012 Lachie Neale 12055 42.1 59.2 11
What we can see here, is that Brayshaw average time on ground was 66.7, Neale when he was getting just a little bit more game time (68.8 and 73.9) was getting an extra 10+ fantasy points on average. But what we can see is that his mean fantasy score and his mean time on ground is virtually one to one.
Lets see Brayshaw on a scatter plot for his debut season
dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% filter(!(Round %in% c("EF", "PF", "SF","GF","QF")))%>% filter(ID =="12589")%>% ggplot(aes(x=Time.on.Ground..,y=fantasy_score))+geom_point()+geom_abline(intercept = 0)+ylim(0,100)+xlim(0,100)
Lets now look at Sam Powell Pepper vs Dustin Martin comparisons.
First lets find their player Ids
dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% filter(!(Round %in% c("EF", "PF", "SF","GF","QF")))%>% filter(Surname =="Powell-Pepper")%>%select(ID) ## # A tibble: 37 x 1 ## ID ## <dbl> ## 1 12494 ## 2 12494 ## 3 12494 ## 4 12494 ## 5 12494 ## 6 12494 ## 7 12494 ## 8 12494 ## 9 12494 ## 10 12494 ## # … with 27 more rows dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% filter(!(Round %in% c("EF", "PF", "SF","GF","QF")))%>% filter(Surname =="Martin", First.name=="Dustin")%>%select(ID) ## # A tibble: 193 x 1 ## ID ## <dbl> ## 1 11794 ## 2 11794 ## 3 11794 ## 4 11794 ## 5 11794 ## 6 11794 ## 7 11794 ## 8 11794 ## 9 11794 ## 10 11794 ## # … with 183 more rows
Now lets do the same comparisions as we were doing for Brayshaw and Neale
dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% filter(!(Round %in% c("EF", "PF", "SF","GF","QF")))%>% group_by(Season, First.name, Surname, ID)%>% summarise(meanFS=mean(fantasy_score), meantimeonground=mean(Time.on.Ground..), count=n())%>% filter(Season>2000)%>%arrange(desc(Season))%>% filter(ID %in%c("11794","12494")) ## # A tibble: 11 x 7 ## # Groups: Season, First.name, Surname [11] ## Season First.name Surname ID meanFS meantimeonground count ## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <int> ## 1 2018 Dustin Martin 11794 92.4 82.9 21 ## 2 2018 Sam Powell-Pepper 12494 75 74.6 16 ## 3 2017 Dustin Martin 11794 114. 85 22 ## 4 2017 Sam Powell-Pepper 12494 69.9 68 21 ## 5 2016 Dustin Martin 11794 107. 83.1 22 ## 6 2015 Dustin Martin 11794 104. 79.8 22 ## 7 2014 Dustin Martin 11794 97.6 82.9 21 ## 8 2013 Dustin Martin 11794 97.0 81.8 22 ## 9 2012 Dustin Martin 11794 84.8 80.4 20 ## 10 2011 Dustin Martin 11794 89.5 84.4 22 ## 11 2010 Dustin Martin 11794 71.5 78.2 21
Lets now look at a similar scatter plot.
dataframe%>% mutate(fantasy_score= 3*Kicks +2*Handballs+ 3*Marks +4*Tackles+Frees.For - 3*Frees.Against+Hit.Outs+6*Goals+Behinds)%>% filter(!(Round %in% c("EF", "PF", "SF","GF","QF")))%>% group_by(Season, First.name, Surname, ID)%>% summarise(meanFS=mean(fantasy_score), meantimeonground=mean(Time.on.Ground..), count=n())%>% filter(Season>2000)%>%arrange(desc(Season))%>% filter(ID %in%c("11794","12494"))%>% ggplot(aes(x=meantimeonground, y=meanFS))+geom_point(aes(colour=as.factor(ID)))+geom_abline(intercept = 0)+ylim(0,140)+xlim(0,140)
So hopefully now what you have is a rough framework in place so you can easily filter out players you think are similar, track their careers and use 2 time afl fantasy winners tips by looking at time on ground as well!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.