Better predictions for AFL from adjusted Elo ratings by @ellis2013nz
Warning – this post discusses gambling odds and even describes me placing small $5 bets, which I can easily afford to lose. In no way should this be interpreted as advice to anyone else to do the same, and I accept absolutely no liability for anyone who treats this blog post as a basis for gambling advice. If you find yourself losing money at gambling or suspect you or someone close to you has a gambling problem, please seek help from https://www.gamblinghelponline.org.au/ or other services.
Appreciating home team advantage
Last week I blogged about using Elo ratings to predict the winners in the Australian Football League (AFL) to help me blend in with the locals in my Melbourne workplace footy-tipping competition. Several people pointed out that my approach ignored the home team advantage, which is known to be a significant factor in the AFL. How significant? Well, here’s the proportion of games won by the home team in each season since the AFL’s beginning:
Overall, 59% of games in the AFL history have been won by the home team, although that proportion varies materially from season to season. Another way of putting this – if my AFL prediction system was as simple as “always pick the home team” I would expect (overall) to score a respectable 59% of successful picks.
My first inclination in response to this was just to add 0.09 to the chance of the home team winning from my model’s base probability, but first I thought I should check whether this adjustment varies by team. Well, it does, somewhat dramatically. This next chart, based on just modern era games (1997 and onwards), defines the home advantage as the proportion of home games won minus the proportion of away games won, divided by 2. The all-teams all-seasons average for this figure would be 0.09.
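The home advantage measure described above can be sketched in a few lines of base R. This is a minimal illustration on made-up data, not the post's actual code: the `games` data frame, its column names and the `home_advantage()` helper are all hypothetical, and draws are simply counted as non-wins.

```r
# Hypothetical toy results: one row per game; home_win is 1 if the home team won.
games <- data.frame(
  home     = c("A", "A", "B", "B", "C", "C"),
  away     = c("B", "C", "A", "C", "A", "B"),
  home_win = c(1, 1, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Home advantage per team: (proportion of home games won minus
# proportion of away games won) divided by 2.
home_advantage <- function(games) {
  teams <- sort(unique(c(games$home, games$away)))
  sapply(teams, function(tm) {
    home_rate <- mean(games$home_win[games$home == tm])      # share of home games won
    away_rate <- mean(1 - games$home_win[games$away == tm])  # share of away games won
    (home_rate - away_rate) / 2
  })
}

adv <- home_advantage(games)
adv
```

In this toy league, teams A and B each come out at 0.25 and team C at 0, so C wins as often away as at home.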
The teams that are conspicuously high on this measure are all non-Melbourne teams:
- Geelong
- Adelaide
- West Coast
- Fremantle
- Greater Western Sydney
Gold Coast (another non-Melbourne team, for any non-Australians reading) would show up as material if a success ratio measure were used instead of my simple (and crude) additive indicator, because their overall win rate is so low. Sydney and perhaps Brisbane stand out as good performers overall that don’t have as marked a home town advantage (or away-town disadvantage) as their non-Melbourne peers.
I presume this issue is well known to AFL aficionados. With the majority (or at least plurality – I haven’t counted) of games played in Melbourne, Melbourne-based clubs generally play many of their “away” matches still relatively close to players’ homes. In contrast, the West Coast Eagles (for example) flying across the Nullarbor for a match is bound to cost their players, compared to the alternative situation of being at home and making the opponents fly west instead.
Geelong surprised me – Geelong is much closer to Melbourne than the inter-state teams, so no long flights are involved. But simply travelling even an hour up the M1, added to the enthusiastic partisanship of Geelong-Melbourne fan rivalry, perhaps explains the strong advantage.
Here’s the R code that performs these steps:
- load in functionality for the analysis
- download the AFL results from 1897 onwards from afltables using the `fitzRoy` package and store in the object `r` (for “results”)
- draw the chart of home win rates per season
- estimate and plot teams’ recent-decades home advantage
- create a new `r2` object with home and away advantage and disadvantage adjustments for probabilities
Post continues after code extract
Choosing a better set of parameters
Seeing as I had to revisit my predictive method to adjust for home and away advantage/disadvantage, I decided to also take a more systematic approach to some parameters that I had set either arbitrarily or without noticing they were parameters in last week’s post. These fit into two areas:
- The FIBS-based Elo rating method I use requires a “match length” parameter. Longer matches mean the better player is more likely to win, and also lead to larger adjustments in Elo ratings once the result is known. I had used the winning margin as a proxy/equivalent for match length, reasoning that a team that won by a large amount had shown their superiority in a similar way to a backgammon player winning a longer match. But should a 30 point margin count as a match length of 30 (i.e. equivalent to points) or 5 (equivalent to goals) or something in between? And what margin or “match length” should I use for predicting future games, where even a margin of 1 is enough to win? And even more philosophically, is margin really related to the backgammon concept of match length in a linear way, or should there be discounting of larger margins? Today, I created three parameters: one to scale the margin, one to choose a margin for predicting future matches, and one to raise the scaled margin to the power of a number from zero to one for non-linearity.
- Elo ratings are by their nature sticky and based on a player/team’s whole career. I had noted that I got better performance in predicting the 2018 season by restarting all teams’ ratings at 1500 at the beginning of the 2017 season. But this is fairly crude. The FIBS Elo rating method has a parameter for teams’ past experience which in effect lets you control how responsive the rating is to new information (the idea being to help new players in a competition quickly move from 1500 up or down to their natural level). I have now added to this with a “new round factor” which shrinks ratings for the first game of the season towards 1500, effectively discounting past experience.
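To make the two groups of parameters concrete, here is a minimal sketch of how they might act. The helper names `margin_to_match_length()` and `shrink_rating()` are hypothetical, and the exact formulas are assumptions inferred from the descriptions in this post (margin divided by `sc`, then raised to `margin_power`; ratings pulled towards 1500 in proportion to `new_round_factor`), not a copy of the real `afl_elos()` internals.

```r
# ASSUMPTION: the notional match length is (margin / sc) ^ margin_power.
# sc = 1 counts the margin in points, sc = 6 counts it in goals.
margin_to_match_length <- function(margin, sc = 1, margin_power = 1) {
  (abs(margin) / sc) ^ margin_power
}

# ASSUMPTION: at the start of a season each rating keeps only a fraction
# new_round_factor of its distance from 1500 (0 = full reset, 1 = no change).
shrink_rating <- function(rating, new_round_factor, centre = 1500) {
  centre + new_round_factor * (rating - centre)
}

margin_to_match_length(30, sc = 1, margin_power = 1)  # 30: margin counted in points
margin_to_match_length(30, sc = 6, margin_power = 1)  # 5: margin counted in goals
margin_to_match_length(30, sc = 6, margin_power = 0)  # 1: margin size ignored entirely

shrink_rating(1600, new_round_factor = 0.4)  # pulled most of the way back, to 1540
shrink_rating(1600, new_round_factor = 0)    # full reset to 1500
```

With `margin_power` strictly between 0 and 1 the conversion is concave, so a 60 point win counts for less than twice a 30 point win.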
Here’s the code that describes and defines those parameters a bit better, in an enhanced version of the `afl_elos()` function I introduced last week.
Post continues after code extract
This function is now pretty slow when run on the whole AFL history from 1897. Unlike last week, it calls the underlying `frs::elo_rating()` function twice for each game: once (the object `er` above) to determine the ratings after the match outcome is known, and once (the object `er2`) to determine the prediction of the match’s result, for benchmarking purposes. Last week I didn’t need to use `elo_rating()` twice, because the prediction of the winner was as simple as choosing the team with the highest Elo rating going in to the match. Now, we have to calculate the actual probability of winning, adjusted for home and away advantage and disadvantage. This calculation depends on the parameter choices that impact on converting margin to “match length” and what winning margin we base our predictions on, so the calculation is an additional one to the change in rating that came about from the actual result of the game.
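The prediction step can be sketched as follows. This is a reconstruction under stated assumptions rather than the code inside `frs::elo_rating()`: it uses the standard FIBS win-probability formula, a match length of `(pred_margin / sc) ^ margin_power`, and adds the home adjustment and subtracts the (negative) away adjustment. The function names are hypothetical. With the top-ranked parameter set from the results table below it (`sc = 1`, `pred_margin = 30`, `margin_power = 5/6`), the sketch does reproduce the `final_prob` shown for Richmond v Collingwood in this week's predictions table.

```r
# Standard FIBS formula: probability that the first-named side wins a
# match of the given length.
fibs_win_prob <- function(rating_a, rating_b, match_length) {
  1 / (1 + 10 ^ (-(rating_a - rating_b) * sqrt(match_length) / 2000))
}

# ASSUMPTION: the base probability is shifted additively by the home team's
# advantage and the away team's disadvantage, then kept inside [0, 1].
predict_home_win <- function(home_elo, away_elo, home_adj, away_adj,
                             pred_margin = 30, sc = 1, margin_power = 5 / 6) {
  n <- (pred_margin / sc) ^ margin_power         # notional match length for prediction
  p <- fibs_win_prob(home_elo, away_elo, n) + home_adj - away_adj
  min(max(p, 0), 1)
}

# Richmond (home) v Collingwood (away), figures taken from the predictions table
p <- predict_home_win(1623.362, 1552.782,
                      home_adj = 0.0392339, away_adj = -0.0305072)
round(p, 4)  # about 0.6528, in line with final_prob for that row
```

The rating update after the game uses the same machinery but with the actual winning margin in place of `pred_margin`, which is why the two calculations cannot share a single call.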
There are doubtless efficiencies that could be made, but I’m not enthused to spend too much time refactoring at this point…
I have no theories and hardly any hunches on what parameter combinations will give the best performance, so the only way to choose a set is to try many and pick the one that would have worked best in predicting AFL games to date. I defined about 2,500 combinations of parameters, removed some that were effective duplicates (because if `margin_power` is 0, then the values of `sc` and `pred_margin` are immaterial) and for the purposes of this blog ran just 100 random prediction competitions, based on the games from 1950 onwards. With each individual run taking five minutes or more, I used parallel processing to do 7 runs simultaneously and get the total time for those 100 runs down to about 90 minutes, all I was prepared to do today for the purposes of this blog. I might run it for a longer period of time overnight later.
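As a rough sketch of that set-up, the grid of combinations, the removal of effective duplicates and the random sample of 100 can be done in base R. The candidate values below are illustrative assumptions loosely based on the values that appear in the results table; they are not the actual grid (which had about 2,500 combinations), and the seed is arbitrary.

```r
set.seed(123)  # arbitrary seed, only so the random sample is reproducible

# ASSUMPTION: illustrative candidate values for each parameter
params <- expand.grid(
  sc               = c(1, 3, 6),
  pred_margin      = c(10, 20, 30),
  margin_power     = c(0, 0.5, 2 / 3, 5 / 6, 1),
  experience       = c(0, 100, 200, 300, 400),
  new_round_factor = c(0.4, 0.6, 0.8, 1)
)

# When margin_power is 0 every match length is 1, so sc and pred_margin
# make no difference: collapse those rows onto a single representative.
flat <- params$margin_power == 0
params$sc[flat] <- 1
params$pred_margin[flat] <- 10
params <- unique(params)

# Randomly choose 100 combinations to run through the prediction competition
trial <- params[sample(nrow(params), 100), ]
nrow(params)
nrow(trial)
```

Each row of `trial` would then be passed to the rating function, with the share of correctly tipped games recorded as that combination's success rate.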
The top twenty parameter sets from this competition are in the table below. The best combination of factors led to an overall prediction success of about 69%, which is better than last week’s 65%, the crude “always pick the home team” success of 59% and a coin flip of 50%; but not as much better as I hoped. Clearly picking these winners is hard – AFL is more like backgammon or poker in terms of predicting outcomes than it is like chess.
sc | pred_margin | margin_power | experience | new_round_factor | success_rate |
---|---|---|---|---|---|
1 | 30 | 0.8333333 | 100 | 0.4 | 0.6853485 |
1 | 20 | 0.8333333 | 200 | 0.8 | 0.6839260 |
1 | 20 | 1.0000000 | 100 | 0.6 | 0.6828829 |
1 | 10 | 0.8333333 | 0 | 0.4 | 0.6822191 |
1 | 20 | 0.6666667 | 100 | 0.8 | 0.6812707 |
3 | 20 | 0.8333333 | 0 | 0.8 | 0.6806069 |
1 | 20 | 1.0000000 | 400 | 0.8 | 0.6785206 |
3 | 10 | 1.0000000 | 0 | 0.8 | 0.6777620 |
1 | 20 | 0.5000000 | 0 | 0.8 | 0.6776671 |
1 | 10 | 1.0000000 | 300 | 0.8 | 0.6774775 |
3 | 30 | 0.8333333 | 200 | 0.6 | 0.6762447 |
1 | 10 | 1.0000000 | 300 | 0.6 | 0.6747274 |
1 | 20 | 0.8333333 | 400 | 0.6 | 0.6676150 |
3 | 10 | 0.6666667 | 100 | 0.8 | 0.6668563 |
6 | 20 | 1.0000000 | 100 | 1.0 | 0.6661925 |
1 | 30 | 0.5000000 | 100 | 1.0 | 0.6634424 |
1 | 10 | 0.5000000 | 0 | 0.4 | 0.6629682 |
3 | 20 | 0.6666667 | 100 | 1.0 | 0.6617354 |
6 | 20 | 0.8333333 | 100 | 0.6 | 0.6592698 |
6 | 30 | 0.6666667 | 200 | 0.8 | 0.6570887 |
The best models had a modest shrinkage of ratings towards 1500 (`new_round_factor` of 0.4 to 0.8, compared to 0 which would mean everyone starting at 1500 in each round 1); and modest if any non-linearity in the conversion of winning margin to a notional “match length”. They had relatively low levels of “experience”, effectively increasing the importance of recent results and downplaying long term momentum; while treating match results in points (`sc = 1`) and predicting based on a relatively large margin.
I only had time to try a random sample of parameter combinations, and would be very lucky indeed if I have ended up with the best set. How confident can I be that I’ve got something close enough? Here’s the distribution of success rates for that post 1950 series:
Without over-thinking it, it’s reasonable to infer a few more extreme values on the right are possible if we looked at the full set of parameters; but that they wouldn’t be that much more successful. It’s certainly good enough for a workplace footy tipping competition.
Here’s the predictive success of the best model over time, now applied to the full range of data not just the post 1950 period for which it was optimised:
… and the code that did the above “parameter competition”, using the `foreach` and `doParallel` R packages for parallel processing to bring the elapsed time down to reasonable levels:
Post continues after code extract
This week’s predictions are…
To turn this model into my tips for this week, I need to extract the final Elo ratings from the best model, join them with the actual fixture and then use the model to predict actual probabilities of winning. Here’s what I get:
home | away | home_elo | away_elo | home_adjustment | away_adjustment | final_prob | winner | fair_returns_home | fair_returns_away |
---|---|---|---|---|---|---|---|---|---|
Richmond | Collingwood | 1623.362 | 1552.782 | 0.0392339 | -0.0305072 | 0.6527701 | Richmond | 1.531933 | 2.879936 |
Sydney | Adelaide | 1506.489 | 1476.328 | 0.0467485 | -0.0632875 | 0.6457876 | Sydney | 1.548497 | 2.823164 |
Essendon | St Kilda | 1507.538 | 1404.657 | 0.0505882 | -0.0428714 | 0.7132448 | Essendon | 1.402043 | 3.487295 |
Port Adelaide | Carlton | 1544.746 | 1340.797 | 0.0571146 | -0.0257644 | 0.8077331 | Port Adelaide | 1.238033 | 5.201104 |
Geelong | Melbourne | 1552.818 | 1521.317 | 0.0747884 | -0.0402286 | 0.6523510 | Geelong | 1.532917 | 2.876464 |
West Coast | GWS | 1543.355 | 1565.494 | 0.0699069 | -0.0648148 | 0.6084577 | West Coast | 1.643499 | 2.554003 |
North Melbourne | Brisbane Lions | 1461.086 | 1483.270 | 0.0406327 | -0.0558655 | 0.5701814 | North Melbourne | 1.753828 | 2.326563 |
Hawthorn | Footscray | 1567.933 | 1482.579 | 0.0490421 | -0.0383632 | 0.6873887 | Hawthorn | 1.454781 | 3.198861 |
Gold Coast | Fremantle | 1382.276 | 1483.175 | 0.0402515 | -0.0729052 | 0.4955912 | Fremantle | 2.017792 | 1.982519 |
That `final_prob` column is the estimated probability of the home team winning.
As you can see, I translate my probabilities into a “fair return”, which I’m using to scan for opportunities with poorly chosen odds from the bookies. These opportunities don’t arrive very often as the bookies are professionals, but when they are paying 50% more than the model predicts to be “fair” I’m going to punt $5 and we’ll see how we go at the end of the season. So far I’m $26 up from this strategy but it’s early days and I’m far from assured the luck will continue.
Judging from the tips and odds by the public, the only controversial picks in the above are for North Melbourne to beat Brisbane and Gold Coast to be nearly a coin flip in contest with Fremantle. In both cases my algorithm is tipping the home advantage to offset the greater underlying strength of the away team. For the North Melbourne match, the bookies agree with me, whereas the tippers on tipping.afl.com.au are going for a Brisbane win, so I think we can say that reasonable people disagree about the outcome there and it is uncertain. For the other match, I have grave doubts about Gold Coast’s chances against Fremantle (who had a stellar victory last weekend), but am inclined to think the $3.50 return bookies are offering to pay for a Gold Coast win is over-generous and underestimating how much Fremantle struggle when playing away from home. So that’s my recommended match to watch for a potential surprise outcome.
At the time of writing, the first two of these predictions in my table above have already gone astray (for me, the average punters and the average tippers) in the Thursday and Friday night matches, as 2019 continues its run of surprise results. Collingwood and Adelaide both pulled off against-the-odds wins against teams that were both stronger on paper and playing at home. I won’t say my predictions were “wrong”, because when you say something has a 0.6 chance of happening and it doesn’t, there’s a good chance you were just unlucky, not wrong.
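The "fair return" columns and the 50% threshold can be expressed in a couple of lines. This is a minimal sketch: the helper names are hypothetical, and the threshold is read here as "the offered return is at least 1.5 times the fair return", which is an interpretation of the rule of thumb described above rather than a stated formula. The $2.50 figure is made up for contrast.

```r
# Fair return per $1 staked is the reciprocal of the win probability.
fair_return <- function(p) 1 / p

# ASSUMPTION: a bet is worth a punt when the bookies' offered return is at
# least `premium` times the model's fair return.
worth_a_punt <- function(offered, p, premium = 1.5) {
  offered >= premium * fair_return(p)
}

p_gold_coast <- 0.4955912    # final_prob for Gold Coast (home) v Fremantle
fair_return(p_gold_coast)    # about 2.02, as in the fair_returns_home column

worth_a_punt(3.50, p_gold_coast)  # TRUE: $3.50 clears 1.5 x 2.02 (about 3.03)
worth_a_punt(2.50, p_gold_coast)  # FALSE: a hypothetical $2.50 would not
```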
But as they say, prediction is hard, particularly about the future.
Final chunk of R code for today – converting the model into predictions for this round:
That’s all.
Here are the R packages used in producing this post:
maintainer | Number packages | packages |
---|---|---|
Hadley Wickham | 15 | assertthat, dplyr, forcats, ggplot2, gtable, haven, httr, lazyeval, modelr, plyr, rvest, scales, stringr, tidyr, tidyverse |
R Core Team | 12 | base, compiler, datasets, graphics, grDevices, grid, methods, parallel, stats, tools, utils, nlme |
Yihui Xie | 5 | evaluate, highr, knitr, rmarkdown, xfun |
Kirill Müller | 4 | DBI, hms, pillar, tibble |
Winston Chang | 4 | extrafont, extrafontdb, R6, Rttf2pt1 |
Gábor Csárdi | 3 | cli, crayon, pkgconfig |
Jim Hester | 3 | glue, withr, readr |
Lionel Henry | 3 | purrr, rlang, tidyselect |
Rich Calaway | 3 | doParallel, foreach, iterators |
Yixuan Qiu | 3 | showtext, showtextdb, sysfonts |
Dirk Eddelbuettel | 2 | digest, Rcpp |
Jennifer Bryan | 2 | cellranger, readxl |
Jeroen Ooms | 2 | curl, jsonlite |
Simon Urbanek | 2 | audio, Cairo |
Achim Zeileis | 1 | colorspace |
Alex Hayes | 1 | broom |
Brodie Gaslam | 1 | fansi |
Charlotte Wickham | 1 | munsell |
Deepayan Sarkar | 1 | lattice |
James Day | 1 | fitzRoy |
James Hester | 1 | xml2 |
Jeremy Stephens | 1 | yaml |
Joe Cheng | 1 | htmltools |
Justin Talbot | 1 | labeling |
Kamil Slowikowski | 1 | ggrepel |
Kevin Ushey | 1 | rstudioapi |
Luke Tierney | 1 | codetools |
Marek Gagolewski | 1 | stringi |
Matthew Lincoln | 1 | clipr |
Max Kuhn | 1 | generics |
Michel Lang | 1 | backports |
Patrick O. Perry | 1 | utf8 |
Peter Ellis | 1 | frs |
Rasmus Bååth | 1 | beepr |
Simon Garnier | 1 | viridisLite |
Stefan Milton Bache | 1 | magrittr |
Vitalie Spinu | 1 | lubridate |