A working paper describing the data and methods used for our probabilistic UEFA Euro 2020 forecast, published earlier this week, is available now. Additionally, details on the predicted performance of all teams during the group stage are provided.
Overview
Earlier this week we published our probabilistic UEFA Euro 2020 forecast, which combines the expertise of football modelers from four different research teams with the flexibility of machine learning. To explain exactly which data and methods were used, we have also written a working paper and submitted it (as usual) to the arXiv.org e-Print archive. Unfortunately, it is still “on hold” there, so the official release will be delayed a bit. Therefore, to make the paper available online before the start of the tournament, we provide the PDF along with this blog post.
Moreover, we take the opportunity to provide further insights from our forecast for the group stage, which starts at the end of this week with the opening match between Italy and Turkey in Rome in Group A. More precisely, predicted probabilities for a win, draw, or loss in each of the 36 group stage matches are provided in interactive heatmaps for all groups.
Working paper
Citation:
Groll A, Hvattum LM, Ley C, Popp F, Schauberger G, Van Eetvelde H, Zeileis A (2021). “Hybrid Machine Learning Forecasts for the UEFA EURO 2020.” arXiv.org e-Print archive. [PDF]
Abstract:
Three state-of-the-art statistical ranking methods for forecasting football matches are combined with several other predictors in a hybrid machine learning model. Namely an ability estimate for every team based on historic matches; an ability estimate for every team based on bookmaker consensus; average plus-minus player ratings based on their individual performances in their home clubs and national teams; and further team covariates (e.g., market value, team structure) and country-specific socio-economic factors (population, GDP). The proposed combined approach is used for learning the number of goals scored in the matches from the four previous UEFA EUROs 2004-2016 and then applied to current information to forecast the upcoming UEFA EURO 2020. Based on the resulting estimates, the tournament is simulated repeatedly and winning probabilities are obtained for all teams. A random forest model favors the current World Champion France with a winning probability of 14.8% before England (13.5%) and Spain (12.3%). Additionally, we provide survival probabilities for all teams and at all tournament stages.
Predicted match probabilities for the group stage
Using the hybrid random forest, an expected number of goals is obtained for both teams in each possible match in the group stage. As there are typically more goals in the group stage than in the knockout stage, a different expected number of goals is fitted for the two stages by including a corresponding binary dummy variable in the regression model. While the heatmap shown in our previous blog post contained the probabilities for all possible matches in the knockout stage, we complement this information here by showing separate heatmaps for each group.
The color scheme visualizes the winning probability of the team in the row over the team in the column: light red or orange signals a low winning probability, dark green or blue a high one. The tooltips for each match in the interactive version of the graphic also print the probabilities for the match to end in a win, draw, or loss.
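To illustrate how two expected goal counts can be turned into win, draw, and loss probabilities, here is a minimal Python sketch. It assumes that the goals of the two teams follow independent Poisson distributions with the expected goals as their means; this post does not spell out that step, so treat it as an assumption rather than as our exact procedure. The function names, the truncation at `max_goals`, and the example values 1.4 and 0.9 are hypothetical and not taken from the forecast.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


def match_probabilities(lam_a, lam_b, max_goals=20):
    """Return (win, draw, loss) probabilities from the perspective of team A.

    Assumption: goals of both teams are independent Poisson variables.
    Scorelines above max_goals per team are ignored and the result is
    renormalized so that the three probabilities sum to one.
    """
    pmf_a = [poisson_pmf(k, lam_a) for k in range(max_goals + 1)]
    pmf_b = [poisson_pmf(k, lam_b) for k in range(max_goals + 1)]

    win = draw = loss = 0.0
    for goals_a, p_a in enumerate(pmf_a):
        for goals_b, p_b in enumerate(pmf_b):
            p = p_a * p_b  # joint probability of this scoreline
            if goals_a > goals_b:
                win += p
            elif goals_a == goals_b:
                draw += p
            else:
                loss += p

    total = win + draw + loss
    return win / total, draw / total, loss / total


# Hypothetical expected goals for team A and team B (not forecast values)
win, draw, loss = match_probabilities(1.4, 0.9)
print(f"win: {win:.3f}, draw: {draw:.3f}, loss: {loss:.3f}")
```

Applying such a function to every pair of teams within a group would give the kind of row-versus-column probabilities that a heatmap can display.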
Interactive full-width graphics: Group A, Group B, Group C, Group D, Group E, Group F.
[Heatmaps of predicted match probabilities: Group A, Group B, Group C, Group D, Group E, Group F]