Probabilistic forecasting for the UEFA Women’s Euro 2022

Achim Zeileis

11 hours ago

[This article was first published on Achim Zeileis, and kindly contributed to R-bloggers].

Using a consensus model based on quoted bookmakers’ odds, winning probabilities for all competing teams in the UEFA Women’s Euro 2022 are obtained: The favorite is Spain, followed by host England, France, and the Netherlands as the defending champion.

Football fans throughout Europe and the world anticipate the UEFA Women’s Euro 2022 that will take place in England from 6 July to 31 July 2022. 16 of the best European teams compete to determine the new European Champion. Here, a predictive model is established to forecast what the most likely outcome of the tournament will be. The forecast is based on the expert knowledge of 16 bookmakers and betting exchanges using a model averaging approach.

Winning probabilities

The model is the so-called bookmaker consensus model which has been proposed by Leitner, Hornik, and Zeileis (2010, International Journal of Forecasting, https://doi.org/10.1016/j.ijforecast.2009.10.001) and successfully applied in previous football tournaments, either by itself or in combination with even more refined machine learning techniques.

This time the forecast shows that Spain is the favorite with a forecasted winning probability of 19.6%, closely followed by England with a winning probability of 16.6%. Four further teams also have double-digit winning probabilities: France with 13.5%, the Netherlands with 13.3%, Germany with 10.3%, and Sweden with 10.1%. More details are displayed in the following bar chart.

Interactive full-width graphic

These probabilistic forecasts have been obtained by model-based averaging of the quoted winning odds for all teams across bookmakers. More precisely, the odds are first adjusted for the bookmakers’ profit margins (“overrounds”, on average 20.1%), averaged on the log-odds scale to a consensus rating, and then transformed back to winning probabilities. The raw bookmakers’ odds as well as the forecasts for all teams are also available in machine-readable form in weuro2022.csv.
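The three steps (adjust, average on the log-odds scale, transform back) can be sketched as follows. This is a minimal illustration and not the exact procedure of the consensus model. It assumes decimal odds and removes each overround by simple proportional scaling. The odds and the function names `implied_probabilities` and `consensus` are hypothetical.

```python
import math

def implied_probabilities(quoted_odds):
    """Turn one bookmaker's decimal odds into probabilities that sum to 1.

    Assumption: the overround is removed by proportional scaling, which is
    a simplification chosen for this sketch.
    """
    raw = {team: 1.0 / odds for team, odds in quoted_odds.items()}
    total = sum(raw.values())  # exceeds 1 by the bookmaker's margin
    return {team: p / total for team, p in raw.items()}

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def consensus(books):
    """Average the adjusted probabilities across bookmakers on the log-odds scale."""
    adjusted = [implied_probabilities(book) for book in books]
    result = {}
    for team in adjusted[0]:
        mean_logit = sum(logit(a[team]) for a in adjusted) / len(adjusted)
        result[team] = inv_logit(mean_logit)
    return result

# Hypothetical decimal odds from two bookmakers for a three-team toy tournament
books = [
    {"A": 2.0, "B": 3.0, "C": 4.5},
    {"A": 2.1, "B": 2.9, "C": 4.2},
]
probs = consensus(books)
for team, p in probs.items():
    print(f"{team}: {p:.3f}")
```

Because the averaging happens on the log-odds scale, the back-transformed probabilities need not sum to exactly 1, although they typically come very close when the bookmakers largely agree.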

Although forecasting the winning probabilities for the UEFA Women’s Euro 2022 is probably of most interest, the bookmaker consensus forecasts can also be employed to infer team-specific abilities using an “inverse” tournament simulation:

1. If team abilities are available, pairwise winning probabilities can be derived for each possible match (see below).
2. Given pairwise winning probabilities, the whole tournament can be easily simulated to see which team proceeds to which stage in the tournament and which team finally wins.
3. Such a tournament simulation can then be run sufficiently often (here 100,000 times) to obtain relative frequencies for each team winning the tournament.

Using this idea, abilities in step 1 can be chosen such that the simulated winning probabilities in step 3 closely match those from the bookmaker consensus shown above.

Pairwise comparisons

A classical approach to obtain winning probabilities in pairwise comparisons (i.e., matches between teams/players) is the Bradley-Terry model, which is similar to the Elo rating, popular in sports. The Bradley-Terry approach models the probability that a Team A beats a Team B by their associated abilities (or strengths):

$$\Pr(A \text{ beats } B) = \frac{\mathit{ability}_A}{\mathit{ability}_A + \mathit{ability}_B}.$$
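As a small illustration of this ratio, the sketch below evaluates it for made-up abilities. It also shows the link to Elo: mapping an Elo-style rating to an ability via 10^(rating/400) turns the ratio into the usual Elo expected score. The function names and all numbers are hypothetical.

```python
def beats_probability(ability_a, ability_b):
    """Bradley-Terry probability that the first team beats the second."""
    return ability_a / (ability_a + ability_b)

def ability_from_elo(rating):
    """Map an Elo-style rating to a Bradley-Terry ability (base 10, scale 400)."""
    return 10.0 ** (rating / 400.0)

# Hypothetical abilities, chosen only for illustration
p_ab = beats_probability(3.0, 1.0)
p_ba = beats_probability(1.0, 3.0)

# Only the ratio of abilities matters: rescaling both changes nothing
p_scaled = beats_probability(30.0, 10.0)

# A rating gap of 400 points corresponds to an ability ratio of 10
p_elo = beats_probability(ability_from_elo(1900), ability_from_elo(1500))

print(p_ab, p_ba, p_scaled, round(p_elo, 4))
```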

Coupled with the “inverse” simulation of the tournament, as described in steps 1-3 above, this yields pairwise probabilities for each possible match. The following heatmap shows the probabilistic forecasts for each match with light gray signalling approximately equal chances and green vs. purple signalling advantages for Team A or B, respectively.
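The “inverse” idea can be sketched on a toy problem. To keep the example short and deterministic, it assumes a four-team knockout whose winning probabilities can be computed exactly, so no simulation is needed. The multiplicative update rule, the bracket, and the target probabilities are assumptions made for illustration and are not taken from the consensus model itself.

```python
def beats(a, b):
    """Bradley-Terry probability that a team with ability a beats one with ability b."""
    return a / (a + b)

def win_probabilities(ability):
    """Exact tournament-winning probabilities for a fixed four-team knockout.

    Hypothetical bracket: team 1 vs team 2 and team 3 vs team 4 in the
    semifinals, with the two winners meeting in the final.
    """
    t1, t2, t3, t4 = ability

    def champion(x, rival, other1, other2):
        # x must beat its semifinal rival and then the winner of the other semifinal
        final = (beats(other1, other2) * beats(x, other1)
                 + beats(other2, other1) * beats(x, other2))
        return beats(x, rival) * final

    return [
        champion(t1, t2, t3, t4),
        champion(t2, t1, t3, t4),
        champion(t3, t4, t1, t2),
        champion(t4, t3, t1, t2),
    ]

def fit_abilities(target, iterations=200, step=0.5):
    """Tune abilities until the implied winning probabilities match the target."""
    ability = list(target)  # starting values
    for _ in range(iterations):
        current = win_probabilities(ability)
        # Raise the ability of teams that win too rarely, lower it otherwise
        ability = [a * (t / c) ** step for a, t, c in zip(ability, target, current)]
        total = sum(ability)
        ability = [a / total for a in ability]  # abilities are only defined up to scale
    return ability

target = [0.4, 0.3, 0.2, 0.1]  # hypothetical consensus winning probabilities
abilities = fit_abilities(target)
fitted = win_probabilities(abilities)

# Pairwise matrix: entry [i][j] is the probability that team i beats team j
# (the diagonal entries are 0.5 and carry no meaning)
pairwise = [[beats(a, b) for b in abilities] for a in abilities]
```

For a tournament with a group stage and 16 teams, the exact computation would be replaced by repeated simulation, so the fit could only match the targets up to simulation noise.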

Interactive full-width graphic

Performance throughout the tournament

As every single match can be simulated with the pairwise probabilities above, it is also straightforward to simulate the entire tournament (here: 100,000 times), providing “survival” probabilities for each team across the different stages.
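A minimal sketch of such a simulation is given below. It assumes a plain eight-team knockout without a group stage, uses made-up abilities and team labels, and runs only 20,000 replications to stay fast. The helper names `simulate_knockout` and `survival_probabilities` are hypothetical.

```python
import random

def beats(a, b):
    """Bradley-Terry probability that a team with ability a beats one with ability b."""
    return a / (a + b)

def simulate_knockout(ability, bracket, rng):
    """Play one single-elimination tournament; return the number of rounds each team won."""
    wins = {team: 0 for team in bracket}
    alive = list(bracket)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            winner = a if rng.random() < beats(ability[a], ability[b]) else b
            wins[winner] += 1
            next_round.append(winner)
        alive = next_round
    return wins

def survival_probabilities(ability, bracket, runs, seed=1):
    """Relative frequency with which each team survives each knockout round.

    Assumes the number of teams in the bracket is a power of two.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n_rounds = len(bracket).bit_length() - 1
    counts = {team: [0] * n_rounds for team in bracket}
    for _ in range(runs):
        wins = simulate_knockout(ability, bracket, rng)
        for team, won in wins.items():
            for r in range(won):
                counts[team][r] += 1
    return {team: [c / runs for c in cs] for team, cs in counts.items()}

# Hypothetical abilities; adjacent pairs in the bracket meet in the first round
ability = {"T1": 8.0, "T2": 1.0, "T3": 4.0, "T4": 2.0,
           "T5": 6.0, "T6": 1.5, "T7": 3.0, "T8": 2.5}
bracket = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8"]

survival = survival_probabilities(ability, bracket, runs=20000)
for team, stages in survival.items():
    # stages: reach semifinal, reach final, win the tournament
    print(team, [round(p, 3) for p in stages])
```

Each team's list is non-increasing by construction, which is what gives the curves their “survival” shape.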

Interactive full-width graphic

For example, this shows that Spain’s chances of reaching the quarterfinals are lower than those of England and France, whereas its chances of reaching the semifinals are higher. The reason is that Spain plays another one of the strongest six teams in its group (Germany) but can likely avoid another of these six teams in the quarterfinal. Conversely, England and France do not have another of the six top teams in their group but most likely play one in their quarterfinals (Germany and Netherlands or Sweden, respectively).

This effect of the tournament draw is also brought out by another display that highlights the likely flow of all teams through the tournament simultaneously. Compared to the survival curves shown above, this visualization shows more clearly at which stages of the tournament the strong teams are most likely to meet.

Interactive full-width graphic

Odds and ends

The bookmaker consensus model has performed well in previous tournaments, often predicting winners or finalists correctly. However, all forecasts are probabilistic, clearly below 100%, and thus by no means certain. It would also be possible to post-process the bookmaker consensus along with data from historic matches, player ratings, and other information about the teams using machine learning techniques. Due to lack of time for more refined forecasts at the end of a busy academic year, though, only the bookmaker consensus is provided here as a solid basic forecast.

As a final remark: Betting on the outcome based on the results presented here is not recommended. This is not only because the winning probabilities are all far below 100% but, more importantly, because the bookmakers have a sizeable profit margin of about 20.1%, which ensures that the best chances of making money from sports betting lie with them!

In a few days we will start learning which of the probable paths through the tournament, shown above, will actually come true. Enjoy the UEFA Women’s Euro 2022!
