
The UEFA EURO 2020 prediction winner is …

[This article was first published on R blog posts on sandsynligvis.dk, and kindly contributed to R-bloggers.]

… going to be revealed just below. This post from June 2nd shows the original announcement.

Each contestant was asked to submit a prediction in the form of a 6 x 24 matrix in which the columns represent the countries and the rows represent the possible ranks a team can obtain once the tournament is over. The entries must be numbers between 0 and 1 and give the probability that a given country ends up at a given rank. Consequently, each column sums to 1, and each row sums to the number of teams that end up at that rank.
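As a rough sketch of the required format, the following builds a hypothetical, completely uninformative submission and checks both constraints. The rank sizes (1, 1, 2, 4, 8, 8) are read off the final tournament outcome; the names `rank_sizes`, `teams` and `flat` are illustrative only and not part of the competition code.

```r
# Number of teams that end up at each of the 6 ranks
# (winner, runner-up, semifinals, quarterfinals, round of 16, group stage)
rank_sizes <- c(1, 1, 2, 4, 8, 8)
n_teams <- sum(rank_sizes)  # 24 teams

# Placeholder labels; a real submission would use the country names
teams <- paste0("Team", seq_len(n_teams))

# Every team gets the same probability of reaching a given rank
flat <- matrix(rank_sizes / n_teams,
               nrow = length(rank_sizes), ncol = n_teams,
               dimnames = list(paste0("Rank", seq_along(rank_sizes)), teams))

# Each column must sum to 1 ...
stopifnot(isTRUE(all.equal(unname(colSums(flat)), rep(1, n_teams))))
# ... and each row must sum to the number of teams at that rank
stopifnot(isTRUE(all.equal(unname(rowSums(flat)), rank_sizes)))
```

Any valid submission should pass the same two checks, whatever method produced the probabilities.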

The participants

Nine participants provided submissions and they did not have to explain how they arrived at their predictions. The predictions could be based on gut feeling, reading tea leaves or complex statistical models.

Figure 1: Initial predictions of the team that will win the UEFA EURO 2020 tournament. One contestant was absolutely certain that Belgium would win.

And the winner is …

The prediction winner is the participant whose prediction yields the lowest tournament rank probability score (TRPS), as proposed in Evaluating one-shot tournament predictions by Ekstrøm, Van Eetvelde, Ley and Brefeld. The socceR package on CRAN is used for the computation, and smaller numbers indicate better predictions.

I’ve assigned names to the entries based on the GitHub uploads. I apologize if a name does not adequately describe the underlying prediction approach.

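# pred_list is a list of 6 x 24 prediction matrices, one per participant
# Tournament outcome: the final rank of each team, in the column order of the
# prediction matrices (1 = winner, 6 = eliminated in the group stage)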
outcome <- c(6, 1, 5, 4,
             3, 6, 4, 6,
             5, 4, 5, 6,
             2, 5, 6, 4,
             3, 5, 6, 6,
             6, 5, 5, 5)

data.frame(team=colnames(pred_list[[1]]), rank=outcome)
              team rank
1           Turkey    6
2            Italy    1
3            Wales    5
4      Switzerland    4
5          Denmark    3
6          Finland    6
7          Belgium    4
8           Russia    6
9      Netherlands    5
10         Ukraine    4
11         Austria    5
12 North Macedonia    6
13         England    2
14         Croatia    5
15        Scotland    6
16  Czech Republic    4
17           Spain    3
18          Sweden    5
19          Poland    6
20        Slovakia    6
21         Hungary    6
22        Portugal    5
23          France    5
24         Germany    5
# Compute the TRPS of each submitted prediction against the observed outcome
result <- sapply(pred_list, function(pred) socceR::trps(pred, outcome))
Table 1: Tournament rank probability score (TRPS) for the 9 entries in the tournament prediction competition. Smaller numbers indicate better predictions.

Entry                 TRPS
CD                    0.1478
Current strength      0.1011
Brandt (ELO)          0.0942
Mads                  0.1025
Random forest         0.1045
XGBoost               0.1089
Simple                0.1667
FK                    0.1319
Bookmaker consensus   0.1066
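To read off the ranking, the scores can be sorted. The snippet below is a minimal sketch that copies the values from Table 1 by hand into a named vector; `trps_scores` is a hypothetical name, and in the original analysis the `result` object would play this role.

```r
# TRPS values copied from Table 1 (smaller is better)
trps_scores <- c("CD" = 0.1478, "Current strength" = 0.1011,
                 "Brandt (ELO)" = 0.0942, "Mads" = 0.1025,
                 "Random forest" = 0.1045, "XGBoost" = 0.1089,
                 "Simple" = 0.1667, "FK" = 0.1319,
                 "Bookmaker consensus" = 0.1066)

# Entries ordered from best to worst
sort(trps_scores)

# Name of the entry with the lowest score
winner <- names(which.min(trps_scores))
winner
# "Brandt (ELO)"
```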

For comparison, a completely flat prediction, in which every team is assigned the same probability of reaching each rank, would yield a TRPS of 0.1399. Note that this completely uninformative prediction performs better than two of the entries: Simple and CD. The Simple entry in particular performed spectacularly badly, with a tournament rank probability score markedly larger than that of the flat prediction. Confident predictions that turn out to be wrong are penalized heavily!
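The flat baseline can be reproduced in base R. The function below is a hedged re-implementation of the score as I read its definition (the mean squared difference between cumulative predicted and cumulative observed rank distributions, taken over all teams and the first five ranks). The name `trps_sketch` is hypothetical, and `socceR::trps()` remains the reference implementation.

```r
# Illustrative re-implementation of the TRPS; not the socceR code itself
trps_sketch <- function(pred, outcome) {
  n_ranks <- nrow(pred)
  # Cumulative predicted probability of finishing at rank r or better
  cum_pred <- apply(pred, 2, cumsum)
  # Cumulative observed indicator: 1 from the realised rank onwards
  cum_obs <- outer(seq_len(n_ranks), outcome, FUN = ">=") * 1
  # The last cumulative row is always 1 for both, so it is left out
  keep <- seq_len(n_ranks - 1)
  mean((cum_pred[keep, , drop = FALSE] - cum_obs[keep, , drop = FALSE])^2)
}

# Final ranks, in the same team order as in the post
outcome <- c(6, 1, 5, 4, 3, 6, 4, 6, 5, 4, 5, 6,
             2, 5, 6, 4, 3, 5, 6, 6, 6, 5, 5, 5)

# Flat prediction: every team has probability (teams at rank) / 24 for each rank
rank_sizes <- as.vector(table(outcome))  # 1 1 2 4 8 8
flat <- matrix(rank_sizes / length(outcome),
               nrow = length(rank_sizes), ncol = length(outcome))

flat_score <- trps_sketch(flat, outcome)
round(flat_score, 4)
# 0.1399

# A prediction that puts all mass on the realised ranks scores 0
perfect <- outer(seq_along(rank_sizes), outcome, FUN = "==") * 1
perfect_score <- trps_sketch(perfect, outcome)
```

Because the score works on cumulative distributions, putting all probability on a rank far from the realised one costs more than a near miss, which is why overconfident entries can end up behind the flat baseline.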

Congratulations to Lennart Brandt who provided a prediction based on the teams’ ELO rating and had the best overall tournament prediction.
