
Were there too many unlikely results at the FIFA World Cup 2022 in Qatar?

[This article was first published on Ilya Kashnitsky, and kindly contributed to R-bloggers.]

The FIFA World Cup 2022 in Qatar saw many surprising results. Too many, some would argue. They ranged from Argentina's unbelievable loss to Saudi Arabia at the very beginning of the group stage, through the loss of a magnificent Brazil to Cameroon at the end of the group stage, to the groundbreaking performance of Morocco, who were competitive against all the usual favorites.

Somewhere towards the middle of the group stage I started wondering: is it normal to have so many surprises at a World Cup? What if this particular World Cup is really exceptional in the number of unlikely match outcomes? 1 If so, how should I measure it?

  • 1 From discussions of my earlier posts about this analysis I learned that the usual term for these unlikely outcomes is “upset”. I don’t like the term, since it seems to imply that one always supports the favorite, which is definitely not my case; I tend to cheer for the underdogs. The term I used initially was “sensations”, which apparently makes little sense in English and reveals the direct translation from Russian happening in my head. Throughout this post I’ll use the term “unlikely outcome”.

A bit of thinking yielded an answer that was lying on the surface: bookmakers. They are the people who use all available knowledge to make money on outcome expectations. Pretty soon I found a website, [oddsportal.com][oddsportal], that offers long historical series of betting odds and match outcomes. The real challenge was scraping the website. Stack Overflow and similar places are filled with questions about scraping this specific tricky website. Having finally figured out how to do this (after countless trials and failures), I wrapped the working solution into an R package, oddor. The idea is that the package provides both the scraping tools and the cleaned extracted datasets. 2

  • 2 For now I have only scraped and added several football tournaments and leagues. Please feel free to add more results via GitHub pull requests.

Okay, so the data issue is solved. Now it’s time for the experiment, a really simple one. I simulate a scenario in which I consistently bet on the least likely outcome and track how my fictional balance changes over time. Out of the three possible outcomes of a game, (1) home team wins, (2) draw in regular time, and (3) away team wins, 3 I always select the one with the highest odds, meaning that the bookmakers consider this outcome the least likely of the three.

  • 3 Of course, the “home” and “away” designations are quite arbitrary at World Cups; still, I use the terms for consistency.
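The selection rule described above (always pick the outcome with the highest odds) can be illustrated with a minimal Python sketch. This is not the R code behind the original analysis. It assumes the odds are decimal odds, so that 1 / odds approximates the bookmaker's implied probability; the function names and the sample odds are hypothetical and serve only as an illustration.

```python
def implied_probabilities(odds):
    """Convert decimal odds into implied probabilities that sum to 1."""
    # Assumption: odds are decimal odds, so 1 / odds is the raw implied probability.
    raw = {outcome: 1.0 / value for outcome, value in odds.items()}
    # Raw values usually sum to more than 1 because of the bookmaker margin;
    # dividing by the total rescales them to a proper distribution.
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


def least_likely_outcome(odds):
    """Return the outcome with the highest odds (lowest implied probability)."""
    return max(odds, key=odds.get)


# Hypothetical odds for a single match, for illustration only.
example_odds = {"home": 1.5, "draw": 4.0, "away": 7.0}
print(least_likely_outcome(example_odds))
print(implied_probabilities(example_odds))
```

Picking the maximum of the odds and picking the minimum of the implied probabilities give the same outcome, so the experiment itself only needs the first function; the conversion is shown to make the "least likely" reading of high odds explicit.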

Here are the results for the latest World Cup, 2022 (darkest line, with all the unlikely outcomes annotated), in comparison with the three previous World Cups: 2018, 2014, and 2010.

The decreasing step lines in the plot represent my decreasing fictional balance. Each game I bet 1 coin on the least likely outcome. Most often I lose this bet, and my balance decreases. Sometimes, though, the least likely outcome happens, and my balance increases substantially, by the size of that outcome's odds. For example, when Argentina lost to Saudi Arabia, the odds for this outcome were 25.
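The bookkeeping behind these step lines can be sketched in a few lines of Python. This is a minimal illustration, not the R code from the original analysis. It assumes decimal odds, under which a winning 1-coin bet pays out the odds value including the returned stake, so the net gain is the odds minus 1; the original analysis may account for winnings differently. The function name `simulate_balance` and the sample matches are hypothetical.

```python
def simulate_balance(matches, stake=1.0):
    """Track a running balance when always betting on the least likely outcome.

    `matches` is a sequence of (odds, result) pairs, where `odds` maps each
    outcome ("home", "draw", "away") to its decimal odds and `result` is the
    outcome that actually happened.
    """
    balance = 0.0
    history = []
    for odds, result in matches:
        # The least likely outcome is the one with the highest odds.
        pick = max(odds, key=odds.get)
        if result == pick:
            # Assumption: decimal odds, so the net gain is stake * (odds - 1).
            balance += stake * (odds[pick] - 1.0)
        else:
            # The bet is lost and the stake is gone.
            balance -= stake
        history.append(balance)
    return history


# Hypothetical matches, for illustration only.
example_matches = [
    ({"home": 1.5, "draw": 4.0, "away": 7.0}, "home"),  # favorite wins, bet lost
    ({"home": 1.5, "draw": 4.0, "away": 7.0}, "away"),  # unlikely outcome happens
    ({"home": 2.0, "draw": 3.0, "away": 5.0}, "draw"),  # bet lost again
]
print(simulate_balance(example_matches))
```

Plotting the returned history against the match order would give a step line of the kind described above: small steps down after each lost bet and a large jump up whenever the least likely outcome occurs.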

We can see that the 2022 World Cup was really exceptional: the outcomes considered least likely happened too often. It’s also evident that surprises happen more often at the group stage, especially in the third round, when many leaders have apparently already reached their group-stage goals (think of the recent game Brazil–Cameroon, where Brazil essentially fielded its second team). I would say it’s really surprising: if one bets consistently against the odds at all the group-stage games, at least in the last 4 World Cups (for which we have odds and outcomes data), this dead-simple strategy turns out to be profitable. My wild guess is that World Cups attract masses of inexperienced new bettors who place bets on their national teams regardless of the odds, which at the global scale unbalances the whole system. Alternatively, maybe we are just slow and bad at recognizing how football is becoming more international, so that more underdogs are now able to put up a decent fight against the traditional favorites.

In contrast, the play-offs are apparently less chaotic and more predictable. Betting on the underdogs at the play-off stage would have lost money in all 4 recent World Cups.

Of course, the surprisingly profitable result of the betting-on-the-underdog experiment for group-stage games at World Cups made me curious about other competitions. So I did a similar analysis for the Champions League (from season 2004–05) and for the English Premier League (from season 2003–04). Predictably, no miracle happened: betting consistently on the underdogs would almost always bring financial losses. So either World Cups are just different, or all of the last 4 tournaments were exceptional, with the latest being a crazy outlier. Would it be reasonable to try this no-brainer betting strategy at the group stage of the next World Cup? I’m not sure. But let’s see in 4 years. 4

  • 4 I guess I have to say that this is not financial advice =)


Replicate this analysis using the R code from this gist.