New Bundesliga Forecasting Tool: Can Underdog Hertha Berlin beat Bayern Munich?
The Bundesliga is Germany’s primary football league. It is one of the most important football leagues in the world, broadcast on television in over 200 countries.
If you want to get your hands on a tool to forecast the result of any game (and perform some more statistical analyses), read on!
The basis of our forecasting tool was laid in the blog post "Euro 2020: Will Switzerland kick out Spain too?", which also explains the methodology. For this post, we adapted the parameters for the Bundesliga (the sources are given in the code below) to forecast, as an example, the result of the upcoming game between Hertha BSC (Berlin) and the international top team Bayern Munich on August 28. The tool can also easily be adapted to other football leagues, e.g. the English Premier League.
On top of that, we made the model even more accurate by adding a home advantage. This effect is surprisingly stable across the main European football leagues at about 0.4 goals extra for the home team. By the way: in times of Corona, when no spectators were allowed in the stadiums, the home advantage disappeared!
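How the home advantage enters the expected goals can be sketched as a small helper. This is only an illustration: the function name expected_goals() and its argument names are made up for this sketch, and the floor on the away team's mean is an extra safeguard that we assume here, not something taken from the original model.

```r
# Hypothetical helper (name and arguments are illustrative only).
# Splits the league's average goals per game by market value share,
# then adds 0.4 goals to the home team and subtracts 0.4 from the away team.
expected_goals <- function(value_home, value_away,
                           mean_total_score = 3.03, home_advantage = 0.4) {
  ratio <- value_home / (value_home + value_away)
  home <- ratio * mean_total_score + home_advantage
  away <- (1 - ratio) * mean_total_score - home_advantage
  # Assumption: keep the Poisson mean positive for very uneven pairings
  away <- max(away, 0.01)
  c(home = home, away = away)
}

expected_goals(818.5, 176.75) # Bayern Munich (home) vs. Hertha BSC (away)
```

Swapping the two market values would model the same pairing with the other team playing at home.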
Another thing we added is a probability calculation for all possible outcomes. We do this by assuming that the goals scored by the two teams are independent of each other (whether this is a reasonable assumption is open to discussion), so that the marginal probabilities can simply be multiplied. This can easily be done in R with the outer() (product) function (= %o%). The most probable outcome can then easily be extracted:
mean_total_score <- 3.03 # https://de.statista.com/statistik/daten/studie/1622/umfrage/bundesliga-entwicklung-der-durchschnittlich-erzielten-tore-pro-spiel/

# https://www.transfermarkt.de/bundesliga/marktwerteverein/wettbewerb/L1
team1 = "Bayern Munich" ; colour1 <- "red" ; value1 <- 818.5 # rows
team2 = "Hertha BSC" ; colour2 <- "blue" ; value2 <- 176.75 # columns

# https://www.saechsische.de/mehr-auswaerts-tore-bei-geisterspielen-5219318.html
ratio <- value1 / (value1 + value2)
mean_goals1 <- ratio * mean_total_score + 0.4 # 0.4 = home advantage
mean_goals2 <- (1 - ratio) * mean_total_score - 0.4

goals <- 0:7
prob_goals1 <- dpois(goals, mean_goals1)
prob_goals2 <- dpois(goals, mean_goals2)
probs <- round((prob_goals1 %o% prob_goals2) * 100, 1) # outer product
colnames(probs) <- rownames(probs) <- goals

parbkp <- par(mfrow=c(1, 2))
max_ylim <- max(prob_goals1, prob_goals2)
plot(goals, prob_goals1, type = "h", ylim = c(0, max_ylim), xlab = team1, ylab = "Probability", col = colour1, lwd = 10)
plot(goals, prob_goals2, type = "h", ylim = c(0, max_ylim), xlab = team2, ylab = "", col = colour2, lwd = 10)
title(paste(team1, paste(goals[which(probs == max(probs), arr.ind = TRUE)], collapse = ":"), team2), line = -2, outer = TRUE)
par(parbkp)
So, the most probable outcome is Bayern Munich 2:0 Hertha BSC. Let us have a look at the probabilities in more detail:
probs
##      0   1   2 3 4 5 6 7
## 0  4.8 0.7 0.0 0 0 0 0 0
## 1 14.0 1.9 0.1 0 0 0 0 0
## 2 20.2 2.8 0.2 0 0 0 0 0
## 3 19.5 2.7 0.2 0 0 0 0 0
## 4 14.1 1.9 0.1 0 0 0 0 0
## 5  8.1 1.1 0.1 0 0 0 0 0
## 6  3.9 0.5 0.0 0 0 0 0 0
## 7  1.6 0.2 0.0 0 0 0 0 0
The number of goals of Bayern Munich is in the rows, that of Hertha BSC in the columns. The 2:0 result has a probability of just over 20 percent, which is quite high. But even a result of 3:0 still has a probability of nearly 20 percent!
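To rank the scorelines instead of reading them off the matrix, the probabilities can be flattened into a long table and sorted. The following is a minimal sketch under the same independence assumption; the object names scorelines and top5 are our own, and the probabilities are left unrounded here.

```r
# Inputs taken from the model's parameters; all other names are illustrative
mean_total_score <- 3.03
value1 <- 818.5   # Bayern Munich (home, rows)
value2 <- 176.75  # Hertha BSC (away, columns)
ratio <- value1 / (value1 + value2)
goals <- 0:7

# Unrounded joint probabilities in percent, assuming independent Poisson goals
probs <- (dpois(goals, ratio * mean_total_score + 0.4) %o%
          dpois(goals, (1 - ratio) * mean_total_score - 0.4)) * 100
dimnames(probs) <- list(as.character(goals), as.character(goals))

# Turn the matrix into a long table of scorelines and sort by probability
scorelines <- as.data.frame(as.table(probs), stringsAsFactors = FALSE)
names(scorelines) <- c("goals1", "goals2", "prob")
top5 <- head(scorelines[order(-scorelines$prob), ], 5)
top5
```

The first two rows of top5 correspond to the 2:0 and 3:0 entries of the matrix.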
To calculate the overall probabilities for a win for each team and a draw we can conveniently use the lower.tri(), upper.tri(), and diag() functions:
sum(probs[lower.tri(probs)]) # probability team 1 wins
## [1] 91
sum(diag(probs)) # probability for a draw
## [1] 6.9
sum(probs[upper.tri(probs)]) # probability team 2 wins
## [1] 0.8
So, to answer the original question, Hertha BSC’s chance to beat Bayern Munich is below one percent: they need nothing less than a miracle to win in Munich!
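The three probabilities do not quite add up to 100 percent because the score matrix stops at seven goals per team and its entries are rounded. A small helper can make the missing probability mass visible. This is a sketch only: the function outcome_probs() and its arguments are hypothetical names, and the two means passed in are the rounded expected goals of the model.

```r
# Hypothetical helper: win/draw/loss probabilities in percent, unrounded inside
outcome_probs <- function(mean_goals1, mean_goals2, max_goals = 7) {
  goals <- 0:max_goals
  probs <- dpois(goals, mean_goals1) %o% dpois(goals, mean_goals2)
  out <- c(win1 = sum(probs[lower.tri(probs)]), # team 1 (rows) scores more
           draw = sum(diag(probs)),
           win2 = sum(probs[upper.tri(probs)])) # team 2 (columns) scores more
  # "missing" is the probability mass cut off by stopping at max_goals
  c(out, missing = 1 - sum(out)) * 100
}

round(outcome_probs(2.892, 0.138), 1)                 # matrix up to 7 goals
round(outcome_probs(2.892, 0.138, max_goals = 15), 1) # larger matrix
```

Raising max_goals shrinks the missing share, at the cost of a larger matrix.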
DISCLAIMER
This post is written on an “as is” basis for educational purposes only and comes without any warranty. The findings and interpretations are exclusively those of the author and are not endorsed by or affiliated with any third party.
In particular, this post provides no sports betting advice! No responsibility is taken whatsoever if you lose money.
(If you gain money though I would be happy if you would buy me a coffee… that is not too much to ask, is it?)