Football model
[This article was first published on Wiekvoet, and kindly contributed to R-bloggers].
After reading Dutch football data (Eredivisie 2011-2012) and making a prediction display, it is time to look at a few simple models to predict goals. To reiterate the data setup: each game played consists of two rows in the data frame, one row for the number of goals scored by the home team and another row for the away team. We start with four models. Two of them I don't believe in: model 0, where the number of goals is independent of the clubs and everything else, and model 1, where the number of goals depends only on the team scoring the goals. The other two models are plausible: in model 2 both the attacking and the defending team determine the number of goals, and finally, in model 3 both teams determine the number of goals, but it also matters who is playing at home.
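To make the two-rows-per-game setup concrete, here is a minimal sketch. It assumes a hypothetical wide table, games, with one row per match (real club names, made-up scores), and it assumes OffThuis is a 0/1 indicator for the scoring club playing at home, which fits the coefficient being printed simply as OffThuis further on.

```r
# Hypothetical wide table: one row per match (real club names, made-up scores)
games <- data.frame(
  Home      = c("Ajax", "Vitesse", "SC Heerenveen"),
  Away      = c("Vitesse", "SC Heerenveen", "Ajax"),
  HomeGoals = c(2, 0, 3),
  AwayGoals = c(1, 0, 3),
  stringsAsFactors = FALSE
)

# One row for the goals of the home club (attacking at home, OffThuis = 1) ...
home_rows <- data.frame(OffenseClub = games$Home, DefenseClub = games$Away,
                        OffThuis = 1, Goals = games$HomeGoals)
# ... and one row for the goals of the away club (OffThuis = 0)
away_rows <- data.frame(OffenseClub = games$Away, DefenseClub = games$Home,
                        OffThuis = 0, Goals = games$AwayGoals)

# Stack them: home rows first, then away rows, in the same match order
long <- rbind(home_rows, away_rows)
long$OffenseClub <- factor(long$OffenseClub)
long$DefenseClub <- factor(long$DefenseClub)
long
```

Stacking all home rows first and all away rows second appears to match how the year variable is built later on, where the per-match dates are simply repeated twice.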
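# model 0: one overall scoring rate
model0 <- glm(Goals ~ 1, data = StartData, family = 'poisson')
# model 1: the rate depends only on the scoring (attacking) club
model1 <- glm(Goals ~ OffenseClub, data = StartData, family = 'poisson')
# model 2: attacking and defending club
model2 <- glm(Goals ~ OffenseClub + DefenseClub, data = StartData, family = 'poisson')
# model 3: plus home advantage (OffThuis: the attacking club plays at home)
model3 <- glm(Goals ~ OffenseClub + DefenseClub + OffThuis,
              data = StartData, family = 'poisson')
anova(model0, model1, model2, model3, test = 'Chisq')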
Analysis of Deviance Table
Model 1: Goals ~ 1
Model 2: Goals ~ OffenseClub
Model 3: Goals ~ OffenseClub + DefenseClub
Model 4: Goals ~ OffenseClub + DefenseClub + OffThuis
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 611 865.23
2 594 754.17 17 111.064 7.610e-16 ***
3 577 699.13 17 55.043 6.743e-06 ***
4 576 668.96 1 30.172 3.955e-08 ***
It appears that each modelling step which makes the model more complex is significant, so we must reject the hypothesis that any of these terms is irrelevant. Hence the number of goals depends on both teams plus a home-team effect.
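The Pr(>Chi) column can be checked by hand. For a Poisson glm the dispersion is fixed at 1, so anova() with test = 'Chisq' compares each drop in deviance with a chi-squared distribution whose degrees of freedom equal the number of added parameters. The numbers below are copied from the table above, so the results agree with it up to rounding.

```r
# Deviance differences and their degrees of freedom, copied from the table above
dev_diff <- c(111.064, 55.043, 30.172)
df_diff  <- c(17, 17, 1)

# Upper-tail chi-squared probabilities (Poisson: dispersion fixed at 1)
p_values <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
signif(p_values, 4)
```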
The twelfth man
It does make a difference who is playing at home. In practical terms, due to the model used, this advantage is difficult to interpret. As an illustration, when two clubs of equal strength play each other (the intercept corresponds to the reference club, Ado Den Haag, playing against itself), each scores about 1.3 goals.
exp(coef(model2)[1])
(Intercept)
1.346538
When one of these equally strong teams plays at home and the other away, the numbers change: the home team scores about 1.6 goals, while the away team scores only about 1.1.
exp(coef(model3)[length(coef(model3))] + coef(model3)[1])
OffThuis
1.58019
exp(coef(model3)[1])
(Intercept)
1.112886
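The two rates printed above can be combined in a small worked sketch. Their ratio is the multiplicative home effect, and its log is the OffThuis coefficient. Assuming, as the Poisson model implicitly does, that the two clubs' goal counts are independent, the score probabilities for a match between these two equally strong clubs follow from dpois(); the cut-off at nine goals is an arbitrary choice that drops only a negligible tail.

```r
# Expected goals for two equally strong clubs, copied from the output above
lambda_home <- 1.58019
lambda_away <- 1.112886

# Multiplicative home effect; its log is the OffThuis coefficient
home_factor <- lambda_home / lambda_away
log(home_factor)

# Score matrix assuming independent Poisson goals (0..9 goals each):
# rows are home goals, columns are away goals
goals <- 0:9
p <- outer(dpois(goals, lambda_home), dpois(goals, lambda_away))

p_home_win <- sum(p[lower.tri(p)])  # home goals > away goals
p_draw     <- sum(diag(p))          # equal goals
p_away_win <- sum(p[upper.tri(p)])  # away goals > home goals
round(c(home = p_home_win, draw = p_draw, away = p_away_win), 3)
```

With these numbers the ratio is about 1.42, so playing at home multiplies the expected number of goals by roughly 1.4 (log ratio about 0.35).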
This makes playing at home or away both statistically and practically significant. Note that the size of this effect cannot be transferred to other circumstances.
The teams
Each of the teams has two parameters in the model. These can most easily be interpreted as offensive and defensive power. The following code plots these powers.
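co <- coef(model3)
# offensive and defensive coefficients, with the factor prefix stripped from the names
coO <- co[grep('Offense', names(co))]
coD <- co[grep('Defense', names(co))]
names(coO) <- gsub('OffenseClub', '', names(coO))
names(coD) <- gsub('DefenseClub', '', names(coD))
# Ado Den Haag is the reference level and therefore missing from the
# parameterization, so it is added with coefficient 0 for both
coB <- rbind(cbind(coO, coD),
             matrix(c(0, 0), nrow = 1,
                    dimnames = list('Ado Den Haag', c('coO', 'coD'))))
# centre (but do not rescale) the columns to show relative strength
coB <- as.data.frame(scale(coB, scale = FALSE))
# plot -coD so that more defensive power appears visually larger
plot(-coD ~ coO, type = 'n', data = coB,
     xlab = 'Offensive power', ylab = 'Defensive power', axes = FALSE)
text(-coD ~ coO, data = coB, labels = rownames(coB))
abline(a = 0, b = 1)  # equal offensive and defensive power
abline(v = 0)         # average offensive power
abline(h = 0)         # average defensive power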
The plot shows the axes as lines through zero, so a team close to the centre (NAC Breda, FC Utrecht) was average in both offensive and defensive strength. The diagonal line marks equal offensive and defensive strength; hence Feyenoord is equally strong in offense and defense, and the same holds for De Graafschap. The line does not look quite diagonal because the range in offensive strength is larger than the range in defensive strength. The best team is top right: Ajax. The worst teams are bottom left: De Graafschap and Excelsior, which were relegated to the Eerste Divisie. A few clubs stand out for the mismatch between their offensive and defensive strengths. SC Heerenveen has almost the same goal-scoring power as Ajax, but not enough defensive capacity. In contrast, Vitesse concedes few goals, but lacks the power to score them. Overall, the two clubs have about the same strength.
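The notion of overall strength in the last sentence can be made explicit. Against a league-average opponent, and ignoring home advantage, log(goals scored) minus log(goals conceded) equals coO - coD, so that difference ranks clubs by overall strength. The sketch below uses made-up coefficients for three hypothetical clubs plus a reference club, and repeats the steps from the code above: adding the reference club with zeros and centring with scale(scale = FALSE).

```r
# Made-up, reference-coded coefficients (NOT fitted values) for hypothetical clubs
off <- c("Club A" = 0.45, "Club B" = 0.40, "Club C" = -0.10)
def <- c("Club A" = -0.40, "Club B" = 0.10, "Club C" = -0.35)

# Add the reference club with 0 for both, as done for Ado Den Haag above
strength <- rbind(cbind(coO = off, coD = def),
                  matrix(c(0, 0), nrow = 1,
                         dimnames = list("Reference", c("coO", "coD"))))

# Centre both columns (scale = FALSE: subtract the mean, no rescaling)
strength <- as.data.frame(scale(strength, scale = FALSE))

# Against an average opponent: log(scored) - log(conceded) = coO - coD
strength$overall <- strength$coO - strength$coD
strength[order(-strength$overall), ]
```

Centring shifts every club's overall value by the same constant, so the ranking is unchanged. In this made-up example Club B (strong attack, weak defence) and Club C (weak attack, strong defence) end up close in overall strength, the same pattern as SC Heerenveen and Vitesse.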
Stated otherwise: if SC Heerenveen played against itself, ignoring home advantage, it would probably score two or even three goals.
fbpredict(model2, 'SC Heerenveen', 'SC Heerenveen')[[1]]
SC Heerenveen in rows against SC Heerenveen in columns
0 1 2 3 4 5 6 7 8 9
0 0.0060 0.0153 0.0196 0.0167 0.0107 0.0055 0.0023 0.0009 0.0003 0.0001
1 0.0153 0.0391 0.0501 0.0428 0.0274 0.0140 0.0060 0.0022 0.0007 0.0002
2 0.0196 0.0501 0.0641 0.0548 0.0351 0.0180 0.0077 0.0028 0.0009 0.0003
3 0.0167 0.0428 0.0548 0.0467 0.0299 0.0153 0.0065 0.0024 0.0008 0.0002
4 0.0107 0.0274 0.0351 0.0299 0.0192 0.0098 0.0042 0.0015 0.0005 0.0001
5 0.0055 0.0140 0.0180 0.0153 0.0098 0.0050 0.0021 0.0008 0.0003 0.0001
6 0.0023 0.0060 0.0077 0.0065 0.0042 0.0021 0.0009 0.0003 0.0001 0
7 0.0009 0.0022 0.0028 0.0024 0.0015 0.0008 0.0003 0.0001 0 0
8 0.0003 0.0007 0.0009 0.0008 0.0005 0.0003 0.0001 0 0 0
9 0.0001 0.0002 0.0003 0.0002 0.0001 0.0001 0 0 0 0
If Vitesse played against itself, it would probably score zero or one goal.
fbpredict(model2, 'Vitesse', 'Vitesse')[[1]]
Vitesse in rows against Vitesse in columns
0 1 2 3 4 5 6 7 8 9
0 0.1165 0.1252 0.0673 0.0241 0.0065 0.0014 0.0002 0 0 0
1 0.1252 0.1346 0.0724 0.0259 0.0070 0.0015 0.0003 0 0 0
2 0.0673 0.0724 0.0389 0.0139 0.0037 0.0008 0.0001 0 0 0
3 0.0241 0.0259 0.0139 0.0050 0.0013 0.0003 0.0001 0 0 0
4 0.0065 0.0070 0.0037 0.0013 0.0004 0.0001 0 0 0 0
5 0.0014 0.0015 0.0008 0.0003 0.0001 0 0 0 0 0
6 0.0002 0.0003 0.0001 0.0001 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
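fbpredict comes from the earlier post and its internals are not shown here. A common way to obtain the expected goals behind such a table is predict() on the fitted Poisson glm with type = "response". The sketch below does this on simulated data (six hypothetical clubs, made-up effect sizes) with the same formula as model3, so that it runs without StartData; it illustrates the technique and is not a reconstruction of fbpredict.

```r
set.seed(2012)

# Hypothetical league: six made-up clubs with made-up true effects (log scale)
clubs   <- c("A", "B", "C", "D", "E", "F")
attack  <- c(A = 0.3, B = 0.1, C = 0.0, D = 0.0, E = -0.1, F = -0.3)
defence <- c(A = -0.3, B = 0.0, C = 0.1, D = 0.0, E = 0.1, F = 0.1)
home_effect <- 0.35

# Every ordered pair of different clubs meets 20 times, first club at home
fixtures <- expand.grid(Home = clubs, Away = clubs, rep = 1:20,
                        stringsAsFactors = FALSE)
fixtures <- fixtures[fixtures$Home != fixtures$Away, ]
n <- nrow(fixtures)

# Two rows per match: home rows first, then away rows
sim <- data.frame(
  OffenseClub = factor(c(fixtures$Home, fixtures$Away), levels = clubs),
  DefenseClub = factor(c(fixtures$Away, fixtures$Home), levels = clubs),
  OffThuis    = rep(c(1, 0), each = n)
)
mu <- exp(0.1 + attack[as.character(sim$OffenseClub)] +
          defence[as.character(sim$DefenseClub)] + home_effect * sim$OffThuis)
sim$Goals <- rpois(nrow(sim), mu)

# Same structure as model3
fit <- glm(Goals ~ OffenseClub + DefenseClub + OffThuis,
           data = sim, family = poisson)

# Expected goals for club A attacking club F, once at home and once away
new <- data.frame(OffenseClub = factor("A", levels = clubs),
                  DefenseClub = factor("F", levels = clubs),
                  OffThuis    = c(1, 0))
lambda <- predict(fit, newdata = new, type = "response")
lambda
# The ratio of the two equals exp() of the fitted OffThuis coefficient
lambda[1] / lambda[2]
exp(coef(fit)["OffThuis"])
```

Assuming the two clubs' goals are independent, feeding each side's expected goals into dpois() gives a score table of the same shape as the ones above.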
Model extensions
The residual deviance of model3 is 668.96 on 576 degrees of freedom. Because the deviance is somewhat larger than the degrees of freedom, there might be more effects to be found in the data.
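A rough way to quantify this remark is to compare the residual deviance with its degrees of freedom. For a fitted model the same numbers come from deviance(model3) and df.residual(model3); here they are copied from the text. The chi-squared goodness-of-fit approximation is only rough when many counts are small, as goal counts are, so the p-value is indicative at best.

```r
# Residual deviance and degrees of freedom of model3, copied from the text
res_dev <- 668.96
res_df  <- 576

# Ratio above 1 hints at extra variation (overdispersion or missing effects)
res_dev / res_df
# Approximate goodness-of-fit p-value
pchisq(res_dev, df = res_df, lower.tail = FALSE)
```

A ratio of about 1.16 together with a small p-value points to either some overdispersion or effects not yet in the model, which is what the following extensions look for.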
Twelfth man and teams
The first extension is that the home advantage may differ between teams, either for the attacking club (model4a), for the defending club (model4b) or for both (model5). Based on these data, this does not seem to be statistically significant.
# home advantage differs per attacking club
model4a <- glm(Goals ~ OffenseClub * OffThuis + DefenseClub,
               data = StartData, family = 'poisson')
# home advantage differs per defending club
model4b <- glm(Goals ~ OffenseClub + DefenseClub * OffThuis,
               data = StartData, family = 'poisson')
# both
model5 <- glm(Goals ~ (OffenseClub + DefenseClub) * OffThuis,
              data = StartData, family = 'poisson')
anova(model3, model4a, model5, test = 'Chisq')
Analysis of Deviance Table
Model 1: Goals ~ OffenseClub + DefenseClub + OffThuis
Model 2: Goals ~ OffenseClub * OffThuis + DefenseClub
Model 3: Goals ~ (OffenseClub + DefenseClub) * OffThuis
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 576 668.96
2 559 649.00 17 19.953 0.2766
3 542 626.77 17 22.236 0.1758
anova(model3, model4b, model5, test = 'Chisq')
Analysis of Deviance Table
Model 1: Goals ~ OffenseClub + DefenseClub + OffThuis
Model 2: Goals ~ OffenseClub + DefenseClub * OffThuis
Model 3: Goals ~ (OffenseClub + DefenseClub) * OffThuis
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 576 668.96
2 559 647.46 17 21.499 0.2048
3 542 626.77 17 20.690 0.2404
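The Df column shows that each interaction costs 17 degrees of freedom. A small sketch with model.matrix() shows where that number comes from: with 18 clubs there are 17 free club parameters once the reference club is fixed (the same 17 as in the first table), and an interaction with the 0/1 home indicator adds one extra home effect per non-reference club. The club labels are hypothetical.

```r
# 18 hypothetical club labels standing in for the 18 Eredivisie clubs
club <- factor(sprintf("club%02d", 1:18))
d <- expand.grid(OffenseClub = club, OffThuis = c(0, 1))

# Design matrices without and with the club-by-home interaction
X_main  <- model.matrix(~ OffenseClub + OffThuis, data = d)
X_inter <- model.matrix(~ OffenseClub * OffThuis, data = d)

# Intercept + 17 club effects + 1 home effect = 19 columns;
# the interaction adds one home effect per non-reference club, 17 more
c(main = ncol(X_main), interaction = ncol(X_inter),
  extra = ncol(X_inter) - ncol(X_main))
```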
Before and after the winter break
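During the winter break clubs can change players, so teams might change in quality over this period. Because the season runs from 2011 into 2012 and the break falls at the turn of the year, the calendar year of the match date separates the two halves of the season. In these data, this effect does not seem to be statistically significant.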
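# one year per match from old$Datum, repeated for the two rows of each match
StartData$year <- factor(c(substr(old$Datum, 1, 4), substr(old$Datum, 1, 4)))
# overall shift in scoring between the two halves of the season
model6 <- glm(Goals ~ OffenseClub + DefenseClub + year + OffThuis,
              data = StartData, family = 'poisson')
# club strengths allowed to change after the winter break
model7 <- glm(Goals ~ (OffenseClub + DefenseClub) * year + OffThuis,
              data = StartData, family = 'poisson')
anova(model3, model6, model7, test = 'Chisq')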
Analysis of Deviance Table
Model 1: Goals ~ OffenseClub + DefenseClub + OffThuis
Model 2: Goals ~ OffenseClub + DefenseClub + year + OffThuis
Model 3: Goals ~ (OffenseClub + DefenseClub) * year + OffThuis
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 576 668.96
2 575 668.82 1 0.135 0.7129
3 541 625.48 34 43.345 0.1308