Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Following the Irish rugby team’s humiliating 60-0 defeat to New Zealand, an interesting question was posed on Twitter: what does a 60-0 result convert to in football/soccer?
Intrigued, I decided to gather some data from both the English premier league (this season, more data collected and future blog posts to come!) and the equivalent English league in rugby union (this season too). My solution to this question involved the following steps. Firstly, I assumed that the scoring process in both games follow parametric probability distributions. I then fitted these data to these distributions, and calculated both the distribution and quantile functions. This allowed me to see the probability of a team scoring 60 in rugby, and then convert that probability into football goals.
The scores in both games will take the form of some kind of count distribution. However, Rugby is a much higher scoring game, and it is unlikely that both of the score count processes are being generated from the same parametric distribution. To fit scores from both games to their respective distributions, I have chosen to use the gamlss package on CRAN. The advantage of the gamlss package is that it has the capability to fit a huge range of distributions.
The code below shows how I loaded these data and fit the scores for both football and rugby to a number of count related distributions. My final choice of distribution was based on a comparison of AIC values for each model. Based on these, football and rugby scores follow the Poisson-inverse Gaussian, and zero-adjusted and zero-inflated negative binomial distributions respectively. The pZANBI and qPIG functions calculate the location of a rugby score on the football score distribution.
To answer the question: a 60-0 score in rugby translates into a 7-0 score in football. Oh dear.
#### score analysis rm(list=ls()) p1 <- read.csv("premgames.csv") sc <- c(p1$hgoal,p1$agoal) # sc is premier league goals library(gamlss.dist) library(gamlss) # fit dists m1a <- gamlss(sc ~ 1, family = PO) m2a <- gamlss(sc ~ 1, family = NBI) m3a <- gamlss(sc ~ 1, family = NBII) m4a <- gamlss(sc ~ 1, family = PIG) m5a <- gamlss(sc ~ 1, family = ZANBI) m6a <- gamlss(sc ~ 1, family = ZIPIG) m7a <- gamlss(sc ~ 1, family = SI) # compare dists AIC(m1a,m2a,m3a,m4a,m5a,m6a,m7a) # m4a is the best #load rugby data p2 <- as.character(unlist(read.csv("rugscore.csv"))) nms <- names(table(p2))[2:47] p3 <- p2[p2 %in% nms] p4 <- as.numeric(as.character(p3)) #fit m1b <- gamlss(p4 ~ 1, family = PO) m2b <- gamlss(p4 ~ 1, family = NBI) m3b <- gamlss(p4 ~ 1, family = NBII) m4b <- gamlss(p4 ~ 1, family = PIG) m5b <- gamlss(p4 ~ 1, family = ZANBI) m6b <- gamlss(p4 ~ 1, family = ZIPIG) m7b <- gamlss(p4 ~ 1, family = SI) #compare AIC(m1b,m2b,m3b,m4b,m5b,m6b,m7b) #m5b is best # p of 60 in rugby s1 <- pZANBI(60, mu = exp(m5b$mu.coefficients), sigma = exp(m5b$sigma.coefficients), nu = exp(m5b$nu.coefficients)) # convert p in rugby to soccer distribution qPIG(s1, mu = exp(m4a$mu.coefficients), sigma = exp(m4a$sigma.coefficients)) # the same again for zero s2 <- pZANBI(0, mu = exp(m5b$mu.coefficients), sigma = exp(m5b$sigma.coefficients), nu = exp(m5b$nu.coefficients)) qPIG(s2, mu = exp(m4a$mu.coefficients), sigma = exp(m4a$sigma.coefficients)) ############################################################# ########## output > rm(list=ls()) > p1 <- read.csv("premgames.csv") > sc <- c(p1$hgoal,p1$agoal) > # sc is premier league goals > > library(gamlss.dist) > library(gamlss) > > # fit dists > m1a <- gamlss(sc ~ 1, family = PO) > m2a <- gamlss(sc ~ 1, family = NBI) > m3a <- gamlss(sc ~ 1, family = NBII) > m4a <- gamlss(sc ~ 1, family = PIG) > m5a <- gamlss(sc ~ 1, family = ZANBI) > m6a <- gamlss(sc ~ 1, family = ZIPIG) > m7a <- gamlss(sc ~ 1, family = SI) > > # compare dists > AIC(m1a,m2a,m3a,m4a,m5a,m6a,m7a) df AIC m4a 2 2334.244 m2a 2 2334.412 m3a 2 2334.412 m6a 3 2336.244 m7a 3 2336.244 m5a 3 2336.328 m1a 1 2341.862 > # m4a is the best > > #load rugby data > p2 <- as.character(unlist(read.csv("rugscore.csv"))) > nms <- names(table(p2))[2:47] > p3 <- p2[p2 %in% nms] > p4 <- as.numeric(as.character(p3)) > > #fit > m1b <- gamlss(p4 ~ 1, family = PO) > m2b <- gamlss(p4 ~ 1, family = NBI) > m3b <- gamlss(p4 ~ 1, family = NBII) > m4b <- gamlss(p4 ~ 1, family = PIG) > m5b <- gamlss(p4 ~ 1, family = ZANBI) > m6b <- gamlss(p4 ~ 1, family = ZIPIG) > m7b <- gamlss(p4 ~ 1, family = SI) > > #compare > AIC(m1b,m2b,m3b,m4b,m5b,m6b,m7b) df AIC m5b 3 1721.183 m2b 2 1722.700 m3b 2 1722.700 m6b 3 1727.345 m4b 2 1732.172 m7b 3 1749.975 m1b 1 2265.146 > #m5b is best > > # p of 60 in rugby > s1 <- pZANBI(60, mu = exp(m5b$mu.coefficients), sigma = exp(m5b$sigma.coefficients), + nu = exp(m5b$nu.coefficients)) > # convert p in rugby to soccer distribution > qPIG(s1, mu = exp(m4a$mu.coefficients), sigma = exp(m4a$sigma.coefficients)) [1] 7 > > # the same again for zero > s2 <- pZANBI(0, mu = exp(m5b$mu.coefficients), sigma = exp(m5b$sigma.coefficients), + nu = exp(m5b$nu.coefficients)) > qPIG(s2, mu = exp(m4a$mu.coefficients), sigma = exp(m4a$sigma.coefficients)) [1] 0
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.