[This article was first published on R – Gradient Metrics, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Launch Consulting, with an assist from Gradient Metrics, developed a statistical model (have a peek under the hood in this white paper) to assess the strength of every national soccer team that has ever played a match. Using a comprehensive dataset of all ~40,000 international matches dating back to 1872, data scientists Eric Thompson and Tom Vladeck constructed a multilevel regression model to evaluate team strength. This could be a powerful tool for predicting World Cup matches, and we will try to do just that!
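For readers new to multilevel models, here is a rough sketch of partial pooling, the idea at their core: each team's estimate is pulled toward the overall average, and teams with few matches are pulled hardest. This is not the model from the white paper. The function name `partial_pool`, the two variance inputs, and the toy goal differences are all assumptions invented for illustration.

```python
from statistics import mean


def partial_pool(group_values, within_var, between_var):
    """Shrink each group's raw mean toward the grand mean.

    Uses the textbook normal-normal shrinkage weight
        w = between_var / (between_var + within_var / n)
    so groups with more observations (larger n) keep more of their raw mean.
    Both variances are supplied by the caller here; a real multilevel
    model would estimate them from the data.
    """
    # Grand mean over every observation (so it is weighted by match count).
    all_values = [v for values in group_values.values() for v in values]
    grand_mean = mean(all_values)

    pooled = {}
    for name, values in group_values.items():
        n = len(values)
        weight = between_var / (between_var + within_var / n)
        pooled[name] = weight * mean(values) + (1 - weight) * grand_mean
    return pooled


# Toy goal differences per match (made-up numbers, not real results).
toy = {
    "Team A": [2, 1, 3, 2, 2, 1, 3, 2],  # long, consistent record
    "Team B": [3],                       # a single big win
    "Team C": [-1, 0, -2, -1],
}

estimates = partial_pool(toy, within_var=2.0, between_var=1.0)
for team, est in estimates.items():
    print(f"{team}: raw={mean(toy[team]):+.2f}  pooled={est:+.2f}")
```

In this toy data, Team B's lone three-goal win gives it the best raw average, but after pooling it ranks just below Team A, whose eight matches give the estimate far more weight. That is the behavior you want when some national teams have played hundreds of matches and others only a handful.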
Predictions
Prior to the tournament, our model predicted 13 of the final 16 teams, missing only entrants Switzerland and Mexico, as well as Cinderella story Japan. In what many consider a wacky and wild World Cup, our model performed quite well. Have a look at the table below to see which teams are rated highest heading into the final matches of the 2018 World Cup.
Interesting findings from the model
See “Section 6.2 – World Cup 2018” in our white paper to easily evaluate the final teams’ strength. Pre-tournament Vegas favorite Germany stumbled early in its opening 0-1 loss to Mexico and was ultimately eliminated by the South Koreans. While Germany was a favorite of many forecasters, our model found them to be only the #3 team since 2016, behind Brazil and Spain.
Netherlands and Italy must be kicking themselves (pun intended) because, although they didn’t even qualify for the 2018 World Cup, they grade out near the top of our rankings, as the #6 and #7 strongest national teams since 2016 according to our model.
What else? Let’s talk about home-field advantage for a moment. Bolivia isn’t known for its football prowess, having been outscored by a total of 834-450 in the team’s history. But Bolivia boasts our model’s #1 home-field advantage, at almost double the #2 team. Our model estimates a baseline increase of +0.47 goals scored/game for a typical team’s home-field advantage, but on top of that, Bolivia receives an additional “stadium effect” of +0.42 goals/game. This is by far the largest in soccer, at almost 4.5 standard deviations above the mean. This unique advantage may be tied to the high elevation at which the team plays its home matches. In 2007 FIFA temporarily banned World Cup qualifying games from being played in Bolivia, Ecuador and Colombia due to high elevations affecting players’ health. Perhaps not surprisingly, other high-elevation teams are scattered throughout our ranking of top stadium effects, with Mexico and Ecuador ranked at #4 and #5 respectively.
Conclusion
In addition to having some fun, we hope this project opens your eyes to the possibilities of predictive modeling in your own business. Do your customers all purchase with homogeneous frequencies? Of course not. They may therefore need to be modeled as “nested” groups, using a hierarchical approach like the one we’ve used here. Give us a shout and we’ll chat! Whoever your team is, good luck / buena suerte / boa sorte / bonne chance / lycka till / sretno / held og lykke / がんばろう this weekend.
To leave a comment for the author, please follow the link and comment on their blog: R – Gradient Metrics.