
Winners at the World Cup – World Soccer Analytics

[This article was first published on worldsocceranalytics.com, and kindly contributed to R-bloggers.]

Winners at the World Cup

The 2018 FIFA World Cup has been over for a month now, but the memories are far from fading. It was a fabulous tournament, combining underdog stories, upsets, and some pretty high-quality football.

The World Cup has been around for 88 years now, featuring 79 national teams and over 900 matches played. Kaggle’s International Football Results dataset provides a list of all international matches, including every World Cup contest. I decided to count the number of wins for each team and visualize it on an animated bar graph. To make it more readable, I only kept countries that had a cumulative win count of 20+. That left us with nine nations and the following plot:

[Animated bar graph: cumulative World Cup wins for the nine nations]

And here is the R code:

Pre-Processing: Load packages, Load data, Filter to only WC matches, Create “winner” variable, Create “year” variable

library(dplyr)
library(tidyr)      # provides complete(), used further down
library(ggplot2)
library(readr)
library(lubridate)
library(animation)
library(ggthemes)

# Load every international result from the Kaggle file
df <- read_csv("results.csv")

# Keep only World Cup matches
world_cup <- df %>% filter(tournament == "FIFA World Cup")

# Label each match with the winning team, or "Draw" if the scores are level
world_cup <- world_cup %>%
  mutate(winner = ifelse(home_score > away_score,
                         home_team,
                         ifelse(away_score > home_score,
                                away_team,
                                "Draw")))

# Extract the year from the match date
world_cup$year <- world_cup$date %>% year()
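
The winner labelling above can be tried out on a small made-up table before running it on the full dataset. The sketch below is not the author's code: it swaps the nested ifelse() for dplyr's case_when(), and the table toy_matches and its scores are invented purely for illustration.

```r
library(dplyr)

# Toy matches with made-up scores (illustration only, not real results)
toy_matches <- tibble(
  home_team  = c("A", "B", "C"),
  away_team  = c("B", "C", "A"),
  home_score = c(2, 1, 0),
  away_score = c(1, 1, 3)
)

# Same three-way rule as the nested ifelse(): home win, away win, otherwise draw
toy_matches <- toy_matches %>%
  mutate(winner = case_when(
    home_score > away_score ~ home_team,
    away_score > home_score ~ away_team,
    TRUE                    ~ "Draw"
  ))

print(toy_matches$winner)
```

case_when() reads top to bottom, so the final TRUE branch only catches matches where neither side scored more than the other.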

Filter to only teams with 20 or more wins, Complete rows so that every winner has a row for each year, and Create cumulative-wins variable

top_teams <- world_cup %>% filter(winner %in% c("Brazil",
                                                "Germany",
                                                "Italy",
                                                "Argentina",
                                                "France",
                                                "Spain",
                                                "England",
                                                "Netherlands",
                                                "Uruguay"))

#complete rows
top_teams <- top_teams %>%
  group_by(year, winner) %>%
  count() %>%
  ungroup() %>%
  complete(year, winner, fill = list(n = 0))

#create cumulative sum variable, grouped by winner
top_teams <- top_teams %>%
  group_by(winner) %>%
  mutate(cs = cumsum(n))

Create gif from GGPLOT

i <- 1930
saveGIF({
  # Draw one frame per tournament year
  for (i in c(1930, 1934, 1938, 1950, 1954, 1958, 1962, 1966, 1970, 1974, 1978,
              1982, 1986, 1990, 1994, 1998, 2002, 2006, 2010, 2014, 2018)) {

    year_games <- as.character(i)

    # Cumulative wins up to and including this tournament
    year_data <- top_teams %>% filter(year == i)

    gg <- year_data %>% ggplot(aes(x = winner,
                                   y = cs,
                                   group = winner,
                                   fill = winner)) +
      # Fix the bar order and the y-range so the frames line up
      xlim(c("Brazil",
             "Germany",
             "Italy",
             "Argentina",
             "France",
             "Spain",
             "England",
             "Netherlands",
             "Uruguay")) +
      ylim(0, 75) +
      geom_bar(stat = "identity") +
      ggtitle(paste0("Number of Victories at the FIFA World Cup (1930 – ",
                     year_games, ")")) +
      scale_colour_brewer(palette = "Set1") +
      labs(x = "", y = "Cumulative Wins") +
      theme_dark() +
      guides(fill = "none") +
      theme(panel.grid.major = element_blank(),
            panel.grid.minor = element_blank(),
            plot.title = element_text(hjust = 0.5, face = "bold", size = 18),
            axis.text.x = element_text(face = "bold", size = 14),
            axis.title.y = element_text(face = "bold", size = 14),
            axis.text.y = element_text(face = "bold", size = 14)) +
      scale_fill_manual(values = c("Brazil" = "yellow1",
                                   "Germany" = "gray8",
                                   "Italy" = "#007FFF",
                                   "Argentina" = "lightblue",
                                   "France" = "darkblue",
                                   "Spain" = "darkred",
                                   "England" = "white",
                                   "Netherlands" = "darkorange",
                                   "Uruguay" = "dodgerblue2"))

    print(gg)

  }
}, movie.name = "world_cup_histogram.gif", interval = 0.8,
ani.width = 1500, ani.height = 900)

Code can be found on Github. A big thanks to David Smith at Revolution Analytics, whose blog post helped me a lot in creating the visual.
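
The post lists its nine nations by hand. As a rough sketch of how the 20+ cut-off, the row completion, and the running total could be derived from the data instead, here is a toy example. It is not the author's code: the table toy_wins, its values, and the threshold name min_wins are all made up for illustration, with the threshold set to 2 so that the tiny table has something to filter.

```r
library(dplyr)
library(tidyr)

# Toy win records (made-up), one row per match won
toy_wins <- tibble(
  year   = c(1930, 1930, 1934, 1938, 1938, 1938, 1934),
  winner = c("A", "B", "A", "A", "A", "B", "C")
)

min_wins <- 2  # hypothetical threshold; the post uses 20

# Teams whose total number of wins reaches the threshold
keep <- toy_wins %>%
  count(winner, name = "total") %>%
  filter(total >= min_wins) %>%
  pull(winner)

toy_cum <- toy_wins %>%
  filter(winner %in% keep) %>%
  count(year, winner) %>%                          # wins per team per year
  complete(year, winner, fill = list(n = 0)) %>%   # add zero rows for winless years
  arrange(winner, year) %>%                        # make the running total chronological
  group_by(winner) %>%
  mutate(cs = cumsum(n)) %>%
  ungroup()

print(toy_cum)
```

The zero-filled rows matter for the animation: without them, a team with no wins in a given tournament would have no row for that year and its bar would vanish from that frame instead of holding its previous height.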

Tags: Football, Soccer, World Cup