While some English top-flight fixtures are evenly contested, others have historically been far more one-sided. Using the engsoccerdata package in R, I looked at English first division results (the First Division and, from 1992, the Premier League) from 1888 to 2017 to answer one question: which fixtures in the English first division have been the most uneven?
Looking only at fixtures in which the two teams have met at least 30 times, the most one-sided English first division fixture is Manchester United – Luton Town, with Manchester United winning over 73% of the 30 meetings. Only one other fixture has a win rate above 70%: Liverpool – QPR, with Liverpool victorious nearly 72% of the time over 46 matches.
An interactive graph can be found at this link; the static version is shown below:
And below is the code for the analysis (also on GitHub). There’s certainly a lot more analysis to be done on this dataset, so feel free to use my code to draw other interesting insights or build other visualizations.
Step 1: Initial Pre-Processing
library(dplyr)
library(ggplot2)
library(engsoccerdata)
library(ggiraph)

df <- engsoccerdata::england

# only matches in tier 1 (English First Division and subsequently EPL)
df <- df %>%
  filter(tier == 1)

# winner and loser of each game ("Draw" when the scores are level)
df <- df %>%
  mutate(
    winner = case_when(
      hgoal > vgoal ~ home,
      hgoal < vgoal ~ visitor,
      TRUE ~ "Draw"
    ),
    loser = case_when(
      hgoal < vgoal ~ home,
      hgoal > vgoal ~ visitor,
      TRUE ~ "Draw"
    )
  )

# teams involved: sort the two names so home/away order does not matter
df <- df %>%
  rowwise() %>%
  mutate(teams_involved = paste(sort(c(home, visitor)), collapse = " - ")) %>%
  ungroup()

# total number of games played in each fixture
df <- df %>%
  group_by(teams_involved) %>%
  mutate(total_games_played = n())
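To make the fixture key from Step 1 concrete, here is a minimal sketch on a made-up three-row data frame (the `toy` data and the names `key_rowwise` and `key_vectorised` are illustrative only, not part of the original analysis). It shows that sorting the two team names gives the same key regardless of who played at home, and that base R's `pmin()`/`pmax()` should give the same result without going row by row.

```r
# Toy data (hypothetical, not taken from engsoccerdata)
toy <- data.frame(
  home    = c("Liverpool", "Queens Park Rangers", "Arsenal"),
  visitor = c("Queens Park Rangers", "Liverpool", "Chelsea"),
  stringsAsFactors = FALSE
)

# Row-by-row key: same idea as the rowwise() + sort() + paste() step above
key_rowwise <- mapply(
  function(h, v) paste(sort(c(h, v)), collapse = " - "),
  toy$home, toy$visitor,
  USE.NAMES = FALSE
)

# Vectorised alternative: pmin()/pmax() pick the alphabetically
# first/last name for each row in a single pass
key_vectorised <- paste(
  pmin(toy$home, toy$visitor),
  pmax(toy$home, toy$visitor),
  sep = " - "
)

print(key_rowwise)
print(identical(key_rowwise, key_vectorised))
```

On the full dataset the vectorised form could replace the `rowwise()` step inside a plain `mutate()`, which is typically faster, though the original approach gives the same keys.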
Step 2: Count number of wins per fixture and find the top 20 most one-sided fixtures
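# wins (and draws) per team within each fixture, as a share of games played
win_count <- df %>%
  count(winner, teams_involved, total_games_played) %>%
  mutate(win_perc = n / total_games_played) %>%
  ungroup()

# keep fixtures played at least 30 times; drop the "Draw" rows so that
# a draw rate can never be ranked as if it were a win rate
more_common_fixtures <- win_count %>%
  filter(total_games_played >= 30, winner != "Draw")

# top 20 by win percentage (ties at the cut-off are all kept)
one_sided_fixtures <- more_common_fixtures %>%
  ungroup() %>%
  top_n(20, wt = win_perc)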
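The win-rate calculation in Step 2 can be checked by hand on a tiny example. The sketch below uses invented results for two made-up fixtures ("A - B" and "C - D"); the data and the names `win_rates` and `top_side` are assumptions for illustration. It also uses `slice_max()`, which recent dplyr versions recommend over `top_n()`, to pick the dominant side of each fixture.

```r
library(dplyr)

# Hypothetical toy results: one row per match, winner already labelled
toy <- tibble(
  teams_involved = c(rep("A - B", 5), rep("C - D", 4)),
  winner = c("A", "A", "A", "Draw", "B",
             "C", "C", "Draw", "D")
)

# Share of each outcome within a fixture
win_rates <- toy %>%
  add_count(teams_involved, name = "total_games_played") %>%
  count(teams_involved, winner, total_games_played) %>%
  mutate(win_perc = n / total_games_played)

# Dominant side per fixture, ignoring draws
top_side <- win_rates %>%
  filter(winner != "Draw") %>%
  group_by(teams_involved) %>%
  slice_max(win_perc, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_side)
```

Here team A wins 3 of 5 games (0.6) and team C wins 2 of 4 (0.5), so those are the rows that survive. The 30-game threshold in the real analysis plays the role of a minimum sample size, so that a fixture played only a handful of times cannot top the ranking.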
Step 3: Graph using ggplot2 and ggiraph::geom_bar_interactive()
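top_graph <- one_sided_fixtures %>%
  ggplot(aes(
    x = reorder(teams_involved, win_perc),
    y = win_perc,
    # tooltip shown on hover: winning team and its record in the fixture
    tooltip = paste0(winner, " (", n, " of ", total_games_played, ")")
  )) +
  geom_bar_interactive(stat = "identity", fill = "darkblue") +
  coord_flip() +
  ylim(0, 1) +
  labs(
    title = "The Top 20 Most One-Sided Matches in English Premier League History",
    y = "Win Percentage",
    x = NULL,
    caption = "Data from engsoccerdata R package"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

# The original post rendered the widget with
#   ggiraph(code = print(top_graph), width = 0.8)
# ggiraph() has been deprecated in newer ggiraph releases in favour of girafe()
girafe(ggobj = top_graph, options = list(opts_sizing(width = 0.8)))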