One-Sided Matches in the English Premier League

[This article was first published on World Soccer Analytics, and kindly contributed to R-bloggers].

While some English top-flight fixtures are closely contested, others have historically been far more one-sided. Using the engsoccerdata package in R, I looked at results data from 1888 to 2017 (the First Division and, from 1992, the Premier League) and set out to answer the question: which fixtures in the English first division have been the most uneven?

Looking only at fixtures that the two teams have played at least 30 times, the most one-sided English first division fixture ever is Manchester United – Luton Town, with Manchester United winning over 73% of the 30 games. Only one other fixture has a win rate greater than 70%: Liverpool – QPR, with Liverpool victorious nearly 72% of the time over 46 games.

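An interactive graph can be found at this link, and the static version is below:

[Figure: The Top 20 Most One-Sided Matches in English Premier League History (horizontal bar chart of win percentage by fixture)]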

Below is the code for the analysis (also on GitHub). There is certainly a lot more to be done with this dataset, so feel free to use my code to produce other interesting insights and visualizations.

Step 1: Initial Pre-Processing

library(dplyr)
library(ggplot2)
library(engsoccerdata)
library(ggiraph)

df <- engsoccerdata::england
#only matches in tier 1 (English First Division and subsequently EPL)
df <- df %>% filter(tier == 1)

#winner of game
df <- df %>% mutate(winner = case_when(
  hgoal > vgoal ~ home,
  hgoal < vgoal ~ visitor,
  TRUE ~ "Draw"
),
loser = case_when(
   hgoal < vgoal ~ home,
   hgoal > vgoal ~ visitor,
   TRUE ~ "Draw"
))
                  
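#teams involved (names sorted so that home/away order does not matter)
df <- df %>% 
  rowwise() %>% 
  mutate(teams_involved = paste(sort(c(home,visitor)),collapse=" - ")) %>% 
  ungroup()

#total number of games played in each fixture
df <- df %>% 
  group_by(teams_involved) %>% 
  mutate(total_games_played = n())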
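
As an aside, the fixture key works because sorting the two team names makes "home v away" and "away v home" collapse to the same label. Here is a minimal base-R sketch of that idea on a made-up table. The team names and scores are invented, and pmin()/pmax() are my own vectorised substitute for the row-by-row sort, not what the analysis itself uses.

```r
# Toy results; team names and scores are invented for illustration only.
toy <- data.frame(
  home    = c("Alpha", "Beta",  "Alpha", "Beta"),
  visitor = c("Beta",  "Alpha", "Gamma", "Alpha"),
  hgoal   = c(2, 0, 1, 1),
  vgoal   = c(1, 3, 1, 0),
  stringsAsFactors = FALSE
)

# Order-independent fixture key: the alphabetically smaller name always
# comes first, so Alpha-at-home and Beta-at-home games share one label.
toy$teams_involved <- paste(pmin(toy$home, toy$visitor),
                            pmax(toy$home, toy$visitor),
                            sep = " - ")

# Number of games played in each fixture, attached to every row.
games_per_fixture <- table(toy$teams_involved)
toy$total_games_played <- as.integer(games_per_fixture[toy$teams_involved])

print(toy[, c("home", "visitor", "teams_involved", "total_games_played")])
```

On a table the size of the full dataset, a vectorised key like this may run noticeably faster than a rowwise sort, though I have not benchmarked it here.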

Step 2: Count number of wins per fixture and find top 20 most one-sided fixtures

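win_count <- df %>% 
  #draws are labelled "Draw" in winner, so drop them before counting wins
  filter(winner != "Draw") %>% 
  count(winner,
        teams_involved,
        total_games_played) %>% 
  mutate(win_perc = n/total_games_played) %>% 
  ungroup()

#only fixtures played at least 30 times
more_common_fixtures <- win_count %>% 
  filter(total_games_played>=30)

#top 20 by win percentage (top_n keeps ties)
one_sided_fixtures <- more_common_fixtures %>% 
  top_n(20,
        wt = win_perc)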
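
To make the win-percentage step concrete, here is a small base-R sketch on invented data. It mirrors the count-then-divide logic of Step 2, with one assumption of mine: rows whose winner is "Draw" are dropped before the percentages are computed, so a draw rate is never treated as a team's win rate. The fixtures and results are hypothetical.

```r
# Toy per-game table; fixtures and winners are invented for illustration.
toy <- data.frame(
  teams_involved = c("Alpha - Beta", "Alpha - Beta", "Alpha - Beta",
                     "Alpha - Beta", "Alpha - Gamma", "Alpha - Gamma"),
  winner = c("Alpha", "Alpha", "Draw", "Beta", "Gamma", "Draw"),
  stringsAsFactors = FALSE
)

# Total games per fixture, draws included.
totals <- table(toy$teams_involved)

# Wins per (fixture, winner) pair as a long data frame.
wins <- as.data.frame(
  table(teams_involved = toy$teams_involved, winner = toy$winner),
  stringsAsFactors = FALSE
)

# Keep real wins only: drop empty combinations and the "Draw" pseudo-team.
wins <- wins[wins$Freq > 0 & wins$winner != "Draw", ]

# Win percentage = wins / all games played in that fixture.
wins$total_games_played <- as.integer(totals[wins$teams_involved])
wins$win_perc <- wins$Freq / wins$total_games_played

print(wins)
```

Because the denominator still counts drawn games, a team's win percentage here is its share of all games in the fixture, not just of the games that had a winner.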

Step 3: Graph using ggplot2 and ggiraph::geom_bar_interactive()

top_graph <- one_sided_fixtures %>% 
  ggplot(aes(x = reorder(teams_involved,win_perc),
             y = win_perc,
             tooltip = paste0(winner," (",n," of ",total_games_played,")")))+
  geom_bar_interactive(stat="identity", fill = "darkblue")+
  coord_flip()+
  ylim(0,1)+
  labs(title = "The Top 20 Most One-Sided Matches in English Premier League History",
       y = "Win Percentage",
       x = NULL,
       caption = "Data from engsoccerdata R package") + 
  theme(plot.title = element_text(hjust = 0.5))

#girafe() replaces the older ggiraph() call in recent ggiraph releases
girafe(ggobj = top_graph, options = list(opts_sizing(width = 0.8)))
