Soccer Analytics for Beginners: An R Tutorial on EURO 2020 Data – Web Scraping & Radar Plots
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Finally, a day off from EURO 2020 action! A day we could just sit back and relax. I enjoyed seeing the Colombian Luis Diaz’s amazing goal against Brazil and Seleção’s controversial comeback. I stayed up to watch the Atlanta Hawks win another Game 1, this time against Giannis and the Bucks in the Eastern Conference Finals. Fun times.
We have received a few requests from sports analytics enthusiasts through our Instagram and Facebook pages for guides. Below is a written guide on (a) scraping data from fbref.com, (b) manipulating the data for analysis, (c) creating radar plots.
Disclaimers:
- I started writing this at around midnight and it took a few hours. I woke up at 6 AM to the sound of my 10-week-old daughter trying to sing what I am assuming was an Iron Maiden song from the late 80s. That means I finished writing this while tired, but fully loaded with caffeine and in a good mood. Please excuse any mistakes, typos, and the use of GIFs.
- I am not a professor. I am not a computer scientist. I do not have much experience in teaching. I am a passionate sports fan with a love for data. Yes, I do have expertise in data science. But some code might be messy. Some code can be improved. Being results-oriented, I only care that it works. So, thanks for being considerate!
- This is my first ever tutorial so please provide some feedback. Feel free to contact us!
- There are a few affiliate links throughout the post leading to some cool products I like and have bought myself.
For a visual walkthrough, check out our video here. It came out a bit longer than expected. It has some extra detail and explanations.
Let’s dig in!
Step 1: Download R and RStudio
The debate on which programming language is best for data science has been going on for a while. R and Python are the main choices. Both are awesome and it’s rather a matter of preference, as well as what kind of projects you have in mind.
That being said, having a statistical background, I have opted to use R. So, first step, if you have not done so, download the latest version of R and RStudio from the links below.
https://www.rstudio.com/products/rstudio/download/
Step 2: Install packages
R has A LOT of packages you can use. Let’s start by installing the ones we use.
Open RStudio and run the below commands one by one. When installing the package “colorspace”, type “no” and Enter if prompted.
#####
# Step 2: Install packages
#####

# For the below 2 commands, if prompted, type "no" and Enter
install.packages("colorspace")
install.packages("curl")
install.packages("BasketballAnalyzeR")
install.packages("ggplot2")
install.packages("htmltab")
install.packages("stringr")
install.packages("dplyr")
install.packages("gridExtra")
install.packages("cowplot")
Once the above packages are installed, you will not need to install them again on the same system. You only need to load them (Step 3) in each new R session.
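If you re-run the script often, you may not want to reinstall everything each time. Here is a minimal sketch of one way to check what is missing first. The helper name missing_packages is my own (hypothetical, not part of any package), and the install line is left commented out so nothing gets installed by accident.

```r
# Hypothetical helper (not from any package): returns the entries of `pkgs`
# that are not installed in the current R library.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# The packages used in this tutorial
needed <- c("colorspace", "curl", "BasketballAnalyzeR", "ggplot2", "htmltab",
            "stringr", "dplyr", "gridExtra", "cowplot")

to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

An empty result (character(0)) means everything is already in place.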
Step 3: Load libraries
Run the below commands to load the libraries we use.
#####
# Step 3: Load libraries
#####

library(curl)
library(BasketballAnalyzeR)
library(ggplot2)
library(htmltab)
library(stringr)
library(dplyr)
library(gridExtra)
library(cowplot)
Step 4: Read fbref.com URLs
All data in this tutorial is from the free resource fbref.com. It’s a great place for statistics and historical data. I really appreciate the work these folks have done. Have a look to see what’s available.
Run the below code.
#####
# Step 4: Read fbref.com URLs
#####

# Group A
url1 <- "https://fbref.com/en/matches/caa84313/Italy-Switzerland-June-16-2021-UEFA-Euro"
url2 <- "https://fbref.com/en/matches/95a9ebd1/Turkey-Italy-June-11-2021-UEFA-Euro"
url3 <- "https://fbref.com/en/matches/f09b64db/Turkey-Wales-June-16-2021-UEFA-Euro"
url4 <- "https://fbref.com/en/matches/d9eaa85c/Wales-Switzerland-June-12-2021-UEFA-Euro"
url5 <- "https://fbref.com/en/matches/b756c626/Italy-Wales-June-20-2021-UEFA-Euro"
url6 <- "https://fbref.com/en/matches/fa85a731/Switzerland-Turkey-June-20-2021-UEFA-Euro"
url_group_A <- rbind(url1, url2, url3, url4, url5, url6)

# Group B
url7 <- "https://fbref.com/en/matches/e594174b/Belgium-Russia-June-12-2021-UEFA-Euro"
url8 <- "https://fbref.com/en/matches/25bb1fa2/Denmark-Belgium-June-17-2021-UEFA-Euro"
url9 <- "https://fbref.com/en/matches/2c48acb2/Finland-Russia-June-16-2021-UEFA-Euro"
url10 <- "https://fbref.com/en/matches/c3c2ffa2/Denmark-Finland-June-12-2021-UEFA-Euro"
url11 <- "https://fbref.com/en/matches/bd35edec/Finland-Belgium-June-21-2021-UEFA-Euro"
url12 <- "https://fbref.com/en/matches/04188c5c/Russia-Denmark-June-21-2021-UEFA-Euro"
url_group_B <- rbind(url7, url8, url9, url10, url11, url12)

# Group C
url13 <- "https://fbref.com/en/matches/f3d39a29/Netherlands-Austria-June-17-2021-UEFA-Euro"
url14 <- "https://fbref.com/en/matches/b47a0ea6/Austria-North-Macedonia-June-13-2021-UEFA-Euro"
url15 <- "https://fbref.com/en/matches/e0eed6e8/Ukraine-North-Macedonia-June-17-2021-UEFA-Euro"
url16 <- "https://fbref.com/en/matches/0e9919a5/Netherlands-Ukraine-June-13-2021-UEFA-Euro"
url17 <- "https://fbref.com/en/matches/841065f5/North-Macedonia-Netherlands-June-21-2021-UEFA-Euro"
url18 <- "https://fbref.com/en/matches/7ed46abd/Ukraine-Austria-June-21-2021-UEFA-Euro"
url_group_C <- rbind(url13, url14, url15, url16, url17, url18)

# Group D
url19 <- "https://fbref.com/en/matches/6599f4ab/Scotland-Czech-Republic-June-14-2021-UEFA-Euro"
url20 <- "https://fbref.com/en/matches/1e930db9/Croatia-Czech-Republic-June-18-2021-UEFA-Euro"
url21 <- "https://fbref.com/en/matches/764c27dc/England-Croatia-June-13-2021-UEFA-Euro"
url22 <- "https://fbref.com/en/matches/027b11df/England-Scotland-June-18-2021-UEFA-Euro"
url23 <- "https://fbref.com/en/matches/20b1972b/Czech-Republic-England-June-22-2021-UEFA-Euro"
url24 <- "https://fbref.com/en/matches/0305e42c/Croatia-Scotland-June-22-2021-UEFA-Euro"
url_group_D <- rbind(url19, url20, url21, url22, url23, url24)

# Group E
url25 <- "https://fbref.com/en/matches/107fd412/Spain-Sweden-June-14-2021-UEFA-Euro"
url26 <- "https://fbref.com/en/matches/d35ad7a8/Poland-Slovakia-June-14-2021-UEFA-Euro"
url27 <- "https://fbref.com/en/matches/c6533f76/Sweden-Slovakia-June-18-2021-UEFA-Euro"
url28 <- "https://fbref.com/en/matches/14874531/Spain-Poland-June-19-2021-UEFA-Euro"
url29 <- "https://fbref.com/en/matches/ee6087f4/Sweden-Poland-June-23-2021-UEFA-Euro"
url30 <- "https://fbref.com/en/matches/7b46b857/Slovakia-Spain-June-23-2021-UEFA-Euro"
url_group_E <- rbind(url25, url26, url27, url28, url29, url30)

# Group F
url31 <- "https://fbref.com/en/matches/95d34c87/France-Germany-June-15-2021-UEFA-Euro"
url32 <- "https://fbref.com/en/matches/ba500d70/Hungary-Portugal-June-15-2021-UEFA-Euro"
url33 <- "https://fbref.com/en/matches/988198ba/Hungary-France-June-19-2021-UEFA-Euro"
url34 <- "https://fbref.com/en/matches/e33c4403/Portugal-Germany-June-19-2021-UEFA-Euro"
url35 <- "https://fbref.com/en/matches/5a7e53d8/Portugal-France-June-23-2021-UEFA-Euro"
url36 <- "https://fbref.com/en/matches/a4888546/Germany-Hungary-June-23-2021-UEFA-Euro"
url_group_F <- rbind(url31, url32, url33, url34, url35, url36)
Step 5: Read a single pair of tables for a single game
I will now read a single pair of tables: the summary stats of the Portugal players and the summary stats of the France players for the game between them on June 23rd. These are HTML tables, so I use the “htmltab” function, which requires a URL and a node. The date and the two team names are not scraped; they are cut out of the URL text itself.
#####
# Step 5: Read a single pair of tables for a single game
#####

# Choose a game from the list of URLs from the previous step
selected_game <- url35

# Some data manipulation to get the date and teams from the URLs
game_data <- substr(selected_game, 39, nchar(selected_game)-10)
date <- substr(game_data, nchar(game_data)-11, nchar(game_data))
teams <- substr(game_data, 1, nchar(game_data)-13)
teams <- str_replace(teams, "Czech-Republic", "Czech Republic")
teams <- str_replace(teams, "North-Macedonia", "North Macedonia")
teamA <- sub("-.*", "", teams)
teamB <- sub(".*-", "", teams)

#define the node
node <- "#stats_b561dd30_defense"

#add the node to the URL (use selected_game so that changing the game above is enough)
url <- paste0(selected_game, node)

#read first table and add the date and teams
statA <- htmltab(doc = url, which = 4, rm_nodata_cols = FALSE)
statA <- cbind(date, Team=teamA, Opponent=teamB, statA)

#read second table and add the date and teams
statB <- htmltab(doc = url, which = 11, rm_nodata_cols = FALSE)
statB <- cbind(date, Team=teamB, Opponent=teamA, statB)

#combine the two table rows
stat_both <- rbind(statA, statB)

#remove any leading or trailing whitespaces from the player names
stat_both$Player <- str_trim(stat_both$Player, side = "both")
Let’s have a look at our table.
View(stat_both)
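The substr() calls in Step 5 rely on fixed character positions, which works for these URLs but is easy to break if the URL shape changes. Below is a rough sketch of the same idea with pattern matching in base R. The function name parse_match_url is made up for this example, and it assumes every URL ends in "-Month-DD-YYYY-UEFA-Euro", like the ones in Step 4.

```r
# Hypothetical helper: pull the date and the two team names out of a match URL.
# Assumes the URL ends in "<TeamA>-<TeamB>-<Month>-<DD>-<YYYY>-UEFA-Euro".
parse_match_url <- function(url) {
  slug <- basename(url)                      # text after the last "/"
  slug <- sub("-UEFA-Euro$", "", slug)       # drop the competition suffix
  # the date is the trailing "Month-DD-YYYY" part
  date <- sub("^.*-([A-Za-z]+-[0-9]{1,2}-[0-9]{4})$", "\\1", slug)
  # everything before the date (and its leading hyphen) is the team pair
  teams <- substr(slug, 1, nchar(slug) - nchar(date) - 1)
  # two-word team names contain a hyphen of their own, so fix them first
  teams <- gsub("Czech-Republic", "Czech Republic", teams, fixed = TRUE)
  teams <- gsub("North-Macedonia", "North Macedonia", teams, fixed = TRUE)
  parts <- strsplit(teams, "-", fixed = TRUE)[[1]]
  list(date = date, teamA = parts[1], teamB = parts[2])
}

info <- parse_match_url(
  "https://fbref.com/en/matches/5a7e53d8/Portugal-France-June-23-2021-UEFA-Euro")
print(info)
```

Any other two-word team name would need its own line next to the Czech Republic and North Macedonia fixes, just like in the original code.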
Step 6: Read all tables for all games
Now that we’ve seen how to get data for one game and one type of table, let’s get data for ALL games and ALL tables. Yes, I want it all.
#####
# Step 6: Read all tables for all games
#####

#combine all game URLs for all groups
selected_urls <- rbind(url_group_A, url_group_B, url_group_C, url_group_D, url_group_E, url_group_F)

#initialize tables
all_stat <- NULL
full_stat <- NULL

for (g in seq_along(selected_urls)){

  # Get the game info from the URL
  game_data <- substr(selected_urls[g], 39, nchar(selected_urls[g])-10)
  date <- substr(game_data, nchar(game_data)-11, nchar(game_data))
  teams <- substr(game_data, 1, nchar(game_data)-13)
  teams <- str_replace(teams, "Czech-Republic", "Czech Republic")
  teams <- str_replace(teams, "North-Macedonia", "North Macedonia")
  teamA <- sub("-.*", "", teams)
  teamB <- sub(".*-", "", teams)

  #read the first pair of tables
  node <- "#stats_b561dd30_defense"
  url <- paste0(selected_urls[g], node)
  statA <- htmltab(doc = url, which = 4, rm_nodata_cols = FALSE)
  statA <- cbind(date, Team=teamA, Opponent=teamB, statA)
  statB <- htmltab(doc = url, which = 11, rm_nodata_cols = FALSE)
  statB <- cbind(date, Team=teamB, Opponent=teamA, statB)
  stat_both <- rbind(statA, statB)

  #define the game's data frame
  all_stat <- stat_both

  #loop for all tables related to the game
  #(date, teams and url are the same as above, so they are not recomputed)
  for(i in 5:9){
    statA <- htmltab(doc = url, which = i, rm_nodata_cols = FALSE)
    statA <- cbind(date, Team=teamA, Opponent=teamB, statA)
    statB <- htmltab(doc = url, which = i+7, rm_nodata_cols = FALSE)
    statB <- cbind(date, Team=teamB, Opponent=teamA, statB)
    stat_both <- rbind(statA, statB)
    all_stat <- merge(all_stat, stat_both, by="Player")
  }

  #add the game tables to the total data frame
  full_stat <- rbind(full_stat, all_stat)
}

#remove any duplicates
all_stat_full <- unique(full_stat)

#remove any leading or trailing whitespaces
all_stat_full$Player <- str_trim(all_stat_full$Player, side = "both")

#convert all stats into numeric variables, leaving the first 7 (text) columns untouched
all_stat_full <- cbind(all_stat_full[,1:7], mutate_all(all_stat_full[,8:ncol(all_stat_full)], function(x) as.numeric(as.character(x))))

#export the table to CSV
write.csv(all_stat_full,"all_stat_full.csv")
You can access the file here.
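A quick note on where column names like Pos.x and Age.x come from. The loop in Step 6 keeps merging tables by “Player”, and base R’s merge() adds the suffixes .x and .y whenever both tables share a column name that is not the merge key. Here is a tiny sketch with two made-up tables (the player names and numbers are invented for illustration only):

```r
# Two toy tables that share the key "Player" and the extra column "Min"
a <- data.frame(Player = c("Player One", "Player Two"),
                Min = c(90, 45),
                Sh = c(3, 1))
b <- data.frame(Player = c("Player One", "Player Two"),
                Min = c(90, 45),
                Tkl = c(0, 2))

# merge() keeps one "Player" column and renames the clashing "Min" columns
merged <- merge(a, b, by = "Player")
print(names(merged))
```

The shared column shows up twice, as Min.x (from the first table) and Min.y (from the second). Also keep in mind that merge() keeps only the players found in both tables by default.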
Step 7: Create summary data frame
The core of data exploration: the pivot table. In R we do this with the help of the dplyr package. We take the data frame we have, group the data by the player, and we summarise the stats by summing them.
I always like viewing the table after pivoting.
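#####
# Step 7: Create summary data frame
#####

#remove some unwanted columns
#(the text columns must go, otherwise they cannot be summed)
all_stat_full$Pos.x <- NULL
all_stat_full$Age.x <- NULL
all_stat_full$`#.x` <- NULL
all_stat_full$date.x <- NULL
all_stat_full$`Team.x` <- NULL
all_stat_full$Opponent.x <- NULL
all_stat_full$Pos.x <- NULL
all_stat_full$Age.x <- NULL
all_stat_full$`#.x` <- NULL
all_stat_full$`Team.x` <- NULL
all_stat_full$Opponent.x <- NULL

#Sum all stats for each player
#(summarise_all replaces the deprecated summarise_each)
all_stat_full <- all_stat_full %>% group_by(Player) %>% summarise_all(sum)

View(all_stat_full)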
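If the pivot step feels like a black box, here is a small cross-check on made-up data using only base R. This is not the dplyr code from the tutorial; it is a sketch of the same “one row per player, sum everything” idea with aggregate(), and the players and numbers are invented.

```r
# Toy match-level data: "Player One" appears in two games
toy <- data.frame(
  Player = c("Player One", "Player Two", "Player One"),
  Sh = c(3, 1, 2),
  KP = c(1, 0, 4)
)

# Sum every stat column within each player
totals <- aggregate(. ~ Player, data = toy, FUN = sum)
print(totals)
```

One thing to watch for in the real data: sum() returns NA as soon as one value is NA, so a stat that failed to convert to a number in a single game can blank out a player’s total.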
Step 8: Select players
We all have our favorite players as well as the ones that catch our attention, for good or bad reasons. As a New York Knicks fan, I recently developed a dislike for Trae Young and watch his stats closely. As a person that has bet (and lost) that “football’s coming home” for 7 straight major international tournaments, I enjoy looking at England stats.
Below I have selected 8 players that have been in the spotlight so far in these Euros.
#Look at the available player names.
View(unique(all_stat_full$Player))

#Select the players you want to see. Choose 8 players for better visual results.
selected_players <- subset(all_stat_full,
                           Player=="Kylian Mbappé" |
                           Player=="Antoine Griezmann" |
                           Player=="Harry Kane" |
                           Player=="Kai Havertz" |
                           Player=="Cristiano Ronaldo" |
                           Player=="Álvaro Morata" |
                           Player=="Memphis Depay" |
                           Player=="Patrik Schick")
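Player names have to match exactly, accents included, so a typo silently gives you fewer than 8 players. Here is a small sketch of a safer selection. The helper pick_players is a made-up name, and the toy table below uses invented values just to show the behaviour.

```r
# Hypothetical helper: keep the wanted players and warn about names not found
pick_players <- function(data, wanted) {
  not_found <- setdiff(wanted, data$Player)
  if (length(not_found) > 0) {
    warning("No rows found for: ", paste(not_found, collapse = ", "))
  }
  data[data$Player %in% wanted, , drop = FALSE]
}

# Toy data with invented numbers
toy <- data.frame(Player = c("Player A", "Player B", "Player C"),
                  Sh = c(5, 4, 9))

picked <- pick_players(toy, c("Player A", "Player C"))
print(picked)
```

With the real data you would pass all_stat_full and the vector of 8 names instead of the toy table.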
Step 9: Create the radar plots
As you may know we’ve been doing a bunch of basketball analytics. I can’t stress how lucky I am to have come across the great book Basketball Data Science with Applications in R. Anyone interested in basketball analytics should definitely get their hands on a copy. The BasketballAnalyzeR R package is simply amazing.
One of the cool things the authors and creators of the BasketballAnalyzeR R package have created is a radar plot format. So I apply a function intended for basketball analytics to soccer. Why not?
#####
# Step 9: Create the radar plots
#####

#attach the dataset
attach(selected_players)

#select the statistics we want to see and prepare for the plot
#(successful dribbles appear in more than one merged table, hence the .x suffix)
Sel <- data.frame("xG"=`Expected >> xG`,
                  "Dr"=`Dribbles >> Succ.x`,
                  "Pass"=`Passes >> Cmp`,
                  "Sh"=`Performance >> Sh`,
                  "SoT"=`Performance >> SoT`,
                  "KP"=`KP`)
Sel <- mutate_all(Sel, function(x) as.numeric(as.character(x)))

#run the radialprofile function with std=T, which standardizes the data so that the scale looks normal
p <- radialprofile(data=Sel, title=selected_players$Player, std=T)

detach(selected_players)
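What does “standardized” mean here? My understanding is that each stat is rescaled to mean 0 and standard deviation 1 across the selected players, so passes (big numbers) and xG (small numbers) can share one radar. The sketch below shows that idea with base R’s scale() on invented numbers. It is an assumption that radialprofile does exactly this internally; I have not checked the package source.

```r
# Toy stats with invented values on very different scales
toy <- data.frame(xG = c(0.5, 1.5, 2.5),
                  Pass = c(40, 120, 80))

# scale() subtracts each column's mean and divides by its standard deviation
z <- as.data.frame(scale(toy))

print(z)
print(colMeans(z))      # each column mean is (numerically) zero
print(apply(z, 2, sd))  # each column standard deviation is one
```

A practical consequence: the radar shows how a player compares with the other selected players, not his raw totals, so the picture changes if you change the selection.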
Step 10: Make the graph presentable
Let’s reformat the graph, add titles and captions, and save it to our computer.
#####
# Step 10: Make the graph presentable
#####

g <- grid.arrange(grobs=p[1:length(p)], ncol=3)
g2 <- cowplot::ggdraw(g)+theme_grey()+
  labs(title="Selected Players Radar Plots",
       subtitle="Data from fbref.com. Aggregated data from EURO 2020 Group Stage Matches.\nStat values are standardized (μ=0, sd=1).",
       caption = "@Sweep_SportsAnalytics")
g2
ggsave("radar-plot.png", plot = g2, width = 7.5, height = 7.5, dpi = 400)

#create a table with descriptions for the stats we chose
descriptions <- data.frame(
  "Category"=colnames(Sel),
  "Description"=c("Expected Goals", "Successful Dribbles", "Completed Passes", "Shots", "Shots on Target", "Key Passes"))

#tableGrob comes from gridExtra; rows = NULL hides the row numbers
descr <- tableGrob(descriptions, rows = NULL)

#add the description table
g_final <- g2 +
  annotation_custom(descr, xmin = 0.75, xmax = 0.85, ymin = 0.1, ymax = 0.2) +
  coord_cartesian(clip = "off")
g_final

#pass the plot explicitly so that the final version is the one saved
ggsave("radar-key-final.png", plot = g_final, width = 7.5, height = 7.5, dpi = 400)
Step 11: Interpret the graph
Data analysis doesn’t mean much if you can’t answer the basic question: “So what?”
Interpreting your findings is the key to any analytics. Keep in mind that sports analytics have been around for over a decade, but you don’t see many data nerds managing a team. The best managers and sports personnel know what to do with the results of the analysis. They have a deep understanding of the game, and that’s most important.
I would love to see your interpretation of any players and stats you analyze! An idea for you: change the URLs to Copa America matches and select some of the stars.
Feel free to share your findings in the comments below or on our Instagram or Facebook pages.
Full Code Below
#####
# Step 2: Install packages
#####

# For the below 2 commands, if prompted, type "no" and Enter
install.packages("colorspace")
install.packages("curl")
install.packages("BasketballAnalyzeR")
install.packages("ggplot2")
install.packages("htmltab")
install.packages("stringr")
install.packages("dplyr")
install.packages("gridExtra")
install.packages("cowplot")

#####
# Step 3: Load libraries
#####

library(curl)
library(BasketballAnalyzeR)
library(ggplot2)
library(htmltab)
library(stringr)
library(dplyr)
library(gridExtra)
library(cowplot)

#####
# Step 4: Read fbref.com URLs
#####

# Group A
url1 <- "https://fbref.com/en/matches/caa84313/Italy-Switzerland-June-16-2021-UEFA-Euro"
url2 <- "https://fbref.com/en/matches/95a9ebd1/Turkey-Italy-June-11-2021-UEFA-Euro"
url3 <- "https://fbref.com/en/matches/f09b64db/Turkey-Wales-June-16-2021-UEFA-Euro"
url4 <- "https://fbref.com/en/matches/d9eaa85c/Wales-Switzerland-June-12-2021-UEFA-Euro"
url5 <- "https://fbref.com/en/matches/b756c626/Italy-Wales-June-20-2021-UEFA-Euro"
url6 <- "https://fbref.com/en/matches/fa85a731/Switzerland-Turkey-June-20-2021-UEFA-Euro"
url_group_A <- rbind(url1, url2, url3, url4, url5, url6)

# Group B
url7 <- "https://fbref.com/en/matches/e594174b/Belgium-Russia-June-12-2021-UEFA-Euro"
url8 <- "https://fbref.com/en/matches/25bb1fa2/Denmark-Belgium-June-17-2021-UEFA-Euro"
url9 <- "https://fbref.com/en/matches/2c48acb2/Finland-Russia-June-16-2021-UEFA-Euro"
url10 <- "https://fbref.com/en/matches/c3c2ffa2/Denmark-Finland-June-12-2021-UEFA-Euro"
url11 <- "https://fbref.com/en/matches/bd35edec/Finland-Belgium-June-21-2021-UEFA-Euro"
url12 <- "https://fbref.com/en/matches/04188c5c/Russia-Denmark-June-21-2021-UEFA-Euro"
url_group_B <- rbind(url7, url8, url9, url10, url11, url12)

# Group C
url13 <- "https://fbref.com/en/matches/f3d39a29/Netherlands-Austria-June-17-2021-UEFA-Euro"
url14 <- "https://fbref.com/en/matches/b47a0ea6/Austria-North-Macedonia-June-13-2021-UEFA-Euro"
url15 <- "https://fbref.com/en/matches/e0eed6e8/Ukraine-North-Macedonia-June-17-2021-UEFA-Euro"
url16 <- "https://fbref.com/en/matches/0e9919a5/Netherlands-Ukraine-June-13-2021-UEFA-Euro"
url17 <- "https://fbref.com/en/matches/841065f5/North-Macedonia-Netherlands-June-21-2021-UEFA-Euro"
url18 <- "https://fbref.com/en/matches/7ed46abd/Ukraine-Austria-June-21-2021-UEFA-Euro"
url_group_C <- rbind(url13, url14, url15, url16, url17, url18)

# Group D
url19 <- "https://fbref.com/en/matches/6599f4ab/Scotland-Czech-Republic-June-14-2021-UEFA-Euro"
url20 <- "https://fbref.com/en/matches/1e930db9/Croatia-Czech-Republic-June-18-2021-UEFA-Euro"
url21 <- "https://fbref.com/en/matches/764c27dc/England-Croatia-June-13-2021-UEFA-Euro"
url22 <- "https://fbref.com/en/matches/027b11df/England-Scotland-June-18-2021-UEFA-Euro"
url23 <- "https://fbref.com/en/matches/20b1972b/Czech-Republic-England-June-22-2021-UEFA-Euro"
url24 <- "https://fbref.com/en/matches/0305e42c/Croatia-Scotland-June-22-2021-UEFA-Euro"
url_group_D <- rbind(url19, url20, url21, url22, url23, url24)

# Group E
url25 <- "https://fbref.com/en/matches/107fd412/Spain-Sweden-June-14-2021-UEFA-Euro"
url26 <- "https://fbref.com/en/matches/d35ad7a8/Poland-Slovakia-June-14-2021-UEFA-Euro"
url27 <- "https://fbref.com/en/matches/c6533f76/Sweden-Slovakia-June-18-2021-UEFA-Euro"
url28 <- "https://fbref.com/en/matches/14874531/Spain-Poland-June-19-2021-UEFA-Euro"
url29 <- "https://fbref.com/en/matches/ee6087f4/Sweden-Poland-June-23-2021-UEFA-Euro"
url30 <- "https://fbref.com/en/matches/7b46b857/Slovakia-Spain-June-23-2021-UEFA-Euro"
url_group_E <- rbind(url25, url26, url27, url28, url29, url30)

# Group F
url31 <- "https://fbref.com/en/matches/95d34c87/France-Germany-June-15-2021-UEFA-Euro"
url32 <- "https://fbref.com/en/matches/ba500d70/Hungary-Portugal-June-15-2021-UEFA-Euro"
url33 <- "https://fbref.com/en/matches/988198ba/Hungary-France-June-19-2021-UEFA-Euro"
url34 <- "https://fbref.com/en/matches/e33c4403/Portugal-Germany-June-19-2021-UEFA-Euro"
url35 <- "https://fbref.com/en/matches/5a7e53d8/Portugal-France-June-23-2021-UEFA-Euro"
url36 <- "https://fbref.com/en/matches/a4888546/Germany-Hungary-June-23-2021-UEFA-Euro"
url_group_F <- rbind(url31, url32, url33, url34, url35, url36)

#####
# Step 5: Read a single pair of tables for a single game
#####

# Choose a game from the list of URLs from the previous step
selected_game <- url35

# Some data manipulation to get the date and teams from the URLs
game_data <- substr(selected_game, 39, nchar(selected_game)-10)
date <- substr(game_data, nchar(game_data)-11, nchar(game_data))
teams <- substr(game_data, 1, nchar(game_data)-13)
teams <- str_replace(teams, "Czech-Republic", "Czech Republic")
teams <- str_replace(teams, "North-Macedonia", "North Macedonia")
teamA <- sub("-.*", "", teams)
teamB <- sub(".*-", "", teams)

#define the node
node <- "#stats_b561dd30_defense"

#add the node to the URL
url <- paste0(selected_game, node)

#read first table and add the date and teams
statA <- htmltab(doc = url, which = 4, rm_nodata_cols = FALSE)
statA <- cbind(date, Team=teamA, Opponent=teamB, statA)

#read second table and add the date and teams
statB <- htmltab(doc = url, which = 11, rm_nodata_cols = FALSE)
statB <- cbind(date, Team=teamB, Opponent=teamA, statB)

#combine the two table rows
stat_both <- rbind(statA, statB)

#remove any leading or trailing whitespaces from the player names
stat_both$Player <- str_trim(stat_both$Player, side = "both")

View(stat_both)

#####
# Step 6: Read all tables for all games
#####

#combine all game URLs for all groups
selected_urls <- rbind(url_group_A, url_group_B, url_group_C, url_group_D, url_group_E, url_group_F)

#initialize tables
all_stat <- NULL
full_stat <- NULL

for (g in seq_along(selected_urls)){

  # Get the game info from the URL
  game_data <- substr(selected_urls[g], 39, nchar(selected_urls[g])-10)
  date <- substr(game_data, nchar(game_data)-11, nchar(game_data))
  teams <- substr(game_data, 1, nchar(game_data)-13)
  teams <- str_replace(teams, "Czech-Republic", "Czech Republic")
  teams <- str_replace(teams, "North-Macedonia", "North Macedonia")
  teamA <- sub("-.*", "", teams)
  teamB <- sub(".*-", "", teams)

  #read the first pair of tables
  node <- "#stats_b561dd30_defense"
  url <- paste0(selected_urls[g], node)
  statA <- htmltab(doc = url, which = 4, rm_nodata_cols = FALSE)
  statA <- cbind(date, Team=teamA, Opponent=teamB, statA)
  statB <- htmltab(doc = url, which = 11, rm_nodata_cols = FALSE)
  statB <- cbind(date, Team=teamB, Opponent=teamA, statB)
  stat_both <- rbind(statA, statB)

  #define the game's data frame
  all_stat <- stat_both

  #loop for all tables related to the game
  #(date, teams and url are the same as above, so they are not recomputed)
  for(i in 5:9){
    statA <- htmltab(doc = url, which = i, rm_nodata_cols = FALSE)
    statA <- cbind(date, Team=teamA, Opponent=teamB, statA)
    statB <- htmltab(doc = url, which = i+7, rm_nodata_cols = FALSE)
    statB <- cbind(date, Team=teamB, Opponent=teamA, statB)
    stat_both <- rbind(statA, statB)
    all_stat <- merge(all_stat, stat_both, by="Player")
  }

  #add the game tables to the total data frame
  full_stat <- rbind(full_stat, all_stat)
}

#remove any duplicates
all_stat_full <- unique(full_stat)

#remove any leading or trailing whitespaces
all_stat_full$Player <- str_trim(all_stat_full$Player, side = "both")

#convert all stats into numeric variables, leaving the first 7 (text) columns untouched
all_stat_full <- cbind(all_stat_full[,1:7], mutate_all(all_stat_full[,8:ncol(all_stat_full)], function(x) as.numeric(as.character(x))))

#export the table to CSV
write.csv(all_stat_full,"all_stat_full.csv")

#####
# Step 7: Create summary data frame - pivot table
#####

# remove some unwanted columns
all_stat_full$Pos.x <- NULL
all_stat_full$Age.x <- NULL
all_stat_full$`#.x` <- NULL
all_stat_full$date.x <- NULL
all_stat_full$`Team.x` <- NULL
all_stat_full$Opponent.x <- NULL
all_stat_full$Pos.x <- NULL
all_stat_full$Age.x <- NULL
all_stat_full$`#.x` <- NULL
all_stat_full$`Team.x` <- NULL
all_stat_full$Opponent.x <- NULL

#Sum all stats for each player
#(summarise_all replaces the deprecated summarise_each)
all_stat_full <- all_stat_full %>% group_by(Player) %>% summarise_all(sum)

View(all_stat_full)

#####
#Step 8: Select players
#####

#Select the players you want to see. Choose 8 players for better visual results.
selected_players <- subset(all_stat_full,
                           Player=="Kylian Mbappé" |
                           Player=="Antoine Griezmann" |
                           Player=="Harry Kane" |
                           Player=="Kai Havertz" |
                           Player=="Cristiano Ronaldo" |
                           Player=="Álvaro Morata" |
                           Player=="Memphis Depay" |
                           Player=="Patrik Schick")

#####
# Step 9: Create the radar plots
#####

#attach the dataset
attach(selected_players)

#select the statistics we want to see and prepare for the plot
Sel <- data.frame("xG"=`Expected >> xG`,
                  "Dr"=`Dribbles >> Succ.x`,
                  "Pass"=`Passes >> Cmp`,
                  "Sh"=`Performance >> Sh`,
                  "SoT"=`Performance >> SoT`,
                  "KP"=`KP`)
Sel <- mutate_all(Sel, function(x) as.numeric(as.character(x)))

#run the radialprofile function with std=T, which standardizes the data so that the scale looks normal
p <- radialprofile(data=Sel, title=selected_players$Player, std=T)

detach(selected_players)

#####
# Step 10: Make the graph presentable
#####

g <- grid.arrange(grobs=p[1:length(p)], ncol=3)
g2 <- cowplot::ggdraw(g)+theme_grey()+
  labs(title="Selected Players Radar Plots",
       subtitle="Data from fbref.com. Aggregated data from EURO 2020 Group Stage Matches.\nStat values are standardized (μ=0, sd=1).",
       caption = "@Sweep_SportsAnalytics")
g2
ggsave("radar-plot.png", plot = g2, width = 9, height = 9, dpi = 400)

#create a table with descriptions for the stats we chose
descriptions <- data.frame(
  "Category"=colnames(Sel),
  "Description"=c("Expected Goals", "Successful Dribbles", "Completed Passes", "Shots", "Shots on Target", "Key Passes"))

#tableGrob comes from gridExtra; rows = NULL hides the row numbers
descr <- tableGrob(descriptions, rows = NULL)

#add the description table
g_final <- g2 +
  annotation_custom(descr, xmin = 0.8, xmax = 0.9, ymin = 0.1, ymax = 0.2) +
  coord_cartesian(clip = "off")
g_final
ggsave("radar-key-final.png", plot = g_final, width = 9, height = 9, dpi = 400)
The post Soccer Analytics for Beginners: An R Tutorial on EURO 2020 Data – Web Scraping & Radar Plots appeared first on Sweep Sports Analytics.