Site icon R-bloggers

Visualizing the Asian Cup with R!

[This article was first published on R by R(yo), and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Another year, another big soccer/football tournament! This time it’s the top international competition in Asia, the Asian Cup hosted in the U.A.E. In this blog post I’ll be covering (responsible) web-scraping, data wrangling (tidyverse FTW!), and of course, data visualization with ggplot2.

Let’s get started!

Packages

pacman::p_load(tidyverse, scales, lubridate, ggrepel, stringi, magick, 
               glue, extra, rvest, ggtextures, cowplot, ggimage, polite)
# Roboto Condensed  (from hrbrmstrthemes)
loads()

Top Goalscorers of the Asian Cup

The first thing I looked at was, “Who are the top goalscorers in the history of the Asian Cup?”

Here I use the polite package to take a look at the robots.txt for the web page and see if it is OK to web scrape from it. First you pass the URL to the bow() function, check that you are indeed allowed to scrape, then use scrape() to retrieve data, and the rest is the usual rvest web-scraping workflow.

topg_url <- "https://en.wikipedia.org/wiki/AFC_Asian_Cup_records_and_statistics"

session <- bow(topg_url)

ac_top_scorers <- scrape(session) %>%
  html_nodes("table.wikitable:nth-child(29)") %>% 
  html_table() %>% 
  flatten_df() %>% 
  select(-Ref.) %>% 
  set_names(c("total_goals", "player", "country"))

For brevity, let’s only take a look at the top 5 goal scorers. I’ll also mutate() in a nice image of a soccer ball for the data points on the plot.

ac_top_scorers <- ac_top_scorers %>% 
  head(5) %>% 
  mutate(image = "https://www.emoji.co.uk/files/microsoft-emojis/activity-windows10/8356-soccer-ball.png")

I made something slightly different to your standard bar graph as I use the geom_isotype_col() function from ggtextures to create a bar of soccer ball images. Compared to other functions in ggtextures, geom_isotype_col() allows each image to correspond to the value of the variable you are plotting, in this case 1 ball = 1 goal!

ac_top_graph <- ac_top_scorers %>% 
  ggplot(aes(x = reorder(player, total_goals), y = total_goals,
             image = image)) +
  geom_isotype_col(img_width = grid::unit(1, "native"), img_height = NULL,
    ncol = NA, nrow = 1, hjust = 0, vjust = 0.5) +
  coord_flip() +
  scale_y_continuous(breaks = c(0, 2, 4, 6, 8, 10, 12, 14),
                     expand = c(0, 0), 
                     limits = c(0, 15)) +
  ggthemes::theme_solarized() +
  labs(title = "Top Scorers of the Asian Cup",
       subtitle = "Most goals in a single tournament: 8 (Ali Daei, 1996)",
       y = "Number of Goals", x = NULL,
       caption = glue("
                      Source: Wikipedia
                      By @R_by_Ryo")) +
  theme(text = element_text(family = "Roboto Condensed"),
        plot.title = element_text(size = 22),
        plot.subtitle = element_text(size = 14),
        axis.text = element_text(size = 14),
        axis.title.x = element_text(size = 16),
        axis.line.y = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.ticks.y = element_blank())

ac_top_graph

OK, not bad. However, wouldn’t it be nice to add a bit more context? Specifically, which country these players came from. So let’s add some flags along the y-axis!

There are lots of different ways to do this (like geom_flag() from the ggimage package) but I ended up doing it the cowplot way. I had to tweak the scales a bit as the flags came in different sizes. When you plot, you just insert the image strip into the bar plot with axis_canvas() and combine all the parts together with ggdraw()!

axis_image <- axis_canvas(ac_top_graph, axis = 'y') + 
  draw_image("https://upload.wikimedia.org/wikipedia/commons/c/ca/Flag_of_Iran.svg", 
             y = 13, scale = 1.5) +
  draw_image("https://upload.wikimedia.org/wikipedia/commons/0/09/Flag_of_South_Korea.svg", 
             y = 10, scale = 1.7) +
  draw_image("https://upload.wikimedia.org/wikipedia/en/9/9e/Flag_of_Japan.svg", 
             y = 7, scale = 1.7) +
  draw_image("https://upload.wikimedia.org/wikipedia/commons/f/f6/Flag_of_Iraq.svg", 
             y = 4, scale = 1.6) +
  draw_image("https://upload.wikimedia.org/wikipedia/commons/a/aa/Flag_of_Kuwait.svg", 
             y = 1, scale = 1.2)

ggdraw(insert_yaxis_grob(ac_top_graph, axis_image, position = "left"))

Ideally I wanted the soccer balls to be the official balls from the tournament that the player scored in. However, I couldn’t find a nice emoji-fied/icon-ized version and there was also the “small” problem in that there was no “official” Asian Cup ball until the 2004 tournament in China! You can take a look at the official Asian Cup balls here.

Winners of the Asian Cup

We saw that the top goal scorers came from Iran, South Korea, Japan, Iraq, and Kuwait but did their goal scoring exploits lead their nations to glory? Let’s find out!

When web-scraping I really like using flatten_df() after html_table() as I don’t have to use the awkward looking .[[1]] within my piped workflow.

acup_url <- "https://en.wikipedia.org/wiki/AFC_Asian_Cup"

session <- bow(acup_url)

acup_winners_raw <- scrape(session) %>% 
  html_nodes("table:nth-child(31)") %>% 
  html_table() %>% 
  flatten_df()

Now I can use the clean_names() function to quickly clean up my column names (mainly when I can’t be bothered to set_names() them myself…).

The next steps are splitting up the number of times a team placed between 1st and 3rd and the year that occurred with separate(). Then variants of mutate() are used to tidy the string columns of the data into numeric type. I use gather() so each team will have a row for each of the rank positions (1st-3rd). Finally, I arrange the data in a way that the facets will be ordered in the way that I want.

acup_winners_clean <- acup_winners_raw %>% 
  janitor::clean_names() %>% 
  slice(1:8) %>% 
  select(-fourth_place, -semi_finalists, -total_top_four) %>% 
  separate(winners, into = c("Champions", "first_place_year"), 
           sep = " ", extra = "merge") %>% 
  separate(runners_up, into = c("Runners-up", "second_place_year"), 
           sep = " ", extra = "merge") %>% 
  separate(third_place, into = c("Third Place", "third_place_year"), 
           sep = " ", extra = "merge") %>% 
  mutate_all(funs(str_replace_all(., "–", "0"))) %>% 
  mutate_at(vars(contains("num")), funs(as.numeric)) %>% 
  mutate(team = if_else(team == "Israel1", "Israel", team)) %>% 
  gather(key = "key", value = "value", -team, 
         -first_place_year, -second_place_year, -third_place_year) %>% 
  mutate(key = key %>% 
           fct_relevel(c("Champions", "Runners-up", "Third Place"))) %>% 
  arrange(key, value) %>% 
  mutate(team = as_factor(team),
         order = row_number())

I plot using facets on the “key” variable (containing the rank data) so that we can see how many times each team placed as Champions to Third Place. I also use the glue() function here to format the multi-line captions and titles in a neat way.

acup_winners_clean %>% 
  ggplot(aes(value, team, color = key)) +
  geom_point(size = 5) +
  scale_color_manual(values = c("Champions" = "#FFCC33",
                                "Runners-up" = "#999999",
                                "Third Place" = "#CC6600"),
                     guide = FALSE) +
  labs(x = "Number of Occurrence",
       title = "Winners & Losers of the Asian Cup!",
       subtitle = glue("
                       Ordered by number of Asian Cup(s) won.
                       Four-time Champions, Japan, only won their first in 1992!"),
       caption = glue("
                      Note: Israel was expelled by the AFC in 1974 while Australia joined the AFC in 2006.
                      Source: Wikipedia
                      By @R_by_Ryo")) +
  facet_wrap(~key) +
  theme_minimal() +
  theme(text = element_text(family = "Roboto Condensed"),
        title = element_text(size = 18),
        plot.subtitle = element_text(size = 12),
        axis.title.y = element_blank(),
        axis.title.x = element_text(size = 12),
        axis.text.y = element_text(size = 14),
        axis.text.x = element_text(size = 12),
        plot.caption = element_text(hjust = 0, size = 10),
        panel.border = element_rect(fill = NA, colour = "grey20"),
        panel.grid.minor.x = element_blank(),
        strip.text = element_text(size = 16)) 

Goals per Game

One new thing I learned very recently, while working on this viz in fact, was using magrittr aliases! In this workflow I always wind up having to use .[x] or .[[x]] but now I can just use extract() or extract2() respectively to do the same thing!

wiki_url <- "https://en.wikipedia.org"
session <- bow(wiki_url)
acup_url <- "https://en.wikipedia.org/wiki/AFC_Asian_Cup"
session_cup <- bow(acup_url)

cup_links <- scrape(session_cup) %>% 
  html_nodes("br+ i a") %>% 
  html_attr("href") %>% 
  magrittr::extract(-17:-18)

acup_df <- cup_links %>% 
  as_data_frame() %>% 
  mutate(cup = str_remove(value, "\\/wiki\\/") %>% str_replace_all("_", " ")) %>% 
  rename(link = value)

Another cool thing I found while scraping this data was the jump_to() function that allows you to navigate to a new URL. This makes map()-ing over multiple URL links from a base URL very easy! Here, the base URL is the AFC Asian Cup Wikipedia page and the function iterates over each of the URL links of the respective tournament pages. Another way that I could’ve done this was to map() over the different dates of the tournaments as the Wikipedia page of each edition of the Asian Cup only differed in the “year” appended at the beginning of the URL.

goals_info <- function(x) {
  goal_info <- scrape(session) %>% 
    jump_to(x) %>% 
    html_nodes(".vcalendar") %>% 
    html_table(header = FALSE) %>% 
    flatten_df() %>% 
    spread(key = X1, value = X2) %>% 
    select(`Goals scored`) %>% 
    mutate(`Goals scored` = str_remove_all(`Goals scored`, pattern = ".*\\(") %>% 
             str_extract_all("\\d+\\.*\\d*") %>% as.numeric)
}

team_num_info <- function(x) {
  team_num_info <- scrape(session) %>% 
    jump_to(x) %>% 
    html_nodes(".vcalendar") %>% 
    html_table(header = FALSE) %>% 
    flatten_df() %>% 
    spread(key = X1, value = X2) %>% 
    select(`Teams`) %>% 
    mutate(`Teams` = as.numeric(`Teams`))
}

match_num_info <- function(x) {
  match_num_info <- scrape(session) %>% 
    jump_to(x) %>% 
    html_nodes(".vcalendar") %>% 
    html_table(header = FALSE) %>% 
    flatten_df() %>% 
    spread(key = X1, value = X2) %>% 
    janitor::clean_names() %>% 
    select(matches_played) %>% 
    mutate(matches_played = as.numeric(matches_played))
}

# all together:
goals_data <- acup_df %>% 
  mutate(goals_per_game = map(acup_df$link, goals_info) %>% unlist,
         team_num = map(acup_df$link, team_num_info) %>% unlist,
         match_num = map(acup_df$link, match_num_info) %>% unlist)

Next, I clean it up a bit and add in the number of teams that participated in each tournament.

ac_goals_df <- goals_data %>% 
  mutate(label = cup %>% str_extract("[0-9]+") %>% str_replace("..", "'"),
         team_num = case_when(
           is.na(team_num) ~ 16,
           TRUE ~ team_num
         )) %>% 
  arrange(cup) %>% 
  mutate(label = factor(label, label),
         team_num = c(4, 4, 4, 5, 6, 6, 10, 10, 10, 8, 12, 12, 16, 16, 16, 16))

glimpse(ac_goals_df)

## Observations: 16
## Variables: 6
## $ link           <chr> "/wiki/1956_AFC_Asian_Cup", "/wiki/1960_AFC_Asi...
## $ cup            <chr> "1956 AFC Asian Cup", "1960 AFC Asian Cup", "19...
## $ goals_per_game <dbl> 4.50, 3.17, 2.17, 3.20, 2.92, 2.50, 3.17, 1.83,...
## $ team_num       <dbl> 4, 4, 4, 5, 6, 6, 10, 10, 10, 8, 12, 12, 16, 16...
## $ match_num      <dbl> 6, 6, 6, 10, 13, 10, 24, 24, 24, 16, 26, 26, 32...
## $ label          <fct> '56, '60, '64, '68, '72, '76, '80, '84, '88, '9...

Now we make a line graph but with lots of annotate() code to add in comments, labels, and segments for the labels. At the end I use geom_emoji() to add a soccer ball to the plot for each of the data points.

plot <- ac_goals_df %>% 
  ggplot(aes(x = label, y = goals_per_game, group = 1)) +
  geom_line() +
  scale_y_continuous(limits = c(NA, 5.35),
                     breaks = c(1.5, 2, 2.5, 3, 3.5, 4, 4.5)) +
  labs(x = "Tournament (Year)", y = "Goals per Game") +
  theme_minimal() +
  theme(text = element_text(family = "Roboto Condensed"),
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 12)) +
  annotate(geom = "label", x = "'56", y = 5.15, family = "Roboto Condensed",
           color = "black", 
           label = "Total Number of Games Played:", hjust = 0) +
  annotate(geom = "text", x = "'60", y = 4.9, 
           label = "6", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 1, xend = 3, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = "'68", y = 4.9, 
           label = "10", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 3.8, xend = 4.2, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = "'72", y = 4.9, 
           label = "13", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 4.8, xend = 5.2, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = "'76", y = 4.9, 
           label = "10", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 5.8, xend = 6.2, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = "'84", y = 4.9, 
           label = "24", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 7, xend = 9, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = "'92", y = 4.9, 
           label = "16", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 9.8, xend = 10.2, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = 11.5, y = 4.9, 
           label = "26", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 11, xend = 12, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = 14.5, y = 4.9, 
           label = "32", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 13, xend = 16, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = 9, y = 4, family = "Roboto Condensed",
           label = glue("
                        Incredibly low amount of goals in Group B
                        (15 in 10 Games) and in Knock-Out Stages
                        (4 goals in 4, only 1 scored in normal time)")) +
  annotate(geom = "segment", x = 9, xend = 9, y = 1.65, yend = 3.75,
           color = "red") +
  ggimage::geom_emoji(aes(image = '26bd'), size = 0.03) 

plot

ggsave(filename = glue("{here::here('Asian Cup 2019')}/gpg_plot_final.png"), 
       width = 8, height = 7, dpi = 300)
plot <- image_read(glue("{here::here('Asian Cup 2019')}/gpg_plot_final.png"))

However, I’m not finished yet! I wanted to try to make this look a bit more “official” so I attempted to add the Asian Cup logo on the top right corner. There are probably alternative ways to how I did it below, especially by using grobs, but I was reminded of this blog post by Daniel Hadley who used the magick package to add a footer with a logo onto a ggplot object. I’ve used magick before for animations and this was a good chance to try it out for image editing. Compared to Daniel Hadley’s example I needed to have the logo on the right corner so I had to create a blank canvas with image_blank() and then placing everything on top of that with image_composite() and image_append().

logo_raw <- image_read("https://upload.wikimedia.org/wikipedia/en/a/ad/2019_afc_asian_cup_logo.png")

logo_proc <- logo_raw %>% image_scale("600")

# create blank canvas
a <- image_blank(width = 1000, height = 100, color = "white")
# combine with logo image and shift logo to the right
b <- image_composite(image_scale(a, "x100"), image_scale(logo_proc, "x75"), 
                     offset = "+880+25")
# add in the title text
logo_header <- b %>% 
  image_annotate(text = glue("Goals per Game Throughout the History of the Asian Cup"),
                 color = "black", size = 24,  = "Roboto Condensed",
                 location = "+63+50", gravity = "northwest")

# combine it all together! 
final2_plot <- image_append(image_scale(c(logo_header, plot), "1000"), stack = TRUE)

# image_write(final2_plot,
#             glue("{here::here('Asian Cup 2019')}/gpg_plot_final.png"))

final2_plot

All in all it took a while to tweak the positions of the text and logo image but for my first try it worked well. There is definitely room for improvement in regards to sizing and scaling though.

Ultimately, I couldn’t find much information on why those tournaments in the 80s in particular were such low scoring affairs. I wasn’t alive to watch those games on TV nor could I find any illuminating articles or blog posts on the style of Asian football back then… This was also before Japan really got into soccer so there wasn’t anything I could find in Japanese either.

Japan’s Record vs. Historical Rivals and Group D Opponents

Japan is the most successful team in the competition with 4 championships but who are their opponents in the group stages and how have they fared against them in the past? While I’m at it I will also check Japan’s records against long-time continental rivals such as Iran, South Korea, Saudi Arabia and more recently, Australia.

The data I’m going to use comes from Kaggle which has all international football results from 1872 to the World Cup final last year. To add in the federation affiliation (UEFA, AFC, etc.) for each of the countries I slightly modified some code from one of the kernels, “A Journey Through The History of Soccer” by PH Julien.

federation_files <- Sys.glob("../data/federation_affiliations/*")

df_federations = data.frame(country = NULL, federation = NULL)
for (f in federation_files) {
    federation = basename(f)
    content = read.csv(f, header=FALSE)
    content <- cbind(content,federation=rep(federation, dim(content)[1]))
    df_federations <- rbind(df_federations, content)
}

colnames(df_federations) <- c("country", "federation")

df_federations <- df_federations %>% 
  mutate(country = as.character(country) %>% str_trim(side = "both"))

Now to load the results data and then join it with the affiliations data.

results_raw <- read_csv("../data/results.csv")

results_japan_raw <- results_raw %>% 
  filter(home_team == "Japan" | away_team == "Japan") %>% 
  rename(venue_country = country, 
         venue_city = city) %>% 
  mutate(match_num = row_number())

# combine with federation affiliations
results_japan_home <- results_japan_raw %>% 
  left_join(df_federations, 
            by = c("home_team" = "country")) %>% 
  mutate(federation = as.character(federation)) %>% 
  rename(home_federation = federation) 

results_japan_away <- results_japan_raw %>% 
  left_join(df_federations, 
            by = c("away_team" = "country")) %>% 
  mutate(federation = as.character(federation)) %>% 
  rename(away_federation = federation)

# combine home-away
results_japan_cleaned <- results_japan_home %>% 
  full_join(results_japan_away)

Next I need to edit some of the continents for teams that didn’t have a match in the federation affiliation data set, for example, “South Korea” is “Korea Republic” in the Kaggle data set.

results_japan_cleaned <- results_japan_cleaned %>% 
  mutate(
    home_federation = case_when(
      home_team %in% c(
        "China", "Manchukuo", "Burma", "Korea Republic", "Vietnam Republic",
        "Korea DPR", "Brunei") ~ "AFC",
      home_team == "USA" ~ "Concacaf",
      home_team == "Bosnia-Herzegovina" ~ "UEFA",
      TRUE ~ home_federation),
    away_federation = case_when(
      away_team %in% c(
        "China", "Manchukuo", "Burma", "Korea Republic", "Vietnam Republic",
        "Korea DPR", "Brunei", "Taiwan") ~ "AFC",
      away_team == "USA" ~ "Concacaf",
      away_team == "Bosnia-Herzegovina" ~ "UEFA",
      TRUE ~ away_federation
    ))

Now that it’s nice and cleaned up I can reshape it so that the data is set from Japan’s perspective.

results_jp_asia <- results_japan_cleaned %>% 
  # filter only for Japan games and AFC opponents
  filter(home_team == "Japan" | away_team == "Japan",
         home_federation == "AFC" & away_federation == "AFC") %>% 
  select(-contains("federation"), -contains("venue"),
         -neutral, -match_num,
         date, home_team, home_score, away_team, away_score, tournament) %>% 
  # reshape columns to Japan vs. opponent
  mutate(
    opponent = case_when(
      away_team != "Japan" ~ away_team,
      home_team != "Japan" ~ home_team),
    home_away = case_when(
      home_team == "Japan" ~ "home",
      away_team == "Japan" ~ "away"),
    japan_goals = case_when(
      home_team == "Japan" ~ home_score,
      away_team == "Japan" ~ away_score),
    opp_goals = case_when(
      home_team != "Japan" ~ home_score,
      away_team != "Japan" ~ away_score)) %>% 
  # label results from Japan's perspective
  mutate(
    result = case_when(
      japan_goals > opp_goals ~ "Win",
      japan_goals < opp_goals ~ "Loss",
      japan_goals == opp_goals ~ "Draw"),
    result = result %>% as_factor() %>% fct_relevel(c("Win", "Draw", "Loss"))) %>% 
  select(-contains("score"), -contains("team"))

With all that done we can take a look at how Japan have done against certain opponents by using filter().

results_jp_asia %>% 
  filter(opponent == "Jordan",
         tournament == "AFC Asian Cup")

## # A tibble: 3 x 7
##   date       tournament    opponent home_away japan_goals opp_goals result
##   <date>     <chr>         <chr>    <chr>           <int>     <int> <fct> 
## 1 2004-07-31 AFC Asian Cup Jordan   home                1         1 Draw  
## 2 2011-01-09 AFC Asian Cup Jordan   home                1         1 Draw  
## 3 2015-01-20 AFC Asian Cup Jordan   home                2         0 Win

Unfortunately, this data set doesn’t go into extra-time or penalty wins as Japan’s Quarter-Final meeting with Jordan in 2004 ended with Japan securing a route to the semis, 4-3 on penalties!

I can create a function that’ll filter for certain opponents and tournaments and aggregate the results. With the second argument being ..., tidyeval allows me to input any kind of filter condition for an opponent, tournament, etc. The if else statement protects against cases where Japan never had that type of result against an opponent and makes sure that a column populated by 0s is created.

japan_versus <- function(data, ...) {
  # filter 
  filter_vars <- enquos(...)
  
  jp_vs <- data %>% 
    filter(!!!filter_vars) %>% 
    # count results type per opponent
    group_by(result, opponent) %>% 
    mutate(n = n()) %>% 
    ungroup() %>% 
    # sum amount of goals by Japan and opponent
    group_by(result, opponent) %>% 
    summarize(j_g = sum(japan_goals),
              o_g = sum(opp_goals),
              n = n()) %>% 
    ungroup() %>% 
    # spread results over multiple columns
    spread(result, n) %>% 
    # 1. failsafe against no type of result against an opponent
    # 2. sum up counts per opponent
    group_by(opponent) %>% 
    mutate(Win = if("Win" %in% names(.)){return(Win)} else{return(0)},
         Draw = if("Draw" %in% names(.)){return(Draw)} else{return(0)},
         Loss = if("Loss" %in% names(.)){return(Loss)} else{return(0)}) %>% 
    summarize(Win = sum(Win, na.rm = TRUE),
              Draw = sum(Draw, na.rm = TRUE),
              Loss = sum(Loss, na.rm = TRUE),
              `Goals For` = sum(j_g),
              `Goals Against` = sum(o_g))
  
  return(jp_vs)
}

Now let’s try it out a bit.

japan_versus(data = results_jp_asia, 
             opponent == "China")

## # A tibble: 1 x 6
##   opponent   Win  Draw  Loss `Goals For` `Goals Against`
##   <chr>    <int> <int> <int>       <int>           <int>
## 1 China       14     8    10          54              45

I can put in multiple filter conditions if needed as well.

japan_versus(data = results_jp_asia,
             home_away == "home",
             opponent %in% c("Palestine", "Vietnam", "India"))

## # A tibble: 3 x 6
##   opponent    Win  Draw  Loss `Goals For` `Goals Against`
##   <chr>     <int> <dbl> <dbl>       <int>           <int>
## 1 India         2     0     0          13               0
## 2 Palestine     1     0     0           4               0
## 3 Vietnam       1     0     0           1               0

As you can see Japan has never lost or drawn against India, Palestine, or Vietnam so in the data there wouldn’t have been any rows with “Loss” in the results column. With the function I created I was able to impute results that didn’t exist and fill them in with 0s!

Let’s check Japan’s performance against our main rivals in the Asian Cup. Here I make the tables look a lot nicer with the options in the kable and kableExtra packages.

results_jp_asia %>% 
  japan_versus(opponent %in% c("Iran", "Korea Republic", "Saudi Arabia"),
               tournament == "AFC Asian Cup") %>% 
  knitr::kable(format = "html",
               caption = "Japan vs. Historic Rivals in the Asian Cup") %>% 
  kableExtra::kable_styling(full_width = FALSE) %>% 
  kableExtra::add_header_above(c(" ", "Result" = 3, "Goals" = 2))
Japan vs. Historic Rivals in the Asian Cup
Result Goals
opponent Win Draw Loss Goals For Goals Against
Iran 1 2 0 1 0
Korea Republic 0 2 1 2 4
Saudi Arabia 4 0 1 13 4

Now let’s take a look at how Japan have historically played against the other teams in Group F of this year’s Asian Cup (in all competitions).

results_jp_asia %>% 
  japan_versus(opponent %in% c("Oman", "Uzbekistan", "Turkmenistan")) %>% 
  knitr::kable(format = "html",
               caption = "Japan's Record vs. Group F Teams") %>% 
  kableExtra::kable_styling(full_width = FALSE) %>% 
  kableExtra::add_header_above(c(" ", "Result" = 3, "Goals" = 2))
Japan’s Record vs. Group F Teams
Result Goals
opponent Win Draw Loss Goals For Goals Against
Oman 8 3 0 19 4
Uzbekistan 6 3 1 28 9

We see no rows here for Turkmenistan. This is due to the fact that until just this past week Japan had never played against them in a friendly or competitive game!

Conclusion

In this blog post I went through a few examples of visualizing some very basic stats on the Asian Cup happening this month. I’ll devote this last section on my views on this edition of the Asian Cup and Japan’s national team.

Although Japan’s first game was quite horrible I’m hoping it’ll wake the players and coaches out of their complacency and not underestimate our opponents in the next two games. Thankfully, South Korea should be on the other side of the bracket for the knock-out stages and we would also only meet Iran in the semifinals (provided both teams finish top of their respective groups). Japan could meet Australia in the Quarters but without Aaron Mooy they’re a much weaker side as shown in their abject loss to Jordan in their opening match.

Even with losing our new star, Shoya Nakajima, to injury the fact that we can replace him with a player of the calibre of Takashi Inui and with Hannover regular, Genki Haraguchi, stepping up from the bench shows how much Japanese football has progressed these past 25 years.

It’s a changing of the guard for Japan after the retirement of captain Hasebe and Keisuke Honda but with more Japanese players headed to Europe from a young age these are exciting times to be a Japanese football fan. It’s been quite awe-inspiring seeing how the number of Japanese players playing for foreign clubs have been steadily increasing since the 1988 Asian Cup squad (Japan’s first appearance at a major tournament, minus the Olympics).

This tournament is the first hurdle for this new generation of players as they fight to become regulars for the national team and begin the journey to the next World Cup in 2022. Here’s hoping for another great month of football!

(Image Source: Nikkan Sports)

To leave a comment for the author, please follow the link and comment on their blog: R by R(yo).

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.