If you sit in the intersection of “likes Australian Rules football / finds sport statistics interesting / is on Twitter”, you’ve probably come across Swamp. One of his recent tweets tells us that:
The answer to that question is at once surprising, less surprising when you think about it, and quite easy to figure out using the ever-helpful fitzRoy package.
Getting the data
Several options would work: I used the fitzRoy function get_fryzigg_stats() which, although deprecated, does what I want (gets all games from 1897 onwards in one shot) and returns nicer variable names than some of the other functions.
    library(tidyverse)
    library(fitzRoy)
    library(lubridate)

    # get data
    afldata <- fitzRoy::get_fryzigg_stats()
Most games played by the same lineup
Players in a team are identified by a numerical player_id. So we can represent the team lineup (here called squad) by sorting the IDs and pasting them into a character string. Then, simply counting the strings and filtering for n > 1 will return teams where the same set of players played together more than once. Note that we need to sort the numerical IDs; otherwise the same set of players could appear in a different order in different games and be counted as different lineups.
    lineup_multiple_games <- afldata %>%
      group_by(match_id, player_team) %>%
      summarise(squad = paste(sort(player_id), collapse = ";")) %>%
      ungroup() %>%
      count(player_team, squad, sort = TRUE, name = "n_games") %>%
      filter(n_games > 1) %>%
      mutate(n_players = str_count(squad, ";") + 1)
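To see why the sort matters, here is a minimal sketch using made-up player IDs rather than real data. The helper make_squad() is hypothetical and exists only for this illustration.

```r
# Toy player IDs for one team in two games: same players, different order
game_1 <- c(12, 7, 3)
game_2 <- c(3, 12, 7)

# Hypothetical helper: collapse a vector of IDs into a single lineup string
make_squad <- function(ids, sorted = TRUE) {
  if (sorted) ids <- sort(ids)
  paste(ids, collapse = ";")
}

# Without sorting, the two games look like different lineups
unsorted_match <- make_squad(game_1, sorted = FALSE) == make_squad(game_2, sorted = FALSE)

# With sorting, both games collapse to the same string
sorted_match <- make_squad(game_1) == make_squad(game_2)
```

Here unsorted_match is FALSE and sorted_match is TRUE, so only the sorted strings can be counted reliably.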
We get to the final dataset, games_same_lineup, by doing something quite similar, except this time grouping on more variables before generating the lineup, so that the match details are kept. We can then join the result to the count data from the first step, creating a dataset that looks like this.
    Rows: 2,408
    Columns: 8
    $ match_id    <int> 9, 12, 13, 14, 29, 33, 60, 61, 78, 81, 88, 90, 92, 109, 111, 116, 124, 129, 131, 131, 132, 150, 156, 186, 190, 191, 194, 207, 213, 218, 230, 252, 256, 260, 270,…
    $ match_date  <chr> "1897-05-22", "1897-05-24", "1897-05-29", "1897-05-29", "1897-06-26", "1897-07-03", "1897-08-28", "1897-09-04", "1898-05-28", "1898-06-04", "1898-06-18", "1898-…
    $ match_round <chr> "3", "3", "4", "4", "8", "9", "Semi Final", "Semi Final", "4", "5", "7", "7", "8", "12", "13", "14", "16", "17", "Semi Final", "Semi Final", "Grand Final", "5",…
    $ venue_name  <chr> "Victoria Park", "Lake Oval", "Corio Oval", "Lake Oval", "Ikon Park", "Ikon Park", "Brunswick St", "East Melbourne", "Brunswick St", "Corio Oval", "Victoria Par…
    $ player_team <chr> "Geelong", "Sydney", "Geelong", "Sydney", "Carlton", "Carlton", "Geelong", "Geelong", "Fitzroy", "Fitzroy", "Collingwood", "Geelong", "Geelong", "Fitzroy", "Fit…
    $ squad       <chr> "81;82;83;84;87;88;90;91;92;94;96;97;98;99;100;184;187;188;189;190", "121;123;124;126;127;128;129;130;131;132;133;134;135;137;138;139;140;161;163;164", "81;82;8…
    $ n_games     <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 4, 4, 3, 4, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
    $ n_players   <dbl> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, …
So to date there are 2 408 team appearances (one row per team per game) in which the team fielded a lineup that it has used at least twice.
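The code that builds this joined dataset is described above but not shown. The following is a minimal sketch of one way it could be done, run on a toy stand-in for afldata; the toy values and the names toy, toy_counts and toy_same_lineup are assumptions for illustration. On the real data the second grouping would presumably also include match_round and venue_name.

```r
library(dplyr)

# Toy stand-in for afldata: one team, three matches, two players per match
toy <- tibble(
  match_id    = c(1, 1, 2, 2, 3, 3),
  match_date  = rep(c("1897-05-08", "1897-05-15", "1897-05-22"), each = 2),
  player_team = "Geelong",
  player_id   = c(81, 82, 82, 81, 81, 83)
)

# Step 1: count how often each team fielded each lineup, keep repeats only
toy_counts <- toy %>%
  group_by(match_id, player_team) %>%
  summarise(squad = paste(sort(player_id), collapse = ";"), .groups = "drop") %>%
  count(player_team, squad, name = "n_games") %>%
  filter(n_games > 1)

# Step 2: rebuild the lineup per match, keeping match details,
# then keep only the matches whose lineup appears in the counts
toy_same_lineup <- toy %>%
  group_by(match_id, match_date, player_team) %>%
  summarise(squad = paste(sort(player_id), collapse = ";"), .groups = "drop") %>%
  inner_join(toy_counts, by = c("player_team", "squad"))
```

In this toy case matches 1 and 2 share the lineup "81;82", so toy_same_lineup keeps those two rows with n_games equal to 2, and match 3 drops out.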
The most games with the same lineup is?
    games_same_lineup %>%
      filter(n_games == max(n_games)) %>%
      select(-squad)
match_id | match_date | match_round | venue_name | player_team | n_games | n_players |
---|---|---|---|---|---|---|
2038 | 1924-05-24 | 5 | MCG | Sydney | 7 | 18 |
2039 | 1924-05-31 | 6 | Lake Oval | Sydney | 7 | 18 |
2043 | 1924-06-07 | 7 | Ikon Park | Sydney | 7 | 18 |
2054 | 1924-06-21 | 9 | Lake Oval | Sydney | 7 | 18 |
2065 | 1924-07-12 | 12 | Lake Oval | Sydney | 7 | 18 |
2079 | 1924-08-23 | 16 | Lake Oval | Sydney | 7 | 18 |
2092 | 1924-09-13 | Semi Final | Windy Hill | Sydney | 7 | 18 |
Sydney (South Melbourne in those days), 7 games back in 1924.
Most games played by a 22-player lineup
Games today feature 22 players per side (more recently 23, with a medical substitute). The most games with the same 22-player lineup is?
    games_same_lineup %>%
      filter(n_players == 22) %>%
      filter(n_games == max(n_games)) %>%
      select(-squad)
match_id | match_date | match_round | venue_name | player_team | n_games | n_players |
---|---|---|---|---|---|---|
12800 | 2005-08-06 | 19 | Marvel Stadium | Sydney | 5 | 22 |
12810 | 2005-08-14 | 20 | ANZ Stadium | Sydney | 5 | 22 |
12818 | 2005-08-21 | 21 | SCG | Sydney | 5 | 22 |
12822 | 2005-08-27 | 22 | MCG | Sydney | 5 | 22 |
12829 | 2005-09-02 | Qualifying Final | Subiaco | Sydney | 5 | 22 |
14901 | 2016-06-23 | 14 | Adelaide Oval | Adelaide | 5 | 22 |
14912 | 2016-07-03 | 15 | MCG | Adelaide | 5 | 22 |
14972 | 2016-08-20 | 22 | Adelaide Oval | Adelaide | 5 | 22 |
14988 | 2016-09-10 | Elimination Final | Adelaide Oval | Adelaide | 5 | 22 |
14990 | 2016-09-17 | Semi Final | SCG | Adelaide | 5 | 22 |
15556 | 2019-07-20 | 18 | Gabba | Brisbane Lions | 5 | 22 |
15578 | 2019-08-04 | 20 | Gabba | Brisbane Lions | 5 | 22 |
15582 | 2019-08-10 | 21 | Gabba | Brisbane Lions | 5 | 22 |
15590 | 2019-08-17 | 22 | Gabba | Brisbane Lions | 5 | 22 |
15609 | 2019-09-07 | Qualifying Final | Gabba | Brisbane Lions | 5 | 22 |
5 games, which has happened 3 times: in 2005 (Sydney), 2016 (Adelaide) and 2019 (Brisbane).
Games played across seasons by the same lineup
Has a lineup from one season taken to the field again the following season?
    games_same_lineup %>%
      group_by(squad) %>%
      filter(n_distinct(year(match_date)) > 1) %>%
      ungroup() %>%
      select(-squad)
match_id | match_date | match_round | venue_name | player_team | n_games | n_players |
---|---|---|---|---|---|---|
12731 | 2005-05-29 | 10 | Marvel Stadium | Western Bulldogs | 3 | 22 |
12839 | 2006-03-31 | 1 | Marvel Stadium | Western Bulldogs | 3 | 22 |
12847 | 2006-04-08 | 2 | Marvel Stadium | Western Bulldogs | 3 | 22 |
Just once: the round 10 Western Bulldogs from 2005 appeared again in rounds 1 and 2, 2006. Good to see that the result of this code agrees with a tweet from another AFL stats enthusiast.
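The cross-season filter can be checked on toy data. This is only a sketch with made-up squad labels: it takes the year with substr() on the date string instead of lubridate's year(), purely to keep the example light on dependencies, and it relies on each AFL season falling within a single calendar year.

```r
library(dplyr)

# Toy data: squad "A" plays in two different years, squad "B" in one
toy_dates <- tibble(
  squad      = c("A", "A", "B", "B"),
  match_date = c("2005-05-29", "2006-03-31", "2019-08-04", "2019-08-10")
)

# Keep only squads whose games span more than one calendar year
toy_cross <- toy_dates %>%
  group_by(squad) %>%
  filter(n_distinct(substr(match_date, 1, 4)) > 1) %>%
  ungroup()
```

Only the two rows for squad "A" survive, which mirrors how the Western Bulldogs lineup is picked out above.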
Have there ever been two or more games where both sides fielded the same lineup?
I’m still working on the logic to answer this one, but I think that:
- If both sides feature a lineup that played more than once (not necessarily against one another), then the match ID should appear twice in the n > 1 dataset
- If those sides and lineups played each other two or more times, then an ordered string made from all 44 player IDs should be counted twice or more
Trying to express that using dplyr:
    # function to order and join players from both teams
    join_teams <- function(first_team, last_team) {
      teams <- list(first_team, last_team)
      teams <- lapply(teams, function(x) {
        x %>%
          str_split(";") %>%
          .[[1]] %>%
          as.numeric() %>%
          sort()
      })
      players <- teams %>%
        unlist() %>%
        sort() %>%
        paste(collapse = ";")
      players
    }

    games_same_lineup %>%
      group_by(match_id) %>%
      filter(n() > 1) %>%
      summarise(players = join_teams(first(squad), last(squad))) %>%
      ungroup() %>%
      count(players, name = "n_games") %>%
      count(n_games)
Result:
n_games | n |
---|---|
1 | 99 |
So if I got that right, there are 99 games between teams where each team has featured the same lineup more than once – but never against each other. That is, the same two opposing lineups have never played each other more than once.
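One way to gain confidence in that logic is to run it on toy data where a repeat pairing is known to exist. The sketch below is an illustration under stated assumptions, not the code used for the result above: combine_squads() is a hypothetical base-R variant of the helper, and the squad strings are invented.

```r
library(dplyr)

# Hypothetical base-R variant of the helper: merge two squad strings
# into one sorted string containing every player ID from both sides
combine_squads <- function(squad_a, squad_b) {
  ids <- as.numeric(unlist(strsplit(c(squad_a, squad_b), ";")))
  paste(sort(ids), collapse = ";")
}

# Toy data: lineups "1;2" and "3;4" meet in matches 10 and 11;
# in match 12, "1;2" faces a different lineup
toy_games <- tibble(
  match_id = c(10, 10, 11, 11, 12, 12),
  squad    = c("1;2", "3;4", "3;4", "1;2", "1;2", "5;6")
)

# Same steps as above: one combined string per match, then count the strings
toy_pairings <- toy_games %>%
  group_by(match_id) %>%
  filter(n() > 1) %>%
  summarise(players = combine_squads(first(squad), last(squad))) %>%
  count(players, name = "n_games")
```

Here the combined string "1;2;3;4" is counted twice and "1;2;5;6" once, so a repeated pairing in the real data would show up as an n_games value of 2 or more.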
Surprising?
In one sense, yes. Seven games with the same team lineup from almost 16 000 seems like a low number. Supporters on my team forum often ask “you mean in a row?” when I mention this number and they seem surprised when I say “no, ever”.
When you think about it more, it’s less surprising. AFL games are pretty physical and most games feature at least one injury that prevents a player playing for at least the next game. Players are dropped, other players return from injury, and players are omitted or included because they offer a better match-up depending on the opponent. So really, we shouldn’t expect the same team lineup week after week.
The fitzRoy package is great not just for AFL analytics, but for answering fun trivia questions like this one. Thanks to the author, James Day. Thanks also to Tony Corke for Twitter discussions on this topic. All code is available as RMarkdown at GitHub.