Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Got an email from someone who was reading the footballistics book they were really into it and got up to chapter 3.
They must be a really big Western Bulldogs fan who has a lot of theories as to why after their premiership year in 2016 it seems they have dropped off suddenly, perhaps it has something to do with their spread of score involvements or who is involved in the scoring chain?
Well ask and you shall receive:
Step one – recreate what is going on in the book
The graph I am interested in recreated to check I understand what is going on here, is the Adelaide Crows graph of % involvement in team scores in 2017.
library(fitzRoy) library(tidyverse) ## -- Attaching packages -------------------------------- tidyverse 1.2.1 -- ## v ggplot2 2.2.1 v purrr 0.2.5 ## v tibble 1.4.2 v dplyr 0.7.5 ## v tidyr 0.8.1 v stringr 1.3.0 ## v readr 1.1.1 v forcats 0.3.0 ## -- Conflicts ----------------------------------- tidyverse_conflicts() -- ## x dplyr::filter() masks stats::filter() ## x dplyr::lag() masks stats::lag() fitzRoy::player_stats%>% filter(Season==2017 & Team=="Adelaide")%>% select(Player, SI, G, B, Round)%>% group_by(Round)%>% mutate(SIT=SI/((sum(G)+sum(B))))%>% group_by(Player)%>% summarise(averageSIT=mean(SIT))%>% arrange(desc(averageSIT)) ## # A tibble: 31 x 2 ## Player averageSIT ## <chr> <dbl> ## 1 Taylor Walker 0.323 ## 2 Tom Lynch 0.300 ## 3 Matt Crouch 0.293 ## 4 Eddie Betts 0.285 ## 5 Rory Sloane 0.272 ## 6 Josh Jenkins 0.254 ## 7 Mitch McGovern 0.252 ## 8 Richard Douglas 0.234 ## 9 Brad Crouch 0.227 ## 10 Sam Jacobs 0.224 ## # ... with 21 more rows
What we can see here is that our leaderboard doesn’t quite align to what the book says? So what could be going on here?
Could be a coding issue? Could be an interpretation issue, or could be something else entirely.
The first thing we could do is do a quick check either manually (checking footywire)
To do this we would run just this part of the script above
fitzRoy::player_stats%>% filter(Season==2017 & Team=="Adelaide")%>% select(Player, SI, G, B, Round)%>% group_by(Round)%>% mutate(TG=sum(G), TB=sum(B))%>% mutate(SIT=SI/((sum(G)+sum(B)))) ## # A tibble: 550 x 8 ## # Groups: Round [25] ## Player SI G B Round TG TB SIT ## <chr> <int> <int> <int> <chr> <int> <int> <dbl> ## 1 Rory Laird 10 0 0 Round 1 22 13 0.286 ## 2 Matt Crouch 8 0 0 Round 1 22 13 0.229 ## 3 Richard Douglas 9 2 0 Round 1 22 13 0.257 ## 4 Rory Sloane 10 0 0 Round 1 22 13 0.286 ## 5 Charlie Cameron 11 2 3 Round 1 22 13 0.314 ## 6 Wayne Milera 6 1 0 Round 1 22 13 0.171 ## 7 David MacKay 15 1 0 Round 1 22 13 0.429 ## 8 Josh Jenkins 12 3 3 Round 1 22 13 0.343 ## 9 Brodie Smith 10 1 2 Round 1 22 13 0.286 ## 10 Rory Atkins 8 3 0 Round 1 22 13 0.229 ## # ... with 540 more rows
We would need to check if in round 1 2017 Rory Laird was involved in 28.57% of Adelaides scores. We can see he had 10 SI, Adelaides players scored in total 22 goals and 13 behinds.
10/(22+13) ## [1] 0.2857143
Looking at the page though, we didn’t include the rushed behinds! Perhaps this is the missing data that will get our numbers to align.
One last check before we check if its rushed behinds. Lets make sure we are looking at the right amount of games.
fitzRoy::player_stats%>% filter(Season==2017 & Team=="Adelaide")%>% select(Player, SI, G, B, Round)%>% group_by(Round)%>% mutate(SIT=SI/((sum(G)+sum(B))))%>% group_by(Player)%>% tally() ## # A tibble: 31 x 2 ## Player n ## <chr> <int> ## 1 Alex Keath 6 ## 2 Andy Otten 19 ## 3 Brad Crouch 20 ## 4 Brodie Smith 23 ## 5 Charlie Cameron 24 ## 6 Curtly Hampton 9 ## 7 Daniel Talia 24 ## 8 David MacKay 22 ## 9 Eddie Betts 24 ## 10 Hugh Greenwood 15 ## # ... with 21 more rows
Here it gives Tex as having played 23 games. Lets move on to see if its the rushed behinds!
df<- fitzRoy::player_stats%>%filter(Season==2017) df2<-fitzRoy::match_results df2<-df2%>%filter(Season==2017) df3<-select(df2, Date, Round, Home.Team, Home.Goals, Home.Behinds) df4<-select(df2, Date, Round, Away.Team, Away.Goals, Away.Behinds) colnames(df3)[3]<-"Team" colnames(df3)[4]<-"Goals" colnames(df3)[5]<-"Behinds" colnames(df4)[3]<-"Team" colnames(df4)[4]<-"Goals" colnames(df4)[5]<-"Behinds" df5<-rbind(df4,df3) df6<-inner_join(df,df5, by=c("Team","Date")) df6%>%filter(Team=="Adelaide")%>% select(Player, SI, Goals, Behinds, Round.x)%>% group_by(Round.x)%>% mutate(SIT=SI/(((Goals)+(Behinds))))%>% group_by(Player)%>% summarise(averageSIT=mean(SIT))%>% arrange(desc(averageSIT)) ## # A tibble: 31 x 2 ## Player averageSIT ## <chr> <dbl> ## 1 Taylor Walker 0.291 ## 2 Tom Lynch 0.268 ## 3 Matt Crouch 0.261 ## 4 Eddie Betts 0.257 ## 5 Rory Sloane 0.243 ## 6 Mitch McGovern 0.234 ## 7 Josh Jenkins 0.227 ## 8 Richard Douglas 0.211 ## 9 Brad Crouch 0.202 ## 10 Sam Jacobs 0.199 ## # ... with 21 more rows
Ok still doesn’t seem as though we know what is up, post to be updated!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.