Make your own PAV
The idea behind "recreate to create" is that, to make your own rating system, be it for players or for teams, a good first step is to recreate what you see and then add your own opinion to create your own system.
The guys over at HPN have their own player rating system called PAV which stands for Player Approximate Value.
You can explore their PAV ratings for both the men's and women's competitions.
Why would you want to create your own system?
You might have a different opinion on how the formula is derived. As HPN put it:
"The weightings and multipliers used in each component formula will necessarily look a bit arbitrary, but are the results of adjustment and tweaking until the results lined up with other methods of ranking and evaluating players as described above."
That is not to say how it was done is wrong, but maybe you have another method of ranking and evaluating players that you would like your system to align with.
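To make the point about weightings concrete, here is a minimal sketch that keeps the weights in a named vector so they are easy to swap. The weights are the midfield ones used in the Step Two code later in this post; the stat lines, the `Free.Diff` column and the `weighted_score()` helper are made up for illustration and are not part of fitzRoy or of HPN's code.

```r
# Midfield weights copied from the Step Two code in this post; edit to taste.
mid_weights <- c(Inside.50s = 15, Clearances = 20, Tackles = 3,
                 Hit.Outs = 1.5, Free.Diff = 1)

# Hypothetical single-game stat lines for two made-up players.
stats <- data.frame(
  Inside.50s = c(5, 2),
  Clearances = c(6, 1),
  Tackles    = c(4, 3),
  Hit.Outs   = c(0, 20),
  Free.Diff  = c(1, -1)   # frees for minus frees against
)

# Weighted sum per row: the stats matrix times the weight vector.
# Columns are selected by weight name, so column order does not matter.
weighted_score <- function(df, weights) {
  drop(as.matrix(df[, names(weights), drop = FALSE]) %*% weights)
}

mid_score <- weighted_score(stats, mid_weights)

# Adding your own opinion is then a one-line change to the weights.
my_weights <- replace(mid_weights, "Tackles", 5)
my_score   <- weighted_score(stats, my_weights)
```

Keeping the weights separate from the data means you can compare several candidate weightings against whatever ranking you want your system to line up with.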
You might just want to use different variables. HPN again:
"As the collection of several of these measures only commenced in 1998, we have also adapted another formula for the pre-1998 seasons which correlates extremely strongly with the newer formula. Whilst we feel it is less accurate than the newer formula, it still largely conforms to the findings of the newer formula. This formula was created by trying to minimise the standard deviation for each player’s PAV across the last five seasons of AFL football. Around 5% of players have a difference in value of more than one PAV between the new and old formulas."
Let's say you are working in clubland: you might like the ideas used but have your own internal metrics that you would like to use instead. Hopefully, as a fan of the game, you are noticing that more statistics are being made available and accessible through fitzRoy. For example, fitzRoy gives users access to both afltables and footywire, and footywire contains some extra variables that you might want to include in your rating system, such as intercepts and tackles inside 50, to name a couple.
OK, so how do we go about recreating it?
Thankfully, the guys over at HPN have written about the formula they used.
Step One
The first thing we do is get our datasets. Through fitzRoy we have access to data from both afltables and footywire, and one of the reasons you might be doing this is that you want to use the extra data in one of them for your ratings.
It's not only the data available through fitzRoy that you can use. At the time of writing, there are a few extra variables you might want to integrate, such as player position and maybe age, that haven’t been integrated into fitzRoy, but hopefully they will be soonish.
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.3.1     ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

afltables <- fitzRoy::get_afltables_stats(start_date = "1990-01-01", end_date = "2018-10-10")
## Returning data from 1990-01-01 to 2018-10-10
## Downloading data
##
## Finished downloading data. Processing XMLs
## Finished getting afltables data

footywire <- fitzRoy::player_stats
Something to note about the two datasets is that to join them we need some sort of joining key. The easiest keys are usually built from team name, season, player or something similar. Unfortunately, the teams aren’t named the same way across the datasets. For example, in the footywire dataset the Greater Western Sydney Giants are called GWS, while in the afltables dataset they are called Greater Western Sydney.
So let's make sure the team names align between datasets so we can join them later.
##### step 1: get team names matching so we can join scores onto player data
afltables <- mutate_if(tibble::as_tibble(afltables), is.character, str_replace_all,
                       pattern = "Greater Western Sydney", replacement = "GWS")
afltables <- mutate_if(tibble::as_tibble(afltables), is.character, str_replace_all,
                       pattern = "Brisbane Lions", replacement = "Brisbane")
# names(afltables)
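Once the names line up, the join itself might look something like the sketch below. It uses base R and tiny made-up tables rather than the real downloads; the footywire-side column names (`Team`, `Player`, `Intercepts`) and the `harmonise_team()` helper are assumptions for illustration, so check them against the actual data. Unlike the `str_replace_all` approach above, it only rewrites exact matches in a single column.

```r
# Toy stand-ins for the two sources; the real tables have many more columns.
aflt <- data.frame(
  Season      = c(2016, 2016, 2016),
  Playing.for = c("Greater Western Sydney", "Brisbane Lions", "Carlton"),
  Player      = c("A Player", "B Player", "C Player"),
  Tackles     = c(4, 2, 6),
  stringsAsFactors = FALSE
)
fw <- data.frame(
  Season     = c(2016, 2016, 2016),
  Team       = c("GWS", "Brisbane", "Carlton"),
  Player     = c("A Player", "B Player", "C Player"),
  Intercepts = c(3, 1, 5),   # hypothetical column name and values
  stringsAsFactors = FALSE
)

# Lookup from afltables-style names to footywire-style names.
team_lookup <- c("Greater Western Sydney" = "GWS",
                 "Brisbane Lions"         = "Brisbane")

# Replace only exact matches; names not in the lookup pass through unchanged.
harmonise_team <- function(x, lookup) {
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

aflt$Team <- harmonise_team(aflt$Playing.for, team_lookup)

# Left join on the shared key, keeping every afltables row.
joined <- merge(aflt, fw, by = c("Season", "Team", "Player"), all.x = TRUE)

# Rows that failed to match point at names that still need a lookup entry.
unmatched <- joined[is.na(joined$Intercepts), ]
```

Inspecting `unmatched` after the join is a cheap way to catch team or player names that still differ between the two sources.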
Because we are recreating the blog post, we should focus on values that we already know, so we can check whether we have things right. So let's filter our data.
afltables <- filter(afltables, Season > 2010)
afltables <- filter(afltables, Season < 2017)
Step Two: Recreate PAV as per the blog post
# Split by side so each player row can be paired with the matching score column.
afltables_home <- filter(afltables, Playing.for == Home.team)
afltables_away <- filter(afltables, Playing.for == Away.team)

# Note: Home.score and Away.score are team totals, so every player on a side
# starts from the same score term here. If a per-player score is wanted,
# 6 * Goals + Behinds is one candidate (not checked against the blog post).

# Home players: offence, defence and midfield components
afltables_home$pavO <- afltables_home$Home.score +
  0.25 * afltables_home$Hit.Outs +
  3 * afltables_home$Goal.Assists +
  afltables_home$Inside.50s +
  afltables_home$Marks.Inside.50 +
  (afltables_home$Frees.For - afltables_home$Frees.Against)
afltables_home$pavD <- 20 * afltables_home$Rebounds +
  12 * afltables_home$One.Percenters +
  (afltables_home$Marks - 4 * afltables_home$Marks.Inside.50 +
     2 * (afltables_home$Frees.For - afltables_home$Frees.Against)) -
  2 / 3 * afltables_home$Hit.Outs
afltables_home$pavM <- 15 * afltables_home$Inside.50s +
  20 * afltables_home$Clearances +
  3 * afltables_home$Tackles +
  1.5 * afltables_home$Hit.Outs +
  (afltables_home$Frees.For - afltables_home$Frees.Against)

# Away players: same formulas, using Away.score
afltables_away$pavO <- afltables_away$Away.score +
  0.25 * afltables_away$Hit.Outs +
  3 * afltables_away$Goal.Assists +
  afltables_away$Inside.50s +
  afltables_away$Marks.Inside.50 +
  (afltables_away$Frees.For - afltables_away$Frees.Against)
afltables_away$pavD <- 20 * afltables_away$Rebounds +
  12 * afltables_away$One.Percenters +
  (afltables_away$Marks - 4 * afltables_away$Marks.Inside.50 +
     2 * (afltables_away$Frees.For - afltables_away$Frees.Against)) -
  2 / 3 * afltables_away$Hit.Outs
afltables_away$pavM <- 15 * afltables_away$Inside.50s +
  20 * afltables_away$Clearances +
  3 * afltables_away$Tackles +
  1.5 * afltables_away$Hit.Outs +
  (afltables_away$Frees.For - afltables_away$Frees.Against)

# Stack the two halves back together
fulltable <- rbind(afltables_home, afltables_away)
names(fulltable)
##  [1] "Season"                  "Round"
##  [3] "Date"                    "Local.start.time"
##  [5] "Venue"                   "Attendance"
##  [7] "Home.team"               "HQ1G"
##  [9] "HQ1B"                    "HQ2G"
## [11] "HQ2B"                    "HQ3G"
## [13] "HQ3B"                    "HQ4G"
## [15] "HQ4B"                    "Home.score"
## [17] "Away.team"               "AQ1G"
## [19] "AQ1B"                    "AQ2G"
## [21] "AQ2B"                    "AQ3G"
## [23] "AQ3B"                    "AQ4G"
## [25] "AQ4B"                    "Away.score"
## [27] "First.name"              "Surname"
## [29] "ID"                      "Jumper.No."
## [31] "Playing.for"             "Kicks"
## [33] "Marks"                   "Handballs"
## [35] "Goals"                   "Behinds"
## [37] "Hit.Outs"                "Tackles"
## [39] "Rebounds"                "Inside.50s"
## [41] "Clearances"              "Clangers"
## [43] "Frees.For"               "Frees.Against"
## [45] "Brownlow.Votes"          "Contested.Possessions"
## [47] "Uncontested.Possessions" "Contested.Marks"
## [49] "Marks.Inside.50"         "One.Percenters"
## [51] "Bounces"                 "Goal.Assists"
## [53] "Time.on.Ground.."        "Substitute"
## [55] "Umpire.1"                "Umpire.2"
## [57] "Umpire.3"                "Umpire.4"
## [59] "group_id"                "pavO"
## [61] "pavD"                    "pavM"

fulltable2016 <- filter(fulltable, Season == 2016)
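The home and away blocks above repeat the same three formulas, so one way to tidy this up is to wrap them in a function and pass the score term in as an argument. The sketch below does that on a single made-up stat line; `pav_components()` is a hypothetical helper, not something provided by fitzRoy or HPN, and the numbers are invented.

```r
# Hypothetical helper: computes the three component scores for every row.
# `score` is passed in explicitly so the choice of score term stays visible.
pav_components <- function(df, score) {
  free_diff <- df$Frees.For - df$Frees.Against
  df$pavO <- score + 0.25 * df$Hit.Outs + 3 * df$Goal.Assists +
    df$Inside.50s + df$Marks.Inside.50 + free_diff
  df$pavD <- 20 * df$Rebounds + 12 * df$One.Percenters +
    (df$Marks - 4 * df$Marks.Inside.50 + 2 * free_diff) -
    2 / 3 * df$Hit.Outs
  df$pavM <- 15 * df$Inside.50s + 20 * df$Clearances + 3 * df$Tackles +
    1.5 * df$Hit.Outs + free_diff
  df
}

# One made-up stat line using the afltables column names.
toy <- data.frame(
  Playing.for = "Carlton", Home.team = "Carlton", Away.team = "Richmond",
  Home.score = 90, Away.score = 80,
  Goals = 2, Behinds = 1, Hit.Outs = 12, Goal.Assists = 1,
  Inside.50s = 4, Marks.Inside.50 = 2, Frees.For = 3, Frees.Against = 1,
  Rebounds = 2, One.Percenters = 3, Marks = 6, Clearances = 5, Tackles = 4,
  stringsAsFactors = FALSE
)

# Team score for the side the player represented, picked row by row,
# which removes the need to split the table into home and away halves.
team_score <- ifelse(toy$Playing.for == toy$Home.team,
                     toy$Home.score, toy$Away.score)

toy <- pav_components(toy, score = team_score)
```

Passing `6 * toy$Goals + toy$Behinds` as `score` instead would give a per-player score term. Whether that is closer to the published formula is something to check against the HPN write-up; only the midfield component is verified in this post, and it does not use the score at all.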
Step Three: Check a player's values
Now that we have the PAV components for 2016, let's check one player's PAV to see if we have done it right (note that you should probably check multiple players, but it's late).
The player I am going to check is Bryce Gibbs, and I am going to see whether his midfield PAV matches the blog post.
### check we get the same value for Bryce Gibbs
### matches blog post http://www.hpnfooty.com/?p=21810
fulltable2016 %>%
  group_by(First.name, Surname) %>%
  summarise(total_mid_pav = sum(pavM)) %>%
  filter(Surname == "Gibbs", First.name == "Bryce")
## # A tibble: 1 x 3
## # Groups:   First.name [1]
##   First.name Surname total_mid_pav
##   <chr>      <chr>           <dbl>
## 1 Bryce      Gibbs            3984

fulltable2016 %>%
  group_by(Playing.for) %>%
  summarise(team_mid_pav = sum(pavM))
## # A tibble: 18 x 2
##    Playing.for      team_mid_pav
##    <chr>                   <dbl>
##  1 Adelaide               45679
##  2 Brisbane               36986.
##  3 Carlton                37702.
##  4 Collingwood            38445
##  5 Essendon               35483
##  6 Fremantle              37135
##  7 Geelong                45027
##  8 Gold Coast             36114.
##  9 GWS                    46991
## 10 Hawthorn               43548
## 11 Melbourne              40918.
## 12 North Melbourne        40982.
## 13 Port Adelaide          41328.
## 14 Richmond               35464.
## 15 St Kilda               38218.
## 16 Sydney                 49909
## 17 West Coast             41712.
## 18 Western Bulldogs       47766

100 * (3984 / 37702)
## [1] 10.56708
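The last line above divides one player's season total by his team's season total and multiplies by 100. A minimal sketch of doing that for every player at once, using base R's `ave()` on made-up totals (the player labels and numbers are invented; only the arithmetic mirrors the check above):

```r
# Made-up season totals of a midfield score for five players on two teams.
season <- data.frame(
  Playing.for = c("Carlton", "Carlton", "Carlton", "Richmond", "Richmond"),
  Player      = c("P1", "P2", "P3", "P4", "P5"),
  pavM        = c(300, 500, 200, 400, 600),
  stringsAsFactors = FALSE
)

# ave() returns the group total repeated on every row of that group.
season$team_pavM <- ave(season$pavM, season$Playing.for, FUN = sum)

# Each player's share of his team's total, as a percentage.
season$mid_share <- 100 * season$pavM / season$team_pavM
```

By construction the shares within each team add up to 100, which is a handy sanity check before comparing against any published numbers.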
Huzzah, it matches! Can someone check the rest, tell me where I went wrong and flick me an email, please?
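For anyone taking up that offer, checking several players at once could be sketched roughly as follows. Both tables here are invented: `computed` stands in for your own output and `published` for values you would transcribe by hand from the HPN post, and the tolerance is an arbitrary choice.

```r
# Values computed by your own code (made-up numbers for illustration).
computed <- data.frame(
  Player    = c("P1", "P2", "P3"),
  mid_share = c(10.57, 8.20, 4.95),
  stringsAsFactors = FALSE
)

# Values transcribed by hand from the published table (also made up here).
published <- data.frame(
  Player          = c("P1", "P2", "P3"),
  published_share = c(10.57, 8.21, 5.60),
  stringsAsFactors = FALSE
)

# Line the two sources up by player and compare within a tolerance.
check <- merge(computed, published, by = "Player")
check$diff <- check$mid_share - check$published_share
tolerance  <- 0.05   # arbitrary threshold; pick one that suits the rounding
check$ok   <- abs(check$diff) <= tolerance

# Players whose values disagree by more than the tolerance.
mismatches <- check[!check$ok, ]
```

Any rows left in `mismatches` are the ones worth investigating first.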
As always this is a work in progress so this post will probably get an update.