Data Viz The Glossy
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
One of the fun things about making graphs is the moment you spot something that sticks out. A great example of this comes from ethan_meldrum.
In it we can see how much Paul Seedsman's 2018 has stood out so far. Let's give it the ggplot/fitzRoy treatment.
Step One – Get the Data
library(fitzRoy)
library(tidyverse)
df<-fitzRoy::player_stats
df1<-fitzRoy::get_footywire_stats(9514:9593)
df<-df%>%filter(Season != 2018)
df2<-rbind(df, df1)
df2%>%select(Season,TO, MG, Player) %>%
  filter(Season==2018) %>%
  group_by(Player)%>%
  summarise(Turnovers=sum(TO), Meters_gained=sum(MG))
- df<-fitzRoy::player_stats gets the footywire data from 2010 up to some round of 2018. I say "some round" because this relies on a weekly update from James or myself, which sadly doesn't always happen.
- df1<-fitzRoy::get_footywire_stats(9514:9593) gets the footywire data for 2018. 9514 is the first game of 2018 and 9593 was the most recent game as of writing this post (23/05/2018).
- We want to stack these dataframes on top of each other, but we are aware that df might already contain some 2018 data, so we get rid of it using filter(Season != 2018).
- df2<-rbind(df, df1) stacks our datasets together and calls the result df2.
- From df2 we then select the columns we want: Season, TO (Turnovers), MG (Meters gained) and Player.
- Then we filter down to just this year's data with filter(Season==2018).
- Then we group_by Player. Note that if we wanted season-on-season comparisons we would group_by(Season, Player) and not use filter(Season==2018).
- We then summarise the data to get the season total of each player's turnovers and meters gained: summarise(Turnovers=sum(TO), Meters_gained=sum(MG)).
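To see why the stale 2018 rows are dropped before stacking, here is a minimal sketch on toy data. The data frames old and new_2018, the player names and all the numbers are invented stand-ins for fitzRoy::player_stats and the fresh download, so it runs without hitting footywire.

```r
library(dplyr)

# Invented stand-in for fitzRoy::player_stats: older seasons plus one
# stale 2018 row that is also present in the fresh download below
old <- data.frame(
  Season = c(2017, 2017, 2018),
  Player = c("Player A", "Player B", "Player A"),
  TO = c(3, 5, 4),
  MG = c(300, 450, 380)
)

# Invented stand-in for the fresh 2018 download
new_2018 <- data.frame(
  Season = c(2018, 2018, 2018, 2018),
  Player = c("Player A", "Player A", "Player B", "Player B"),
  TO = c(4, 6, 2, 3),
  MG = c(380, 520, 210, 340)
)

# Drop the stale 2018 rows first so no game is counted twice
old <- old %>% filter(Season != 2018)
stacked <- rbind(old, new_2018)

# One row per player with their 2018 totals
totals_2018 <- stacked %>%
  select(Season, TO, MG, Player) %>%
  filter(Season == 2018) %>%
  group_by(Player) %>%
  summarise(Turnovers = sum(TO), Meters_gained = sum(MG))

# Season-on-season variant: no filter, group by both columns
totals_by_season <- stacked %>%
  group_by(Season, Player) %>%
  summarise(Turnovers = sum(TO), Meters_gained = sum(MG), .groups = "drop")
```

Without the filter(Season != 2018) step, Player A's first 2018 game would appear in both data frames and inflate the totals. The .groups = "drop" argument assumes a reasonably recent dplyr (1.0 or later).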
Step 2
We are assuming here that we already know which players we want to highlight. Let's say we do, and the players are Rory Laird, Bryce Gibbs, Paul Seedsman and Tom Mitchell. We need to create a flag for them so we can pick them out for highlighting down the track.
We do this using mutate.
mutate(flag = ifelse(Player %in% c("Rory Laird","Bryce Gibbs","Paul Seedsman","Tom Mitchell"), T, F))
Let's break this down
- step a: create a variable called flag with flag=
- step b: ifelse(Player %in% c("P1", "P2")) checks if Player is in this list, c("P1", "P2")
- step c: if so give value T, otherwise give value F, which is the ,T,F)) part
df2%>%select(Season,TO, MG, Player) %>%
  filter(Season==2018) %>%
  group_by(Player)%>%
  summarise(Turnovers=sum(TO), Meters_gained=sum(MG)) %>%
  mutate(flag = ifelse(Player %in% c("Rory Laird", "Bryce Gibbs", "Paul Seedsman", "Tom Mitchell"), T, F))
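The flag step can be tried on its own with a small made-up table. In this sketch the totals data frame, the "Player X" and "Player Y" rows and every number are invented for illustration. Only the highlight list comes from the post.

```r
library(dplyr)

# Invented season totals, standing in for the summarised data
totals <- data.frame(
  Player = c("Paul Seedsman", "Player X", "Tom Mitchell", "Player Y"),
  Turnovers = c(60, 25, 40, 30),
  Meters_gained = c(5500, 2100, 3900, 2600)
)

# The players we want to highlight
highlight <- c("Rory Laird", "Bryce Gibbs", "Paul Seedsman", "Tom Mitchell")

flagged <- totals %>%
  mutate(
    # Same form as in the post: T if the player is in the list, F otherwise
    flag = ifelse(Player %in% highlight, T, F),
    # %in% already returns TRUE/FALSE, so this shorter form gives the same values
    flag_short = Player %in% highlight
  )
```

Both columns hold the same logical values, so the ifelse wrapper is optional here. It becomes useful when you want something other than TRUE/FALSE back, as with the labels in Step 3.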
Step 3 - Give it the ggplot treatment
ggplot(aes(x=Meters_gained, y=Turnovers)) +
  geom_point(aes(colour=flag), show.legend = FALSE) +
  geom_smooth(method='lm',formula=y~x) +
  geom_text(aes(label=ifelse(Player %in% c ("Rory Laird", "Bryce Gibbs", "Paul Seedsman", "Tom Mitchell"), Player, ""),vjust=-1))
- 1 ggplot(aes(x=Meters_gained, y=Turnovers)) creates a blank canvas with our x variable being Meters_gained and y being Turnovers
- 2 geom_point(aes(colour=flag), show.legend=FALSE) creates our dots. We take the x and y from ggplot and use aes(colour=flag) to give our points different colours depending on the value of flag
- 3 geom_smooth(method='lm', formula=y~x) gives us our line of best fit (linear regression, method='lm')
- 4 geom_text gives us our labels, but we don't want labels for all the players so we subset them. We do this using another ifelse
- 5 you can think of the ifelse as: if the Player is in this list, give them a label of Player, otherwise a blank ""
- 5a Player is the column of all players. We check whether each player is in the list c("Rory Laird", "Bryce Gibbs", "Paul Seedsman", "Tom Mitchell"), and if they are we give them their original label Player, otherwise a blank label ""
- 5b vjust moves the labels so they are not directly over the dots
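The four layers described above can be sketched on toy data without downloading anything. The toy data frame, the "Player W" to "Player Z" rows and all the numbers are invented, and the highlight list is shortened to the two named players that appear in the toy data.

```r
library(ggplot2)

# Invented season totals for six players
toy <- data.frame(
  Player = c("Paul Seedsman", "Player X", "Tom Mitchell",
             "Player Y", "Player Z", "Player W"),
  Meters_gained = c(5500, 2100, 3900, 2600, 3100, 1800),
  Turnovers = c(60, 25, 40, 30, 36, 20)
)
highlight <- c("Paul Seedsman", "Tom Mitchell")
toy$flag <- toy$Player %in% highlight

p <- ggplot(toy, aes(x = Meters_gained, y = Turnovers)) +
  # dots, coloured by whether the player is flagged
  geom_point(aes(colour = flag), show.legend = FALSE) +
  # line of best fit from a linear regression
  geom_smooth(method = "lm", formula = y ~ x) +
  # label only the highlighted players, blank for everyone else
  geom_text(aes(label = ifelse(Player %in% highlight, Player, "")), vjust = -1)

# print(p) draws the plot when run interactively
```

One small difference from the post: here vjust is passed outside aes(), since it is a constant nudge and not something mapped from the data. Either placement should draw the labels above the dots.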
Put it all together
library(fitzRoy)
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.4
## v tidyr   0.8.0     v stringr 1.3.0
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
df<-fitzRoy::player_stats
df1<-fitzRoy::get_footywire_stats(9514:9593)
## Getting data from footywire.com
## Finished getting data
df<-df%>%filter(Season != 2018)
df2<-rbind(df, df1)
df2%>%select(Season,TO, MG, Player) %>%
  filter(Season==2018) %>%
  group_by(Player)%>%
  summarise(Turnovers=sum(TO), Meters_gained=sum(MG)) %>%
  mutate(flag = ifelse(Player %in% c("Rory Laird", "Bryce Gibbs", "Paul Seedsman", "Tom Mitchell"), T, F))%>%
  ggplot(aes(x=Meters_gained, y=Turnovers)) +
  geom_point(aes(colour=flag), show.legend = FALSE) +
  geom_smooth(method='lm',formula=y~x) +
  geom_text(aes(label=ifelse(Player %in% c ("Rory Laird", "Bryce Gibbs", "Paul Seedsman", "Tom Mitchell"), Player, ""),vjust=-1))