Player timelines with ggplot

[This article was first published on PremierSoccerStats » R, and kindly contributed to R-bloggers.]

Timelines can be quite a handy way of getting an overview of a player’s career in terms of when they played, with which team, and who their contemporaries were.
As is often the case, I turned to Stack Overflow to set me on my way to an R solution. In this instance, I did not take the accepted answer but rather the ggplot variation.
I used the RODBC package to extract records of all EPL appearances from my database into a dataframe, ‘allGames’.

head(allGames)
#   FIRSTNAME LASTNAME PLAYERID POSITION TEAMID PLAYER_TEAM   TEAMNAME       DATE START ON
# 1     Steve    Jones  JONESS1        F    WHU        2054 West Ham U 1993-11-01     0  0

The data is pretty self-explanatory. POSITION shows that Steve Jones is a forward, and the zeros in START and ON show that for the game in question he neither started nor came on as a substitute. As I am basically trying to show when players were in the team squad, I will still include these records in the analysis. To obtain a player’s career length at a particular club, I need to find the earliest and latest dates. It is probably overkill, but I am used to using the plyr package.

library(plyr)
allGames.summary <- ddply(allGames, .(PLAYERID, TEAMID), function(x) c(start = min(x$DATE), end = max(x$DATE)))
# Here is Steve Jones's line at West Ham
subset(allGames.summary, TEAMID == "WHU" & PLAYERID == "JONESS1")
#      PLAYERID TEAMID      start        end
# 2574  JONESS1    WHU 1993-08-14 1997-02-01
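
For readers who would rather not depend on plyr, the same per-player, per-club start and end dates can be sketched in base R. This is only an illustration: the toy data frame and every value in it are invented stand-ins for allGames, and `games` and `spans` are hypothetical names that do not appear in the original analysis.

```r
# Toy stand-in for allGames: all ids and dates below are invented
games <- data.frame(
  PLAYERID = c("P1", "P1", "P1", "P1", "P2", "P2"),
  TEAMID   = c("AAA", "AAA", "AAA", "BBB", "AAA", "AAA"),
  DATE     = as.POSIXct(c("1992-08-15", "1993-05-09", "1992-12-26",
                          "1994-01-01", "1992-08-15", "1992-09-02"),
                        tz = "UTC"),
  stringsAsFactors = FALSE
)

# One piece per player/club combination that actually occurs
pieces <- split(games, list(games$PLAYERID, games$TEAMID), drop = TRUE)

# Earliest and latest date in each piece; building small data frames
# and row-binding them keeps start and end as POSIXct
spans <- do.call(rbind, lapply(pieces, function(x) {
  data.frame(PLAYERID = x$PLAYERID[1],
             TEAMID   = x$TEAMID[1],
             start    = min(x$DATE),
             end      = max(x$DATE),
             stringsAsFactors = FALSE)
}))
rownames(spans) <- NULL
print(spans)
```

Each row of `spans` plays the role of one line in allGames.summary: one player at one club, with the first and last date on which they were in the squad.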

OK. Now we can get to some graphing. Let’s go way back to the beginning of the Premier League and look at the squad of that season’s champions, Manchester United, id ‘MNU’.

library(ggplot2)
q <- ggplot(subset(allGames.summary,TEAMID=="MNU"&start==as.POSIXct(min(allGames.summary$start)))) +
  geom_segment(aes(x=start, xend=end, y=PLAYERID, yend=PLAYERID), size=3)
print(q)

Note the use of the min function again to get the first date, and of ggplot’s geom_segment function, which is perfect for producing the required lines. There are two gotchas to watch out for. The dates are of POSIXct datatype, and an error arises unless the value they are compared with is coerced to that type. Also, if the ‘+’ is placed at the start of the second line rather than at the end of the first, the layer does not get added and no plot appears.
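
The second gotcha comes from how R reads code line by line, and it can be seen without ggplot at all. The sketch below is a plain-arithmetic analogy rather than the plotting code itself: it parses the same statement with the ‘+’ in each position and counts how many separate expressions R finds. The variable names are made up for the illustration.

```r
# '+' at the start of the second line: the first line is already complete,
# so R reads two separate expressions
leading <- parse(text = "q <- 1\n  + 2")

# '+' at the end of the first line: R keeps reading, giving one expression
trailing <- parse(text = "q <- 1 +\n  2")

nLeading  <- length(leading)
nTrailing <- length(trailing)

# Evaluate each variant in its own environment and compare the value of q
envLeading <- new.env()
for (i in seq_along(leading)) eval(leading[[i]], envLeading)

envTrailing <- new.env()
for (i in seq_along(trailing)) eval(trailing[[i]], envTrailing)

qLeading  <- get("q", envLeading)   # the '+ 2' never reached q
qTrailing <- get("q", envTrailing)  # the '+ 2' was part of the assignment
print(c(nLeading, nTrailing, qLeading, qTrailing))
```

In the plotting code the same thing happens to the geom_segment layer: with a leading ‘+’ it is no longer part of the assignment to q, so it never becomes part of the plot.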

So what have we got?

As can be seen, the data looks reasonable. All the lines start at one point and show different end points. To those in the know, Giggs’s line correctly extends to the current day; he is the only player appearing 20 years ago still to pull on a shirt.
However, it is not that aesthetically pleasing, and there are several aspects that could be improved.

Some of these amendments need more analysis; others just involve adding to the ggplot code.

# we need the player's name from the original dataframe
library(stringr) # str_sub comes from the stringr package, not plyr, so stringr must be loaded
allGames$player <- paste(allGames$LASTNAME, str_sub(allGames$FIRSTNAME, end = 1), sep = " ")
 
# the allGames.summary needs to be reworked
# summarise keeps start and end as dates and apps as a plain count, rather than mixing them in one vector
allGames.summary <- ddply(allGames, .(PLAYERID, PLAYER_TEAM, TEAMID, player), summarise, start = min(DATE), end = max(DATE), apps = length(player))
 
# create a function which takes the team id and game date as parameters
tlPlot <- function(theTeam, theDate) {

  # to cover all clubs a player appeared for we need to obtain a list of their ids
  squad <- subset(allGames.summary, TEAMID == theTeam & start == as.POSIXct(theDate))$PLAYERID

  # order the data by the number of appearances whilst with the team (and reversed for graph)
  playerOrder <- arrange(subset(allGames.summary, TEAMID == theTeam & PLAYERID %in% squad), desc(apps))$player
  playerOrder <- rev(playerOrder)

  # create the title (full team name and date would be shown with more space)
  theTitle <- paste("Careers for players appearing for", theTeam, "on", theDate, sep = " ")

  # now create the graph object
  # subset to selected players but for all their teams, indicated by colour
  q <- ggplot(subset(allGames.summary, PLAYERID %in% squad), aes(colour = TEAMID)) +
    # show player surname and initial
    geom_segment(aes(x = start, xend = end, y = player, yend = player), size = 3) +
    # order players in terms of apps for team
    scale_y_discrete(limits = playerOrder) +
    # get rid of axis labels and add the title
    xlab("") + ylab("") + ggtitle(theTitle) +
    # extend lines to full width
    scale_x_datetime(expand = c(0, 0))
  return(q)

}

# make selection. In a production version, tests for valid teams and
# dates would be performed
tlPlot("MNU", "1992-08-15")

Voila!

Not perfect – but certainly more informative and now replicable. The analysis can easily be extended. For instance, one could select the players with top ten appearances for a club or show all those who were on squads whilst a particular player was there. The position factor could be identified by colour whilst using an alpha scale for apps.
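
As a rough idea of the first extension, picking out the players with the most appearances for a club only needs a filter, a sort and a cut-off. The sketch below assumes a summary with player, TEAMID and apps columns like allGames.summary; the toy data is invented and `topPlayers` is a hypothetical helper, not part of the original code.

```r
# Toy stand-in for allGames.summary: every value is invented for illustration
toy.summary <- data.frame(
  player = c("Alpha A", "Bravo B", "Charlie C", "Delta D", "Alpha A"),
  TEAMID = c("AAA", "AAA", "AAA", "AAA", "BBB"),
  apps   = c(120, 45, 300, 45, 10),
  stringsAsFactors = FALSE
)

# hypothetical helper: names of the n players with most appearances for one team
topPlayers <- function(summaryData, theTeam, n = 10) {
  # keep only the rows for the chosen team
  teamRows <- summaryData[summaryData$TEAMID == theTeam, ]
  # most appearances first
  teamRows <- teamRows[order(teamRows$apps, decreasing = TRUE), ]
  # first n names (fewer if the team has fewer players)
  head(teamRows, n)$player
}

topTwo <- topPlayers(toy.summary, "AAA", n = 2)
print(topTwo)
```

A selection like this could stand in for the squad filter inside tlPlot, so that the timeline shows a club's most-used players rather than those who appeared on one particular date.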
But that’s all for now.
