Timelines can be quite a handy way of getting an overview of a player’s career in terms of when they played, with which team and who their contemporaries were.
As is often the case, I turned to Stack Overflow to set me on my way for an R solution. In this instance, I did not take the accepted answer but rather the ggplot variation.
I used the RODBC package to extract records of all EPL appearances from my database into a dataframe, ‘allGames’:
```r
head(allGames)
#   FIRSTNAME LASTNAME PLAYERID POSITION TEAMID PLAYER_TEAM   TEAMNAME       DATE START ON
# 1     Steve    Jones  JONESS1        F    WHU        2054 West Ham U 1993-11-01     0  0
```
The data is pretty self-explanatory. POSITION shows that Steve Jones is a forward, and the START and ON values show that for the game in question he neither started nor was used as a substitute. As I am basically trying to show when players were in the team squad, I will still include these data in the analysis. To obtain a player’s career length at a particular club, I need to find the earliest and latest dates. It is probably overkill, but I am used to using the plyr package:
```r
library(plyr)
allGames.summary <- ddply(allGames, .(PLAYERID, TEAMID),
                          function(x) c(start = min(x$DATE), end = max(x$DATE)))

# Here is Steve Jones's line at West Ham
subset(allGames.summary, TEAMID == "WHU" & PLAYERID == "JONESS1")
#      PLAYERID TEAMID      start        end
# 2574  JONESS1    WHU 1993-08-14 1997-02-01
```
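Since the database behind allGames is not public, the first/last date step can be tried on a toy stand-in. The sketch below is not the code used in the post: it uses base R only (split and lapply rather than plyr), and the `toyGames` data frame, its player and team ids, and the UTC time zone are all made up for illustration.

```r
# Toy stand-in for allGames; every value here is invented for illustration.
toyGames <- data.frame(
  PLAYERID = c("AAA1", "AAA1", "AAA1", "BBB1", "BBB1"),
  TEAMID   = c("XXX", "XXX", "YYY", "XXX", "XXX"),
  DATE     = as.POSIXct(c("1992-08-15", "1993-05-01", "1994-09-10",
                          "1992-08-15", "1992-12-26"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Split into one piece per player/team combination that actually occurs.
groups <- split(toyGames, list(toyGames$PLAYERID, toyGames$TEAMID), drop = TRUE)

# For each piece keep the ids plus the earliest and latest squad dates.
toySummary <- do.call(rbind, lapply(groups, function(g) {
  data.frame(PLAYERID = g$PLAYERID[1],
             TEAMID   = g$TEAMID[1],
             start    = min(g$DATE),
             end      = max(g$DATE),
             stringsAsFactors = FALSE)
}))
rownames(toySummary) <- NULL
print(toySummary)
```

Building each piece as a one-row data frame keeps start and end as POSIXct columns, which matters for the date comparisons and the datetime axis used later.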
OK. Now we can get to some graphing. Let’s go way back to the beginning of the Premier League and look at the squad of the champions that season, Manchester United, team id ‘MNU’.
```r
library(ggplot2)
q <- ggplot(subset(allGames.summary,
                   TEAMID == "MNU" & start == as.POSIXct(min(allGames.summary$start)))) +
  geom_segment(aes(x = start, xend = end, y = PLAYERID, yend = PLAYERID), size = 3)
print(q)
```
Note the use of the min function again to get the first date, and ggplot’s geom_segment function, which is perfect for producing the required lines. There are two gotchas to watch out for. The dates are of POSIXct datatype, and an error arises unless the value they are compared with is coerced to that type. Also, the ‘+’ must end the first line: if it is placed at the start of the second line, the layer does not get added and no plot appears.
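Both gotchas can be reproduced without ggplot2 or the database. The following is a small base R sketch with made-up dates and an explicit UTC time zone (an assumption, chosen only to make the result reproducible). The second half uses parse() to show how R splits the two layouts of a ‘+’ into expressions.

```r
# Gotcha 1: coerce the character date to POSIXct so like is compared with like.
starts <- as.POSIXct(c("1992-08-15", "1993-08-14"), tz = "UTC")
first_day <- starts == as.POSIXct("1992-08-15", tz = "UTC")
print(first_day)

# Gotcha 2: "x <- 1" is already a complete expression, so a '+' at the start
# of the next line is parsed as a separate statement (a unary plus).
exprs_bad <- parse(text = "x <- 1\n+ 2")
n_bad <- length(exprs_bad)   # number of separate expressions found

# With the '+' at the end of the line, the parser knows more is to come.
exprs_ok <- parse(text = "x <- 1 +\n2")
n_ok <- length(exprs_ok)

print(c(bad = n_bad, ok = n_ok))
```

The same parsing rule is why a ggplot call followed by a line starting with ‘+ geom_segment(...)’ never receives the layer.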
As can be seen, the data looks reasonable. All the lines start at one point and show different end points. To those in the know, Giggs’s line correctly extends to the current day; he is the only player appearing 20 years ago still to pull on a shirt.
However, it is not that aesthetically pleasing. Possible improvements include:
- Change axes labels and add a title
- Make players’ names more apparent
- Show other EPL teams appeared for, if any
- Give some indication of relative appearances
- Utilize the full width of the graph
- and finally, wrap it in a function
Some of these amendments need more analysis; others are just additions to the ggplot code:
```r
library(stringr)  # str_sub comes from the stringr package, not plyr

# we need players' names from the original dataframe
allGames$player <- paste(allGames$LASTNAME, str_sub(allGames$FIRSTNAME, end = 1), sep = " ")

# the allGames.summary needs to be reworked
allGames.summary <- ddply(allGames, .(PLAYERID, PLAYER_TEAM, TEAMID, player),
                          function(x) c(start = min(x$DATE), end = max(x$DATE), apps = length(x$player)))

# create a function which takes the team id and game date as parameters
tlPlot <- function(theTeam, theDate) {
  # to cover all clubs a player appeared for we need to obtain a list of their ids
  squad <- subset(allGames.summary, TEAMID == theTeam & start == as.POSIXct(theDate))$PLAYERID

  # order the data by the number of appearances whilst with the team (and reversed for graph)
  playerOrder <- arrange(subset(allGames.summary, TEAMID == theTeam & PLAYERID %in% squad),
                         desc(apps))$player
  playerOrder <- rev(playerOrder)

  # create the title (full team name and date would be shown with more space)
  theTitle <- paste("Careers for players appearing for", theTeam, "on", theDate, sep = " ")

  # Now create the graph object
  # subset to selected players but for all their teams, indicated by colour
  q <- ggplot(subset(allGames.summary, PLAYERID %in% squad), aes(colour = TEAMID)) +
    # show player surname and initial
    geom_segment(aes(x = start, xend = end, y = player, yend = player), size = 3) +
    # order players in terms of apps for team
    scale_y_discrete(limits = playerOrder) +
    # get rid of axis labels and add the title
    xlab("") + ylab("") + ggtitle(theTitle) +
    # extend lines to full width
    scale_x_datetime(expand = c(0, 0))

  return(q)
}

# make selection. In a production version test for valid teams and
# dates would be performed
tlPlot("MNU", "1992-08-15")
```
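The closing comment mentions that a production version would test for valid teams and dates. One way that check might look is sketched below. `checkTlArgs` is a hypothetical helper that does not appear in the original code, the toy summary data is invented, and the UTC time zone is an assumption made so that the example is reproducible.

```r
# Hypothetical helper (not from the original post): stop early on bad input.
checkTlArgs <- function(theTeam, theDate, summaryDf) {
  if (!theTeam %in% summaryDf$TEAMID) {
    stop("Unknown team id: ", theTeam)
  }
  # as.POSIXct() throws on strings it cannot read, so trap that and return NA.
  parsed <- tryCatch(as.POSIXct(theDate, tz = "UTC"), error = function(e) NA)
  if (is.na(parsed)) {
    stop("Not a valid date: ", theDate)
  }
  # Mirror the squad subset in tlPlot: some player must start at this team on this date.
  if (!any(summaryDf$TEAMID == theTeam & summaryDf$start == parsed)) {
    stop("No player of ", theTeam, " has a start date of ", theDate)
  }
  invisible(TRUE)
}

# Toy summary with made-up team ids and dates.
toySummary <- data.frame(
  TEAMID = c("XXX", "XXX", "YYY"),
  start  = as.POSIXct(c("1992-08-15", "1992-08-19", "1992-08-15"), tz = "UTC"),
  stringsAsFactors = FALSE
)

ok  <- checkTlArgs("XXX", "1992-08-15", toySummary)
bad <- tryCatch(checkTlArgs("ZZZ", "1992-08-15", toySummary),
                error = function(e) conditionMessage(e))
print(bad)
```

Calling such a check as the first line of tlPlot would turn an empty or confusing plot into an explicit error message.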
Not perfect – but certainly more informative and now replicable. The analysis can easily be extended. For instance, one could select the players with top ten appearances for a club or show all those who were on squads whilst a particular player was there. The position factor could be identified by colour whilst using an alpha scale for apps.
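As an illustration of the first extension, picking the players with the most appearances for a club could be sketched as follows. This is base R on invented data: `topPlayers` and the toy values are hypothetical, and the column names simply follow those of allGames.summary.

```r
# Toy summary; players, teams and appearance counts are made up.
toySummary <- data.frame(
  player = c("Alpha A", "Bravo B", "Charlie C", "Delta D", "Alpha A"),
  TEAMID = c("XXX", "XXX", "XXX", "XXX", "YYY"),
  apps   = c(120, 45, 300, 45, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the n players with most appearances for a team.
topPlayers <- function(summaryDf, theTeam, n = 10) {
  team <- summaryDf[summaryDf$TEAMID == theTeam, ]
  team <- team[order(-team$apps), ]   # most appearances first
  head(team$player, n)
}

top2 <- topPlayers(toySummary, "XXX", n = 2)
print(top2)
```

The resulting names could then replace the squad selection inside tlPlot, so that the timeline shows a club’s most-used players rather than one match-day squad.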
But that’s all for now.