At Least Tim Thomas Won…
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As you can tell from the content on this blog, I am a really big fan of statistical analysis and the NHL. I haven't blogged in some time simply because I have been engrossed in the 2011 playoffs, a year in which I am lucky enough to be a diehard Bruins fan.
With that in mind, I am not that happy that Zdeno did not get the Norris Trophy. All of the nominees were deserving, but I was clearly hoping he would win. Since it is easy enough to hack some data together in R, the code below is a very (and I mean very) superficial look at the statistics of the upper 50% of 2011 NHL defensemen, as defined by games played. Many people don't like plus-minus as a stat, but coupled with total time on ice (TOI), it's not a bad place to begin.
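The "upper 50% as defined by games played" cut is a median filter on games played (GP). A minimal sketch of how that rule behaves, on made-up numbers (the players and values are hypothetical, not the 2011 data): with an even number of players, `median()` averages the two middle values, and because the filter uses `>=`, ties at the median can keep slightly more than half of the players.

```r
# Hypothetical toy data: games played (GP) for eight made-up defensemen.
toy <- data.frame(
  Player = c("A", "B", "C", "D", "E", "F", "G", "H"),
  GP     = c(82, 80, 79, 75, 60, 45, 30, 12),
  stringsAsFactors = FALSE
)

# median() of an even-length vector averages the two middle values:
# here (60 + 75) / 2 = 67.5
cutoff <- median(toy$GP)

# Keep players at or above the median, i.e. the upper half by GP.
core <- toy[toy$GP >= cutoff, ]

print(cutoff)       # 67.5
print(nrow(core))   # 4

# With ties at the median, ">=" keeps more than half of the players.
tied <- c(82, 82, 82, 70)
sum(tied >= median(tied))  # 3 of 4
```

Using `>=` rather than `>` guarantees at least half the players survive the cut, which is the safer choice when many players share the same GP value (e.g. everyone who played all 82 games).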
# set working directory (the author's Dropbox folder; change to your own)
setwd("/My Dropbox/Projects/NHL Defensemen 2011 performance")
# load the libraries I commonly use (only XML is needed below)
library(XML)
library(plyr)
library(lubridate)
library(ggplot2)
# grab the data: readHTMLTable() returns a list of data frames; $stats picks the table with id "stats"
# note: XML's built-in fetcher may not handle https, so if the site redirects,
# download the page first and pass the local file to readHTMLTable()
URL <- "http://www.hockey-reference.com/leagues/NHL_2011_skaters.html"
tables <- readHTMLTable(URL)$stats
head(tables)
# filter on D
ds <- tables[tables$Pos == 'D', ]
nrow(ds) # number of records
# change data types -- probably an easier way, but this helped me learn R
str(ds)
# numeric columns (positions follow the 2011 table layout)
for (i in c(1,3,6:19)) {
  ds[,i] <- as.numeric(as.character(ds[,i])) # important! -- convert factor to string first
}
# text columns
for (i in c(2, 4:5, 20)) {
  ds[,i] <- as.character(ds[,i])
}
# let's cut on games played to a "core" set of players -- upper 50%
summary(ds$GP)
hist(ds$GP, xlab="Games played", main="Distribution of games played")
ds <- ds[ds$GP >= median(ds$GP),]
# let's look at the distribution of +/-
names(ds)[11] <- "plusmin" # column 11 holds +/-, which is awkward to use with $
summary(ds$plusmin)
hist(ds$plusmin, xlab="+/-", main="Distribution of +/-")
# sort the dataframe on +/-
sorted.pm <- ds[order(ds$plusmin, decreasing=TRUE), ]
# top 25
head(sorted.pm, n=25)
# plot time on ice (x axis) against +/- (y axis)
plot(ds$TOI, ds$plusmin, xlab="Time on ice (TOI)", ylab="+/-", pch=20, cex=.8)
# sort the dataframe on TOI
sorted.toi <- ds[order(ds$TOI, decreasing=TRUE), ]
# top 25
head(sorted.toi, n=25)
# Zee was top on +/- and top 3 in TOI..... +/- is not the best stat, but coupled with TOI, it's a start IMO
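The code above notes there is probably an easier way to change data types, and its closing comment weighs +/- together with TOI. A minimal sketch of both ideas, on a small made-up data frame (the player names, numbers, and the rank-sum scoring are hypothetical illustrations, not the 2011 data or a standard hockey metric): convert the factor columns in a single `lapply()` call, then rank every player on each stat and sort by the sum of the two ranks.

```r
# Hypothetical toy data standing in for the scraped table; the columns
# start out as factors, which is what the original conversion loop assumes.
raw <- data.frame(
  Player  = c("Player A", "Player B", "Player C", "Player D"),
  plusmin = c("33", "-5", "12", "20"),
  TOI     = c("2000", "2100", "1500", "1800"),
  stringsAsFactors = TRUE
)

# One-call type conversion: factor -> character -> numeric for the listed columns.
num.cols <- c("plusmin", "TOI")
raw[num.cols] <- lapply(raw[num.cols], function(x) as.numeric(as.character(x)))
raw$Player <- as.character(raw$Player)

# Rank on each stat, 1 = best (largest value); tied values share the lowest rank.
raw$pm.rank  <- rank(-raw$plusmin, ties.method = "min")
raw$toi.rank <- rank(-raw$TOI,     ties.method = "min")

# A simple combined view: sort by the sum of the two ranks (lower is better).
raw$rank.sum <- raw$pm.rank + raw$toi.rank
raw[order(raw$rank.sum), c("Player", "plusmin", "TOI", "pm.rank", "toi.rank", "rank.sum")]
```

Summing ranks weights the two stats equally and ignores how far apart the raw values are; it is only one simple way to look at +/- and TOI together.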