At Least Tim Thomas Won...

[This article was first published on Data Twirling » R, and kindly contributed to R-bloggers].

As you can tell from the content on this blog, I am a really big fan of statistical analysis and the NHL. I haven’t blogged in some time simply because I have been deeply engrossed in the 2011 playoffs, and I am lucky enough to say that I am a diehard Bruins fan.

With that in mind, I am not that happy that Zdeno did not get the Norris. While all of the nominees are deserving, I was clearly hoping that he would win. Given that it is easy enough to hack some data together in R, the code below is a very, and I mean very, superficial look at the statistics of the upper 50% of 2011 NHL defensemen as defined by games played. Many people don’t like using plus-minus as a stat, but coupled with total time on ice, it’s not a bad place to begin.

# set working directory
setwd("/My Dropbox/Projects/NHL Defensemen 2011 performance")
# load the libraries I commonly use
library(XML)
library(plyr)
library(lubridate)
library(ggplot2)
# grab the data
URL <- "http://www.hockey-reference.com/leagues/NHL_2011_skaters.html"
tables <- readHTMLTable(URL)$stats
head(tables)
# filter on D
ds <- tables[tables$Pos == 'D', ]
nrow(ds) # number of records
# change data types -- probably an easier way, but this helped me learn R
str(ds)
for (i in c(1, 3, 6:19)) {
  ds[, i] <- as.numeric(as.character(ds[, i])) # important! -- convert factor to string first
}
for (i in c(2, 4:5, 20)) {
  ds[, i] <- as.character(ds[, i])
}
# let's cut on games played to get a "core" set of players -- upper 50%
summary(ds$GP)
hist(ds$GP, xlab="Games played", main="Distribution of games played")
ds <- ds[ds$GP >= median(ds$GP),]
# let's look at the distribution of +/-
names(ds)[11] <- "plusmin"
summary(ds$plusmin)
hist(ds$plusmin)
# sort the dataframe
sorted.pm <- ds[order(ds$plusmin, decreasing=TRUE), ]
# top 25
head(sorted.pm, n=25)
# plot plusmin against time on ice (TOI on the x-axis, +/- on the y-axis)
plot(ds$TOI, ds$plusmin, xlab="Time on ice", ylab="+/-", pch=20, cex=.8)
# sort the dataframe on TOI
sorted.toi <- ds[order(ds$TOI, decreasing=TRUE), ]
# top 25
head(sorted.toi, n=25)
# Zee was top on +/- and top 3 in TOI..... +/- is not the best stat, but coupled with TOI, it's a start IMO
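
One way to make the "coupled with TOI" idea concrete is to rank players on both stats and average the two ranks. The sketch below is not part of the original analysis. It is a minimal illustration that assumes a small made-up data frame (the player names and numbers are hypothetical, not real 2011 figures) and uses an average-rank score as one possible way to combine the two columns. It also converts columns by name rather than by position, which may be a little less fragile than numeric indices if the scraped table layout changes.

```r
# Toy data with made-up values; player names and numbers are hypothetical.
# Columns start as character strings, as scraped HTML tables often do.
toy <- data.frame(
  Player  = c("Player A", "Player B", "Player C", "Player D", "Player E", "Player F"),
  GP      = c("81", "82", "40", "80", "12", "80"),
  plusmin = c("33", "18", "-5", "25", "2", "-12"),
  TOI     = c("2150", "1900", "700", "2120", "150", "1990"),
  stringsAsFactors = FALSE
)

# Convert the numeric columns by name instead of by position.
num.cols <- c("GP", "plusmin", "TOI")
toy[num.cols] <- lapply(toy[num.cols], as.numeric)

# Keep the upper half by games played, as in the post.
core <- toy[toy$GP >= median(toy$GP), ]

# Rank each stat (1 = best) and average the two ranks.
core$rank.pm  <- rank(-core$plusmin, ties.method = "min")
core$rank.toi <- rank(-core$TOI, ties.method = "min")
core$avg.rank <- (core$rank.pm + core$rank.toi) / 2

# Lowest average rank first.
core[order(core$avg.rank), c("Player", "plusmin", "TOI", "avg.rank")]
```

With the real data, the same steps could be applied to ds after the type conversion above, using ds$plusmin and ds$TOI in place of the toy columns. Averaging ranks weights the two stats equally, which is itself an assumption worth questioning.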
