Since tonight kicks off Game 1 of the Stanley Cup Finals, I thought it would be fun to do a very quick and dirty cluster analysis of the league based on regular season performance.
Tonight, the Chicago Blackhawks square off against my hometown team, the Boston Bruins. Even though it was a lockout-shortened season, the Blackhawks started off by playing 24 consecutive games without a regulation loss. Given this incredible start, I was eager to see how statistically similar the Bruins were to their opponent and to the other teams they faced in the playoffs.
The process is as follows:
- Crawl the 2012-13 regular season data for each team
- Normalize the statistics and create a distance matrix
- Use hierarchical clustering to group the teams
Of course, all of this will be completed in my language of choice, R.
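For readers who want to see the shape of the last two steps, here is a minimal sketch in base R. It is not the analysis behind this post: the team names and numbers are random placeholders, the column names are hypothetical, and the three linkage methods (single, complete, average) are an assumption chosen for illustration. The crawling step is left out entirely.

```r
# Placeholder data: random numbers, NOT real 2012-13 NHL statistics.
# Team names and column names are hypothetical.
set.seed(42)
teams <- paste("Team", LETTERS[1:8])
stats <- data.frame(
  goals_for     = round(rnorm(8, mean = 130, sd = 15)),
  goals_against = round(rnorm(8, mean = 130, sd = 15)),
  pp_pct        = round(runif(8, min = 12, max = 25), 1),
  row.names     = teams
)

# Normalize: center each statistic to mean 0 and scale to sd 1,
# so that no single column dominates the distances.
scaled <- scale(stats)

# Distance matrix: pairwise Euclidean distance between teams.
d <- dist(scaled, method = "euclidean")

# Hierarchical clustering under three linkage methods (assumed choices).
methods <- c("single", "complete", "average")
fits <- lapply(methods, function(m) hclust(d, method = m))
names(fits) <- methods

# Draw the three dendrograms side by side.
op <- par(mfrow = c(1, 3))
for (m in methods) {
  plot(fits[[m]], main = paste(m, "linkage"), xlab = "", sub = "")
}
par(op)

# Optionally cut one tree into a fixed number of groups.
groups <- cutree(fits[["average"]], k = 3)
```

Scaling before calling `dist()` matters here because the statistics live on very different scales; without it, the columns with the largest raw values would drive the clustering.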
The image above shows three dendrograms, each built with a different clustering method.
I will let you draw your own conclusions, but I find it interesting that:
- Chicago and Pittsburgh (the team Boston defeated to reach the Stanley Cup Finals) are basically isolated in 2 of the trees
- Using Average linkage, Chicago/Pittsburgh stand alone from the pack, but so does Boston from the group of other playoff teams
- By and large, the techniques were able to isolate the majority of teams that did not make the playoffs
Just in case you are trying to learn R, here is the code.