[This article was first published on ProcRun; » R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Of course, a few days before I leave for a much-needed vacation, USA Today released its updated NCAA coaching salary database. For sports junkies, there’s no end to the analyses and visualizations that can be done with this data.
I took a quick break from packing to condense the data into a CSV and write up a very rough R script. Note: sqldf rocks, but installing tcltk (if you have to) can be a bit of a pain. Look here for help with tcltk.
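library(ggplot2)
library(sqldf)

# Read the condensed 2011 salary data (one row per coach)
salaries <- read.csv("2011Salary.csv", header = TRUE, sep = ",")

# Average pay per conference: total SchoolPay divided by the number of
# coaches with SchoolPay > 0; "* 1.0" avoids SQLite integer division
result <- sqldf('select a.Conference, sum(a.SchoolPay) * 1.0 / b.spc as avg_pay
                 from salaries as a
                 join (select Conference, count(*) as spc
                       from salaries where SchoolPay > 0
                       group by Conference) as b
                   on a.Conference = b.Conference
                 group by a.Conference')

# opts() and theme_text() are defunct in current ggplot2, so the chart
# uses geom_col() plus theme()/element_text() instead
chart <- ggplot(result, aes(x = Conference, y = avg_pay)) +
  geom_col(fill = "grey50") +
  labs(title = "Average Coaches Salary by Conference",
       x = "Conference", y = "Average Pay")
chart + theme(axis.text.x = element_text(angle = -45))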
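If getting tcltk working for sqldf is more trouble than it's worth, you can compute the same per-conference average in base R. The sketch below is not the original script. It uses a tiny made-up `salaries` data frame (hypothetical values) with the same `Conference` and `SchoolPay` columns, and `aggregate()` stands in for the SQL join. Like the query, it averages only over coaches with `SchoolPay > 0`. This matches the SQL result as long as no pay values are negative.

```r
# Hypothetical toy data; the real values come from 2011Salary.csv
salaries <- data.frame(
  Conference = c("Big East", "Big East", "Big East", "Pac-12", "Pac-12"),
  SchoolPay  = c(2000000, 1000000, 0, 1500000, 500000),
  stringsAsFactors = FALSE
)

# Keep only coaches with nonzero school pay (subset() also drops NA rows),
# mirroring the "where SchoolPay > 0" count in the SQL version
paid <- subset(salaries, SchoolPay > 0)

# Mean pay per conference, renamed to match the sqldf result's column
result <- aggregate(SchoolPay ~ Conference, data = paid, FUN = mean)
names(result)[2] <- "avg_pay"

print(result)
```

The resulting `result` data frame has the same two columns as the sqldf output, so the ggplot2 code above can plot it unchanged.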
Most surprising result? PAC-12 coaches average roughly $400,000 less than Big East coaches.
Full code is available on bitbucket.
Edited per G.'s suggestion: sqldf rocks, tcltk can be tricky.
To leave a comment for the author, please follow the link and comment on their blog: ProcRun; » R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.