Hacking The New Lahman Package 4.0-1 with R-Studio
The developers of the Lahman package for R have recently updated it to include 2014 MLB stats! For those not familiar, this package wraps Sean Lahman’s Baseball Database in a quick and handy little R package.
I’ve written on the Lahman package before, and even suggested adding a few advanced statistics to the battingStats() function and adding a pitchingStats() function. If you’re still reading, I’m going to assume you’ve already got an instance of Lahman running on a relational database. So why should you care about the R package?
- Speed: The R package is able to pull up the tables into R dataframes quickly and without the need for a database connection.
- Ease of Use: Remember, a good programmer is a lazy programmer!
- Reproducibility: You can easily modify functions as well as add your own and have them ready to go in your R environment.
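As a rough illustration of the first point, the tables ship with the package as ordinary data frames. The sketch below assumes Lahman is installed (it is skipped quietly otherwise); the 400 at-bat cutoff and the variable names are arbitrary choices for the example.

```r
# Assumes the Lahman package is installed; the block is skipped otherwise.
if (requireNamespace("Lahman", quietly = TRUE)) {
  # The Batting table is a plain data frame, no database connection needed.
  batting <- Lahman::Batting

  # Keep the most recent season and hitters with at least 400 at-bats
  # (an arbitrary cutoff chosen for this example).
  recent <- subset(batting, yearID == max(yearID) & AB >= 400)

  # Show the top home-run hitters of that season.
  print(head(recent[order(-recent$HR), c("playerID", "teamID", "HR")]))
}
```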
Install from CRAN
install.packages("Lahman")
Pump up the battingStats function
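If you’re brave, you can edit some of the functions once you’ve got the package installed and loaded with library(Lahman). The edited copy is saved in your own workspace, so the installed package itself is left untouched.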
battingStats <- edit(battingStats)
The new version offers more batting stats than last year’s, including OBP, SLG and OPS. In the script below I added ISO to the mix!
function (data = Lahman::Batting,
          idvars = c("playerID", "yearID", "stint", "teamID", "lgID"),
          cbind = TRUE)
{
    # Takes a column vector and replaces NAs with zeros
    NA2zero <- function(x) {
        x[is.na(x)] <- 0
        x
    }
    # Placeholder bindings for the column names used inside mutate()
    AB <- R <- H <- X2B <- X3B <- HR <- RBI <- SH <- BB <- HBP <- SF <-
        TB <- PA <- OBP <- SlugPct <- SO <- ISO <- NA
    vars <- c("AB", "R", "H", "X2B", "X3B", "HR", "RBI", "SB", "CS",
              "BB", "SO", "IBB", "HBP", "SH", "SF", "GIDP")
    d2 <- apply(data[, vars], 2, NA2zero)
    d2 <- if (is.vector(d2)) {
        as.data.frame(as.list(d2))
    } else {
        as.data.frame(d2)
    }
    d2 <- plyr::mutate(d2,
        BA = ifelse(AB > 0, round(H/AB, 3), NA),
        PA = AB + BB + HBP + SH + SF,
        TB = H + X2B + 2 * X3B + 3 * HR,
        SlugPct = ifelse(AB > 0, round(TB/AB, 3), NA),
        OBP = ifelse(PA > 0, round((H + BB + HBP)/(PA - SH), 3), NA),
        OPS = round(OBP + SlugPct, 3),
        BABIP = ifelse(AB > 0, round((H - HR)/(AB - SO - HR + SF), 3), NA),
        # ISO is extra bases per at-bat: the whole numerator is divided by AB
        ISO = ifelse(AB > 0, round((X2B + 2 * X3B + 3 * HR)/AB, 3), NA)
    )
    # Keep only the newly computed columns
    d2 <- d2[, (length(vars) + 1):ncol(d2)]
    if (cbind) {
        data.frame(data, d2)
    } else {
        data.frame(data[, idvars], d2)
    }
}
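ISO (isolated power) is extra bases per at-bat, which works out to the same thing as SLG minus BA. Here is a minimal sketch that checks that identity in base R. The helper `add_iso()` and the stat lines are made up for illustration and are not part of Lahman.

```r
# Hypothetical helper (not part of Lahman): append ISO to a data frame
# that has the Lahman-style columns AB, X2B, X3B and HR.
add_iso <- function(df) {
  extra_bases <- df$X2B + 2 * df$X3B + 3 * df$HR
  # Divide the whole numerator by AB; leave ISO as NA when AB is 0.
  df$ISO <- ifelse(df$AB > 0, round(extra_bases / df$AB, 3), NA)
  df
}

# Made-up stat lines for illustration only; the second row has no at-bats.
toy <- data.frame(AB = c(500, 0), H = c(150, 0), X2B = c(30, 0),
                  X3B = c(5, 0), HR = c(20, 0))
toy <- add_iso(toy)

# Cross-check the first row against SLG - BA.
tb  <- toy$H + toy$X2B + 2 * toy$X3B + 3 * toy$HR
slg <- tb[1] / toy$AB[1]
ba  <- toy$H[1] / toy$AB[1]

print(toy$ISO)             # 0.2 for the first row, NA for the second
print(round(slg - ba, 3))  # 0.2
```

The same helper could be applied to the output of the package's own battingStats() if you would rather not edit the function in place.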
Add pitchingStats function
For some reason, this package still doesn’t include advanced pitching stats. Luckily, R users can go right ahead and define their own functions like the one below.
pitchingStats <- function(data = Lahman::Pitching,
                          idvars = c("playerID", "yearID", "stint", "teamID", "lgID"),
                          cbind = TRUE)
{
    require('plyr')
    # Takes a column vector and replaces NAs with zeros
    NA2zero <- function(x) {
        x[is.na(x)] <- 0
        x
    }
    # Placeholder bindings for the column names used inside mutate()
    W <- L <- G <- GS <- CG <- SHO <- SV <- IPouts <- H <- ER <- HR <- BB <-
        SO <- BAOpp <- IBB <- WP <- HBP <- BK <- BFP <- GF <- R <- SH <- SF <-
        GIDP <- IP <- WHIP <- BABIP <- K_9 <- BB_9 <- Kpct <- BBpct <- NA
    # Set needed variables for calculations
    vars <- c('IPouts', 'BB', 'H', 'HR', 'BFP', 'SO')
    d3 <- apply(data[, vars], 2, NA2zero)
    d3 <- if (is.vector(d3)) {
        as.data.frame(as.list(d3))
    } else {
        as.data.frame(d3)
    }
    d3 <- plyr::mutate(d3,
        IP = IPouts / 3,
        WHIP = ifelse(IP > 0, round((BB + H) / IP, 3), NA),
        # Guard on the denominator so missing BFP (zeroed above) gives NA
        BABIP = ifelse((BFP - SO - BB - HR) > 0,
                       round((H - HR) / (BFP - SO - BB - HR), 3), NA),
        K_9 = ifelse(IP > 0, round((SO * 9) / IP, 3), NA),
        BB_9 = ifelse(IP > 0, round((BB * 9) / IP, 3), NA),
        # Rates per batter faced are only defined when BFP > 0
        Kpct = ifelse(BFP > 0, round(SO / BFP, 3), NA),
        BBpct = ifelse(BFP > 0, round(BB / BFP, 3), NA)
    )
    # Keep only the newly computed columns
    d3 <- d3[, (length(vars) + 1):ncol(d3)]
    if (cbind) {
        data.frame(data, d3)
    } else {
        data.frame(data[, idvars], d3)
    }
}
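To sanity-check the formulas, here is a small worked example in base R on a single made-up pitching line. The numbers are invented for illustration; only the column names follow the Lahman Pitching table.

```r
# Made-up pitching line for illustration only (Lahman-style column names).
toy <- data.frame(IPouts = 600, H = 180, BB = 50, SO = 200, BFP = 820)

ip   <- toy$IPouts / 3                    # 600 outs = 200 innings pitched
whip <- round((toy$BB + toy$H) / ip, 3)   # (50 + 180) / 200 = 1.15
k9   <- round(toy$SO * 9 / ip, 3)         # 1800 / 200 = 9
bb9  <- round(toy$BB * 9 / ip, 3)         # 450 / 200 = 2.25
kpct <- round(toy$SO / toy$BFP, 3)        # 200 / 820, rounds to 0.244

print(c(IP = ip, WHIP = whip, K_9 = k9, BB_9 = bb9, Kpct = kpct))
```

Once pitchingStats is defined in your session, calling head(pitchingStats()) should show the same kind of columns appended to the full Pitching table.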
That’s all for today kids, happy hacking, and job well done by the team on the Lahman package!