
Hacking the New Lahman Package 4.0-1 with RStudio

[This article was first published on R Tricks – Data Science Riot!, and kindly contributed to R-bloggers.]

The developers of the Lahman package for R have recently updated the package to include 2014 MLB stats! For those not familiar, this package turns Sean Lahman’s Baseball Database into a quick and handy set of R data frames.

I’ve written about the Lahman package before, and even suggested adding a few advanced statistics to the battingStats() function along with a new pitchingStats() function. If you’re still reading, I’m going to assume you already have an instance of Lahman running on a relational database. So why should you care?

Install from CRAN

install.packages("Lahman")
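
Once the install finishes, a quick check confirms that the package loads and shows which seasons it covers. This is only a minimal sketch: the `lahman_available` flag is a name chosen for this example, and the guard simply skips the check if the package is not installed.

```r
# TRUE if Lahman is installed, FALSE otherwise (flag name is illustrative).
lahman_available <- requireNamespace("Lahman", quietly = TRUE)

if (lahman_available) {
  library(Lahman)
  # Most recent season in the batting table
  print(max(Batting$yearID))
  # A few of the raw counting stats that battingStats() builds on
  print(head(Batting[, c("playerID", "yearID", "teamID", "AB", "H", "HR")]))
}
```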

Pump up the battingStats function
If you’re brave, you can edit some of the functions once you’ve got the package installed.

battingStats <- edit(battingStats)

The new version offers more batting stats than last year’s, including OBP, SLG and OPS. In the script below I added ISO (isolated power: extra bases per at-bat) to the mix!

function (data = Lahman::Batting, idvars = c("playerID", "yearID",
    "stint", "teamID", "lgID"), cbind = TRUE)
{
    # Replace NAs in a column with zeros
    NA2zero <- function(x) {
        x[is.na(x)] <- 0
        x
    }
    # Dummy bindings for the column names used inside mutate()
    AB <- R <- H <- X2B <- X3B <- HR <- RBI <- SH <- BB <- HBP <- SF <- TB <- PA <- OBP <- SlugPct <- SO <- ISO <- NA
    vars <- c("AB", "R", "H", "X2B", "X3B", "HR", "RBI", "SB",
        "CS", "BB", "SO", "IBB", "HBP", "SH", "SF", "GIDP")
    d2 <- apply(data[, vars], 2, NA2zero)
    d2 <- if (is.vector(d2)) {
        as.data.frame(as.list(d2))
    } else {
        as.data.frame(d2)
    }
    d2 <- plyr::mutate(d2,
        BA = ifelse(AB > 0, round(H/AB, 3), NA),
        PA = AB + BB + HBP + SH + SF, TB = H + X2B + 2 * X3B + 3 * HR,
        SlugPct = ifelse(AB > 0, round(TB/AB, 3), NA),
        OBP = ifelse(PA > 0, round((H + BB + HBP)/(PA - SH), 3), NA),
        OPS = round(OBP + SlugPct, 3),
        BABIP = ifelse(AB > 0, round((H - HR)/(AB - SO - HR + SF), 3), NA),
        # ISO: the whole extra-base total goes over AB, guarded against AB = 0
        ISO = ifelse(AB > 0, round((X2B + 2 * X3B + 3 * HR)/AB, 3), NA)
    )
    # Keep only the newly computed columns
    d2 <- d2[, (length(vars) + 1):ncol(d2)]
    if (cbind)
        data.frame(data, d2)
    else data.frame(data[, idvars], d2)
}
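
If you would rather not edit the package function, the same ISO column can be bolted on afterwards. The sketch below uses a hypothetical helper, `add_iso()`, which is not part of Lahman, and a made-up two-row data frame; it also cross-checks that ISO equals slugging percentage minus batting average.

```r
# Hypothetical helper (not part of Lahman): append ISO to a batting data frame.
add_iso <- function(df) {
  extra_bases <- df$X2B + 2 * df$X3B + 3 * df$HR
  # Guard against zero at-bats, mirroring the other rate stats
  df$ISO <- ifelse(df$AB > 0, round(extra_bases / df$AB, 3), NA)
  df
}

# Made-up stat lines for illustration, not real player data.
toy <- data.frame(
  AB  = c(500, 0),
  H   = c(150, 0),
  X2B = c(30, 0),
  X3B = c(5, 0),
  HR  = c(20, 0)
)

toy <- add_iso(toy)

# Cross-check on the first row: ISO should match SLG minus BA.
TB  <- toy$H + toy$X2B + 2 * toy$X3B + 3 * toy$HR
SLG <- TB / toy$AB
BA  <- toy$H / toy$AB
print(toy$ISO[1])
print(round(SLG[1] - BA[1], 3))
```

With Lahman loaded, the same helper could presumably be applied to the output of battingStats(), since that output keeps the AB, X2B, X3B and HR columns when cbind = TRUE.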

Add pitchingStats function
For some reason, this package still doesn’t include advanced pitching stats. Luckily, R users can go right ahead and define their own functions like the one below.

pitchingStats <- function(data = Lahman::Pitching,
                          idvars = c("playerID", "yearID", "stint", "teamID", "lgID"),
                          cbind = TRUE) {
    # plyr::mutate is called by namespace below, so plyr only needs to be installed
    NA2zero <- function(x) {
        # Takes a column vector and replaces NAs with zeros
        x[is.na(x)] <- 0
        x
    }
    # Dummy bindings for the column names used inside mutate()
    W <- L <- G <- GS <- CG <- SHO <- SV <- IPouts <- H <- ER <- HR <- BB <- SO <- BAOpp <- IBB <- WP <- HBP <- BK <- BFP <- GF <- R <- SH <- SF <- GIDP <- IP <- WHIP <- BABIP <- K_9 <- BB_9 <- Kpct <- BBpct <- NA
    # Set needed variables for calculations
    vars <- c('IPouts', 'BB', 'H', 'HR', 'BFP', 'SO')
    d3 <- apply(data[, vars], 2, NA2zero)
    d3 <- if (is.vector(d3)) {
        as.data.frame(as.list(d3))
    } else {
        as.data.frame(d3)
    }
    d3 <- plyr::mutate(d3,
        IP = IPouts / 3,
        WHIP = ifelse(IP > 0, round((BB + H) / IP, 3), NA),
        BABIP = ifelse(IP > 0, round((H - HR) / (BFP - SO - BB - HR), 3), NA),
        K_9 = ifelse(IP > 0, round((SO * 9) / IP, 3), NA),
        BB_9 = ifelse(IP > 0, round((BB * 9) / IP, 3), NA),
        Kpct = ifelse(IP > 0, round(SO / BFP, 3), NA),
        BBpct = ifelse(IP > 0, round(BB / BFP, 3), NA)
    )
    # Keep only the newly computed columns
    d3 <- d3[, (length(vars) + 1):ncol(d3)]
    if (cbind) data.frame(data, d3) else data.frame(data[, idvars], d3)
}
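
To see what these formulas produce, here is a minimal sketch that works through them by hand in base R. The season line is made up for illustration and is not real player data; the variable names simply mirror the columns used in the function above.

```r
# Made-up season line for illustration, not real player data.
p <- data.frame(IPouts = 600, BB = 48, H = 170, HR = 20, BFP = 800, SO = 200)

IP    <- p$IPouts / 3                     # outs recorded converted to innings pitched
WHIP  <- round((p$BB + p$H) / IP, 3)      # walks plus hits per inning
K_9   <- round(p$SO * 9 / IP, 3)          # strikeouts per nine innings
BB_9  <- round(p$BB * 9 / IP, 3)          # walks per nine innings
Kpct  <- round(p$SO / p$BFP, 3)           # strikeouts per batter faced
BBpct <- round(p$BB / p$BFP, 3)           # walks per batter faced
# Hits on balls in play over balls in play, using the same approximation as above
BABIP <- round((p$H - p$HR) / (p$BFP - p$SO - p$BB - p$HR), 3)

print(c(IP = IP, WHIP = WHIP, K_9 = K_9, BB_9 = BB_9,
        Kpct = Kpct, BBpct = BBpct, BABIP = BABIP))
```

Once pitchingStats() is defined and Lahman is loaded, calling it with no arguments should append these same columns to the full Pitching table.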

That’s all for today kids, happy hacking, and job well done by the team on the Lahman package!

Photo by Sue.Ann
