Hacking The New Lahman Package 4.0-1 with RStudio

[This article was first published on R Tricks – Data Science Riot!, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The developers of the Lahman package for R have recently updated the package to include 2014 MLB stats! For those not familiar, this package bundles Sean Lahman’s Baseball Database as a set of quick and handy R data frames.

I’ve written about the Lahman package before, and even suggested adding a few advanced statistics to the battingStats() function, plus a new pitchingStats() function. If you’re still reading, I’m going to assume you already have the Lahman database loaded into a relational database. So why should you care about the R package?

  • Speed: The R package loads the tables as R data frames quickly, with no database connection needed.
  • Ease of Use: Remember, a good programmer is a lazy programmer!
  • Reproducibility: You can easily modify functions as well as add your own and have them ready to go in your R environment.
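As a quick illustration of the speed and ease-of-use points, here is a minimal base-R sketch of a GROUP BY-style query run straight against a data frame, no database connection required. The helper name top_hr() is hypothetical (it is not part of the Lahman package), and the toy data frame only mimics a few columns of Lahman::Batting (playerID, yearID, stint, HR) with made-up numbers; with the package loaded you would pass Lahman::Batting instead.

```r
# Hypothetical helper (not part of Lahman): top home-run hitters for one
# season. Works on any data frame with playerID, yearID and HR columns,
# such as Lahman::Batting.
top_hr <- function(batting, year, n = 5) {
  season <- batting[batting$yearID == year, c("playerID", "HR")]
  season$HR[is.na(season$HR)] <- 0
  # Sum across stints so a player traded mid-season is counted once
  totals <- aggregate(HR ~ playerID, data = season, FUN = sum)
  totals <- totals[order(-totals$HR, totals$playerID), ]
  rownames(totals) <- NULL
  head(totals, n)
}

# Toy stand-in for Lahman::Batting (made-up numbers, illustration only)
toy <- data.frame(
  playerID = c("aaa01", "bbb01", "aaa01", "ccc01"),
  yearID   = c(2014, 2014, 2014, 2013),
  stint    = c(1, 1, 2, 1),
  HR       = c(10, 15, 8, 40)
)

# aaa01 (18 HR over two stints) ranks ahead of bbb01 (15 HR)
top_hr(toy, 2014, n = 2)
# With the package loaded: top_hr(Lahman::Batting, 2014)
```

The same query in SQL would need a GROUP BY over a live connection; here it is just a couple of base R calls on an in-memory data frame.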

Install from CRAN

install.packages("Lahman")

Pump up the battingStats function
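If you’re brave, you can edit some of the functions once you’ve got the package installed. The edited copy is saved in your workspace and masks the package’s version for the rest of the session.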

library(Lahman)  # attach the package so battingStats() is on the search path
battingStats <- edit(battingStats)

The new version offers more batting stats than last year’s, including OBP, SLG and OPS. In the script below I added ISO (isolated power) to the mix!

battingStats <- function(data = Lahman::Batting,
                         idvars = c("playerID", "yearID", "stint", "teamID", "lgID"),
                         cbind = TRUE) {
  # Replace NAs with zeros so missing counts don't propagate through the math
  NA2zero <- function(x) {
    x[is.na(x)] <- 0
    x
  }
  # Dummy bindings that quiet R CMD check notes about non-standard evaluation
  AB <- R <- H <- X2B <- X3B <- HR <- RBI <- SH <- BB <- HBP <- SF <- TB <-
    PA <- OBP <- SlugPct <- SO <- BA <- ISO <- NA
  vars <- c("AB", "R", "H", "X2B", "X3B", "HR", "RBI", "SB",
            "CS", "BB", "SO", "IBB", "HBP", "SH", "SF", "GIDP")
  d2 <- apply(data[, vars], 2, NA2zero)
  # apply() returns a plain vector instead of a matrix for a single row
  d2 <- if (is.vector(d2)) {
    as.data.frame(as.list(d2))
  } else {
    as.data.frame(d2)
  }
  d2 <- plyr::mutate(d2,
    BA      = ifelse(AB > 0, round(H / AB, 3), NA),
    PA      = AB + BB + HBP + SH + SF,
    TB      = H + X2B + 2 * X3B + 3 * HR,
    SlugPct = ifelse(AB > 0, round(TB / AB, 3), NA),
    OBP     = ifelse(PA > 0, round((H + BB + HBP) / (PA - SH), 3), NA),
    OPS     = round(OBP + SlugPct, 3),
    BABIP   = ifelse(AB > 0, round((H - HR) / (AB - SO - HR + SF), 3), NA),
    # Isolated power: extra bases per at-bat, i.e. SLG - BA
    ISO     = ifelse(AB > 0, round((X2B + 2 * X3B + 3 * HR) / AB, 3), NA)
  )
  # Keep only the newly computed columns
  d2 <- d2[, (length(vars) + 1):ncol(d2)]
  if (cbind) data.frame(data, d2) else data.frame(data[, idvars], d2)
}
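ISO is extra bases per at-bat, which works out to slugging percentage minus batting average: (X2B + 2 * X3B + 3 * HR) / AB, with the whole numerator divided by AB. If you’d rather not edit the package function at all, a small wrapper can add the column afterwards. This is a sketch under my own assumptions: add_iso() is a hypothetical helper, not part of Lahman, and the season line below is made up purely to check the arithmetic.

```r
# Hypothetical helper (not part of Lahman): append isolated power (ISO)
# to any data frame with AB, X2B, X3B and HR columns, e.g. Lahman::Batting
# or the output of battingStats().
add_iso <- function(stats) {
  xb <- with(stats, X2B + 2 * X3B + 3 * HR)  # extra bases beyond singles
  stats$ISO <- ifelse(!is.na(stats$AB) & stats$AB > 0,
                      round(xb / stats$AB, 3), NA)
  stats
}

# Made-up season line: 500 AB, 150 H, 30 2B, 5 3B, 25 HR
season_line <- data.frame(AB = 500, H = 150, X2B = 30, X3B = 5, HR = 25)
out <- add_iso(season_line)
out$ISO                              # (30 + 10 + 75) / 500 = 0.23

# Cross-check against SLG - BA, where TB = H + X2B + 2*X3B + 3*HR = 265
round(265 / 500 - 150 / 500, 3)      # 0.23
```

With the package loaded, add_iso(Lahman::battingStats()) should give the same ISO column as the edited function, without touching the package’s code.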

Add pitchingStats function
For some reason, this package still doesn’t include advanced pitching stats. Luckily, R users can go right ahead and define their own functions, like the one below.

pitchingStats <- function(data = Lahman::Pitching,
                          idvars = c("playerID", "yearID", "stint", "teamID", "lgID"),
                          cbind = TRUE) {
  # plyr must be installed; it is called as plyr::mutate, so no require()
  NA2zero <- function(x) {
    # Takes a column vector and replaces NAs with zeros
    x[is.na(x)] <- 0
    x
  }
  # Dummy bindings that quiet R CMD check notes about non-standard evaluation
  IPouts <- BB <- H <- HR <- BFP <- SO <- IP <- WHIP <- BABIP <-
    K_9 <- BB_9 <- Kpct <- BBpct <- NA
  # Set needed variables for calculations
  vars <- c("IPouts", "BB", "H", "HR", "BFP", "SO")
  d3 <- apply(data[, vars], 2, NA2zero)
  # apply() returns a plain vector instead of a matrix for a single row
  d3 <- if (is.vector(d3)) {
    as.data.frame(as.list(d3))
  } else {
    as.data.frame(d3)
  }
  d3 <- plyr::mutate(d3,
    IP    = IPouts / 3,  # Lahman stores innings pitched as outs
    WHIP  = ifelse(IP > 0, round((BB + H) / IP, 3), NA),
    BABIP = ifelse(IP > 0, round((H - HR) / (BFP - SO - BB - HR), 3), NA),
    K_9   = ifelse(IP > 0, round((SO * 9) / IP, 3), NA),
    BB_9  = ifelse(IP > 0, round((BB * 9) / IP, 3), NA),
    Kpct  = ifelse(BFP > 0, round(SO / BFP, 3), NA),
    BBpct = ifelse(BFP > 0, round(BB / BFP, 3), NA)
  )
  # Keep only the newly computed columns
  d3 <- d3[, (length(vars) + 1):ncol(d3)]
  if (cbind) data.frame(data, d3) else data.frame(data[, idvars], d3)
}
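One wrinkle worth knowing: Lahman records innings pitched as outs (IPouts), so IPouts / 3 gives fractional innings (601 outs is 200.333), while a box score writes the same workload as 200.1. If you want the familiar notation for display, a tiny helper will do it. ip_display() is a hypothetical name of my own, not a package function; keep the decimal IP for the rate stats, since "200.1" isn’t a real number of innings.

```r
# Hypothetical helper (not part of Lahman): convert outs recorded
# (Lahman's IPouts) to conventional innings-pitched notation, where
# ".1" and ".2" mean one and two outs into the next inning.
ip_display <- function(IPouts) {
  ifelse(is.na(IPouts), NA_character_,
         paste0(IPouts %/% 3, ".", IPouts %% 3))
}

ip_display(c(601, 600, 2, NA))  # "200.1" "200.0" "0.2" NA
```

Returning a character vector is deliberate: it keeps anyone from accidentally doing arithmetic on the display value.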

That’s all for today, kids. Happy hacking, and a job well done by the team behind the Lahman package!

