[This article was first published on The Prince of Slides, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In my last two posts, I have tinkered with the ‘gam’ package to create heat maps for individual umpire strike zones. I went ahead and grabbed Joe West’s data (which contains many more pitches than Bruce Froemming’s, since Froemming’s data is only from 2007). Below, I have mapped them out with a new color scheme (those of you reading this, I’m curious as to your opinion on the better scheme). West’s data is aggregated for 2007 through 2010.
Remember from my last post that Bruce Froemming tended to call a larger strike zone for right-handed batters (the opposite of J-Doug’s finding at Beyond the Boxscore, and of my own regression analysis, which found results similar to J-Doug’s). That led me to map out the new zones for West to see whether we find any differences. I’ll start with the “All Batters” maps below to get a feel for the strike zone. Nothing too striking (though I will mention here that the ‘span’ is not the same for each map, as the larger number of observations for West let me reduce the span). These aren’t all that useful, though, as the zones differ so much by batter handedness. In my next post, I’ll break them down by pitch type and count, and possibly see if West changed over the 4 years. Finally, I’ll try to break things down by pitcher handedness as well.
Now, one thing to notice with the new color scheme is that we get a little more information in the outer area of the probabilities of a strike call. That’s good to have, and we can see that there are strikes called a little further outside the zone than the other color scheme had indicated with the naked eye. This scheme is the inverse of an RColorBrewer palette. I’ll show some code later on in the post to get it to work out this way.
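As a minimal sketch of what "inverse of an RColorBrewer palette" means here: the 11-class RdYlBu palette is simply reversed, so that blue marks low strike probabilities and red marks high ones. The hex codes are hardcoded on the assumption that they match what brewer.pal(11, "RdYlBu") returns, so the snippet needs only base R.

```r
# 11-class RdYlBu palette, hardcoded (assumed to match
# RColorBrewer::brewer.pal(11, "RdYlBu")) so only base R is needed
rdylbu <- c("#A50026", "#D73027", "#F46D43", "#FDAE61", "#FEE090", "#FFFFBF",
            "#E0F3F8", "#ABD9E9", "#74ADD1", "#4575B4", "#313695")

# Reverse it: blue = low probability, red = high probability
buylrd <- rev(rdylbu)

# colorRampPalette() returns a function that interpolates any number of colors
ramp <- colorRampPalette(buylrd)
cols <- ramp(50)  # e.g. one color per contour level
```

Passing the ramp function itself (not the 50 colors) to filled.contour() lets it pick as many levels as it needs.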
In general, it seems as though West does not call strikes very far above or below the strike zone, while he might be extending it a bit further inside and outside. However, we can also see that it’s not symmetrical on each side of the plate. Now let’s take a look at RHB vs. LHB to see where this is coming from. Beginning with LHB, it seems as though West’s strike zone is a bit larger than Froemming’s for lefties, and this is especially true for the inside portion of the plate.
What about righties? The RHB zone seems to stretch well outside the horizontal strike zone on both sides. So, West seems to be much more likely to call a strike on the inside part of the plate. That’s good for pitchers. Again, though, West is less forgiving with high and low pitches than Froemming seems to be. But much of this is likely an artifact of a few outliers in Froemming’s map, the different spans, and the fact that there are far fewer data points for Froemming than for West. For more info on running a ‘gam’ model that is a bit more robust to outliers, see this site run by previous commenter Matias. I’d also recommend some standardized way to choose the optimal span, which I’ll be working on in the coming weeks to ensure that things are more easily comparable across umpires.
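One possible way to standardize the span choice is k-fold cross-validation over a set of candidate spans. The sketch below is only an illustration under loud assumptions: the pitches are simulated (not West's or Froemming's data), base R's loess() on the 0/1 call stands in for the lo() terms of the ‘gam’ model, and the Brier score (mean squared error of the predicted probability) is the selection criterion. None of this is the method used for the maps in this post.

```r
set.seed(1)

# Simulated called pitches (hypothetical data, for illustration only)
n  <- 1500
px <- runif(n, -2, 2)   # horizontal location (ft.)
pz <- runif(n, 0, 5)    # vertical location (ft.)
p_true  <- plogis(4 - 3 * abs(px) - 2.5 * abs(pz - 2.5))
pitches <- data.frame(px = px, pz = pz, strike = rbinom(n, 1, p_true))

# Assign every pitch to one of k folds once, so all spans see the same splits
k    <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Cross-validated Brier score for a single span
cv_brier <- function(span, data, fold) {
  errs <- sapply(sort(unique(fold)), function(i) {
    train <- data[fold != i, ]
    test  <- data[fold == i, ]
    fit   <- loess(strike ~ px + pz, data = train, span = span, degree = 1)
    pred  <- predict(fit, newdata = test)
    # loess does not extrapolate by default; drop any NA predictions
    mean((test$strike - pred)^2, na.rm = TRUE)
  })
  mean(errs)
}

spans     <- c(0.1, 0.2, 0.3, 0.5, 0.75)
scores    <- sapply(spans, cv_brier, data = pitches, fold = fold)
best_span <- spans[which.min(scores)]
```

Running the same candidate grid and criterion for every umpire would at least make the chosen spans comparable, even when the sample sizes differ a lot.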
While these plots certainly don’t answer the question of which umpires discriminate against lefties more, they do suggest that it may be a good idea to include fixed effects (or just dummy variables for each umpire) in a regression model, perhaps with some interaction terms for left- and right-handed batters. There’s plenty of data, so I see little reason to worry about running out of degrees of freedom here. It seems pretty obvious that the strike zones for these two umpires are shaped quite differently, and interestingly, West does not call the outside pitch against right-handers like he does with LHBs. That seems like a disadvantage to the pitcher when facing a RHB. What do you think?
By using a fixed-effects approach, we can see where the lefty-righty bias in umpire calls is coming from, and whether it is something found across the entire population of umpires or is skewed by a select few who discriminate against left-handed batters, whether because of stance or unconscious bias. If we have data on umpire handedness (something discussed recently at The Book Blog), this might give us some insight into how they favor their squat behind the plate. Mike Fast has also suggested that batter stance biases the umpires, so it may be interesting to find some data on how closely batters (left vs. right) crowd the plate. Any takers?
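To make the fixed-effects idea a bit more concrete, here is a rough sketch of a logistic regression with umpire dummies interacted with batter handedness. Everything in it is an assumption for illustration: the data are simulated, the umpire labels (UmpA, UmpB, UmpC) and the column names are made up, and the two distance terms are a crude stand-in for a proper model of pitch location.

```r
set.seed(42)

# Simulated called pitches for three hypothetical umpires
n <- 6000
calls <- data.frame(
  umpire = factor(sample(c("UmpA", "UmpB", "UmpC"), n, replace = TRUE)),
  stand  = factor(sample(c("L", "R"), n, replace = TRUE)),  # batter handedness
  px     = runif(n, -2, 2),
  pz     = runif(n, 0, 5)
)

# Build in an umpire-specific handedness effect: UmpB is more generous to RHB
eta <- 4 - 3 * abs(calls$px) - 2.5 * abs(calls$pz - 2.5) +
  0.8 * (calls$umpire == "UmpB" & calls$stand == "R")
calls$strike <- rbinom(n, 1, plogis(eta))

# Umpire fixed effects only, then with umpire-by-handedness interactions
fit_main <- glm(strike ~ umpire + stand + abs(px) + abs(pz - 2.5),
                family = binomial, data = calls)
fit_int  <- glm(strike ~ umpire * stand + abs(px) + abs(pz - 2.5),
                family = binomial, data = calls)

# Interaction coefficients show which umpires deviate from the baseline
# handedness effect; the likelihood-ratio test checks them jointly
summary(fit_int)$coefficients
anova(fit_main, fit_int, test = "Chisq")
```

If only one or two interaction terms stand out, that would point toward a few umpires driving the league-wide lefty-righty difference instead of a population-wide bias.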
(sorry they’re not side-by-side…originally the post was like this, but Blogger decided to reformat things and I can’t seem to get it to format correctly).
CODE (using the Pretty R-Tool):
# Load Joe West's called pitches (2007-2010)
data <- read.csv(file="joe_west_called_pitches.csv", header=TRUE)
head(data)

library(gam)
library(RColorBrewer)

# Browse the available palettes and inspect the 11-class RdYlBu palette
display.brewer.all()
brewer.pal(11, "RdYlBu")

# RdYlBu reversed: blue = low, red = high probability of a strike call
buylrd <- c("#313695", "#4575B4", "#74ADD1", "#ABD9E9", "#E0F3F8", "#FFFFBF",
            "#FEE090", "#FDAE61", "#F46D43", "#D73027", "#A50026")

# NOTE: 'aspect.ratio', 'right' and 'left' are not defined in this listing.
# They must already exist in the workspace: 'aspect.ratio' scales the
# horizontal span, and 'right' / 'left' hold the RHB / LHB pitches.

#### all batters
fit.gam <- gam(call_type ~ lo(px, span=.3*aspect.ratio, degree=1) +
               lo(pz, span=.3, degree=1),
               family=binomial(link="logit"), data=data)

# 30 x 30 prediction grid over the plotting region
myx.gam <- matrix(data=seq(from=-2, to=2, length=30), nrow=30, ncol=30)
myz.gam <- t(matrix(data=seq(from=0, to=5, length=30), nrow=30, ncol=30))
fitdata.gam <- data.frame(px=as.vector(myx.gam), pz=as.vector(myz.gam))
mypredict.gam <- predict(fit.gam, fitdata.gam, type="response")
mypredict.gam <- matrix(mypredict.gam, nrow=30, ncol=30)

png(file="WestAllGAMbrewercol.png", width=600, height=675)
filled.contour(x=seq(from=-2, to=2, length=30), y=seq(from=0, to=5, length=30),
    z=mypredict.gam, axes=T, zlim=c(0,1), nlevels=50,
    color.palette=colorRampPalette(buylrd),
    main="Joe West Strike Zone Map (GAM Package)",
    xlab="Horizontal Location (ft.)", ylab="Vertical Location (ft.)",
    plot.axes={
        axis(1, at=c(-2,-1,0,1,2), pos=0, labels=c(-2,-1,0,1,2), las=0, col="black")
        axis(2, at=c(0,1,2,3,4,5), pos=-2, labels=c(0,1,2,3,4,5), las=0, col="black")
        # dashed box: plate width by the average top and bottom of the zone
        rect(-0.708335, mean(data$sz_bot), 0.708335, mean(data$sz_top),
             border="black", lty="dashed", lwd=2)
    },
    key.axes={
        ylim=c(0,1.0)
        axis(4, at=c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1.0),
             labels=c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1.0), pos=1, las=0, col="black")
    })
text(1.4, 2.5, "Probability of Strike Call", cex=1.1, srt=90)
dev.off()

############## righties
fit.gam.r <- gam(call_type ~ lo(px, span=.3*aspect.ratio, degree=1) +
                 lo(pz, span=.3, degree=1),
                 family=binomial(link="logit"), data=right)

myx.gam.r <- matrix(data=seq(from=-2, to=2, length=30), nrow=30, ncol=30)
myz.gam.r <- t(matrix(data=seq(from=0, to=5, length=30), nrow=30, ncol=30))
fitdata.gam.r <- data.frame(px=as.vector(myx.gam.r), pz=as.vector(myz.gam.r))
mypredict.gam.r <- predict(fit.gam.r, fitdata.gam.r, type="response")
mypredict.gam.r <- matrix(mypredict.gam.r, nrow=30, ncol=30)

png(file="WestRightGAMbrewercol.png", width=600, height=675)
filled.contour(x=seq(from=-2, to=2, length=30), y=seq(from=0, to=5, length=30),
    z=mypredict.gam.r, axes=T, zlim=c(0,1), nlevels=50,
    color.palette=colorRampPalette(buylrd),
    main="Joe West Strike Zone Map (RHB, GAM Package)",
    xlab="Horizontal Location (ft.)", ylab="Vertical Location (ft.)",
    plot.axes={
        axis(1, at=c(-2,-1,0,1,2), pos=0, labels=c(-2,-1,0,1,2), las=0, col="black")
        axis(2, at=c(0,1,2,3,4,5), pos=-2, labels=c(0,1,2,3,4,5), las=0, col="black")
        rect(-0.708335, mean(data$sz_bot), 0.708335, mean(data$sz_top),
             border="black", lty="dashed", lwd=2)
    },
    key.axes={
        ylim=c(0,1.0)
        axis(4, at=c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1.0),
             labels=c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1.0), pos=1, las=0, col="black")
    })
text(1.4, 2.5, "Probability of Strike Call", cex=1.1, srt=90)
dev.off()

############### lefties
fit.gam.l <- gam(call_type ~ lo(px, span=.3*aspect.ratio, degree=1) +
                 lo(pz, span=.3, degree=1),
                 family=binomial(link="logit"), data=left)

myx.gam.l <- matrix(data=seq(from=-2, to=2, length=30), nrow=30, ncol=30)
myz.gam.l <- t(matrix(data=seq(from=0, to=5, length=30), nrow=30, ncol=30))
fitdata.gam.l <- data.frame(px=as.vector(myx.gam.l), pz=as.vector(myz.gam.l))
mypredict.gam.l <- predict(fit.gam.l, fitdata.gam.l, type="response")
mypredict.gam.l <- matrix(mypredict.gam.l, nrow=30, ncol=30)

png(file="WestLeftGAMbrewercol.png", width=600, height=675)
filled.contour(x=seq(from=-2, to=2, length=30), y=seq(from=0, to=5, length=30),
    z=mypredict.gam.l, axes=T, zlim=c(0,1), nlevels=50,
    color.palette=colorRampPalette(buylrd),
    main="Joe West Strike Zone Map (LHB, GAM Package)",
    xlab="Horizontal Location (ft.)", ylab="Vertical Location (ft.)",
    plot.axes={
        axis(1, at=c(-2,-1,0,1,2), pos=0, labels=c(-2,-1,0,1,2), las=0, col="black")
        axis(2, at=c(0,1,2,3,4,5), pos=-2, labels=c(0,1,2,3,4,5), las=0, col="black")
        rect(-0.708335, mean(data$sz_bot), 0.708335, mean(data$sz_top),
             border="black", lty="dashed", lwd=2)
    },
    key.axes={
        ylim=c(0,1.0)
        axis(4, at=c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1.0),
             labels=c(0,.1,.2,.3,.4,.5,.6,.7,.8,.9,1.0), pos=1, las=0, col="black")
    })
text(1.4, 2.5, "Probability of Strike Call", cex=1.1, srt=90)
dev.off()
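A side note on the prediction grid: the matrix() / t(matrix()) construction used above could also be written with expand.grid(), which may be easier to read. The short check below is a sketch using base R only (the variable names are mine, not from the code above); it builds the grid both ways and compares them.

```r
xs <- seq(from = -2, to = 2, length.out = 30)  # horizontal grid points (ft.)
zs <- seq(from = 0,  to = 5, length.out = 30)  # vertical grid points (ft.)

# Grid as built in the post: px repeats down each column, pz across each row
myx <- matrix(data = xs, nrow = 30, ncol = 30)
myz <- t(matrix(data = zs, nrow = 30, ncol = 30))
grid_matrix <- data.frame(px = as.vector(myx), pz = as.vector(myz))

# Same 900 points from expand.grid(): the first factor varies fastest
grid_expand <- expand.grid(px = xs, pz = zs)

same_px <- isTRUE(all.equal(grid_matrix$px, grid_expand$px))
same_pz <- isTRUE(all.equal(grid_matrix$pz, grid_expand$pz))
```

Because the row order is the same, predictions made on the expand.grid() version can be reshaped with matrix(..., nrow=30, ncol=30) exactly as before.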