[This article was first published on The Prince of Slides, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
A couple weeks ago, I received an email from a fellow Pitch F/Xer and R-User, Josh Weinstock, asking if I was interested in a guest post here at Prince of Slides. I didn’t think I was important enough to have talented guests posting at my blog; however, Josh pointed out that this site tends to be the place for those who are part of a niche within a niche (i.e. Sabermetrics with R), and that it would be a great place to showcase some of his own work.
Josh currently contributes to It’s About the Money, a Yankees-themed ESPN Sweetspot blog. He hails from North Carolina and is a die-hard baseball fan. He welcomes discussion of baseball, punk music, and funny TV shows. You can reach him at josh82093 at gmail dot com or on Twitter @J__Stock (two underscores). He has posted some pretty cool stuff like this Robinson Cano GIF image made in R. Given his interesting posts and talent in R, I figured he would make a fantastic first ever Guest Post here at the site. Here is what Josh has to say:
Pitching is complicated. In order to be successful, pitchers must have velocity, movement, location, and deception. Thanks to pitch f/x data, the first three are pretty easy to study. In fact, these three variables are more or less directly recorded for every pitch thrown in major league baseball. However, deception remains somewhat of a mystery. This is mainly because we don’t really understand how deception works. However, one subset of deception is pretty easy to quantify: pitch flights. Pitch flights allow us a glimpse into how the batter actually sees the ball. This is just one more piece of data that we can use to help understand the mystery of pitching.
The following tool is intended to help add pitch flight visualizations to your analysis. Of particular importance is the recognizability of breaking balls (size of the “hump”) in relation to the fastball. The point 0.075 seconds after the ball is released is also important, as this is the time that Robert Adair (author of The Physics of Baseball) hypothesized batters need to decide whether or not to swing. And the graphs are kind of cool.
Before you start, you need to install the XML and animation packages. You also need a basic knowledge of R, though if you read this website I’m sure you’re prepared. If you have trouble with the tool, feel free to ask for help through email ( josh82093 at gmail dot com ) or twitter ( J__Stock ).
So from this, we have some R-code and a really cool (not just kind of cool) function that will plot pitch flights in an animated fashion. While you’ll want to have R experience before using this, the function is extremely user friendly. All you need to know is how to set a working directory and the URL of the pitch data for your desired pitcher.
You can download Josh’s code by clicking here. This is the guts of the function, and you’ll need to use it in multiple steps. The advantage of working this way is that it gives a bit more flexibility to experienced R users. As Josh said, you’ll need to install the packages “XML” and “animation” and load them using the “library()” function before you use Josh’s code. From there, open the code in R as a script and highlight the first two parts (“flightgrab()” and “plot.flight()”). Press “CTRL + R” to run those first two function definitions. After that, you can use them like any other function in R. With this code, you’ll have to repeat this step each time you open up R. Remember to use:
##load packages
library(XML)
library(animation)
Run these lines before you start using the functions, and only after you have installed both packages from the CRAN repository.
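If you are not sure whether both packages are already installed, a quick check along these lines may help. This is only a sketch: “missing_packages()” is a helper name made up for this example and is not part of Josh’s script.

```r
## hypothetical helper (not part of Josh's script):
## return the names of packages that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("XML", "animation")
to_install <- missing_packages(needed)

## print the install command rather than running it,
## since installing needs an internet connection
if (length(to_install) > 0) {
  message("Run: install.packages(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
}
```

If nothing is printed, both packages are available and the “library()” calls should work.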
(I have also broken this full script down into smaller files so that you can use the “source()” function in R on the individual portions of the code. This keeps from having to highlight the code every time you open up R. I’ll go over how to use “source()” in a later post, and I’ll provide these as well. If you’re dying to have this, shoot me an email.)
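In the meantime, the basic idea behind “source()” can be sketched in a few lines. The example below is an illustration under assumptions: it writes a tiny stand-in function (“say_pitch()”, a made-up name) to a temporary file instead of using Josh’s actual script, then loads it with “source()”.

```r
## write a tiny stand-in function to a script file
## (a temporary file here; in practice this would be your saved copy of the script)
script_file <- tempfile(fileext = ".R")
writeLines("say_pitch <- function(type) paste('Pitch type:', type)", script_file)

## source() runs the file, so the functions it defines land in your workspace
source(script_file)
say_pitch("FC")
```

With the real script saved somewhere on your machine, you would pass that file’s path to “source()” instead of the temporary file.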
For the “flightgrab()” function, you’ll need the Brooks Baseball URL for the game and pitcher you want. It should be the page with the table format of the data. You can find these by using the drop-down menus at the site. Here is an example of the page you need to use.
That’s it. Just type this within the parentheses (and be sure to put the full URL in quotes) and R will grab the data directly from the website, transform it, and turn it into a data frame for plotting the pitch flights. Here is some example code below using Mariano Rivera’s pitch data on May 25, 2011 against Toronto (remember to create the function in your R workspace first, so that R will know what “flightgrab()” is):
##grab data
mariano <- flightgrab("http://www.brooksbaseball.net/pfxVB/tabdel_expanded.php?pitchSel=121250&game=gid_2011_05_25_tormlb_nyamlb_1/&s_type=&h_size=700&v_size=500")
To check to see if the data was downloaded and transformed correctly by the function, you can just type “mariano” or whatever name you gave it to look at what is under the hood. From here, the “plot.flight()” function uses this data frame format to plot the flight of the ball. We can do this in two ways. If you take a look at the data set, it includes two pitch types: Four-Seamers and Cutters. The data grabbing function automatically puts the information into a flight sequence with 18 data points. So, both pitches have their own flight track. When we use “plot.flight()” to plot these in a color–say, Dark Red here–we don’t know which is which. Below, I have a still version of the plot using a single color:
##plot all pitches in a single color
plot.flight(mariano, color="darkred", strikezone=T)
##make pitches different colors
plot.flight(mariano, color=mariano$type, strikezone=T)
##plot only Mariano Rivera’s Cutter
marianoFC <- subset(mariano, mariano$type=="FC")
plot.flight(marianoFC, color="darkred", strikezone=T)
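If you want to try the subsetting step without downloading anything, here is a small sketch on a made-up data frame. Only the column names “type” and “time” come from the code in this post. The values, the “FF” code for the four-seamer, and the assumption of one row per pitch type per time step are illustrative guesses about what “flightgrab()” returns.

```r
## toy stand-in for the flightgrab() output; only 'type' and 'time' are
## column names taken from the post, the values are made up
toy <- data.frame(
  type = rep(c("FC", "FF"), each = 18),
  time = rep(seq_len(18), times = 2)
)

table(toy$type)                      # how many rows per pitch type
toyFC <- subset(toy, type == "FC")   # same idea as marianoFC above
nrow(toyFC)
```

Running “table()” on the type column of your real data is a quick way to see which pitch type codes you can subset on.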
You can also leave out the strike zone box by not using the “strikezone=” option in the function (i.e. the default is no strike zone). But up to now, this isn’t getting to the point. The point of all this is showing an animated version of the pitch flight. For this, Josh created a nice little “for loop”. For loops are something I’ll get to in some advanced plotting and simulation in the sab-R-metrics series, but essentially the loop creates a plot for each of the 18 frames of the pitch. When we put these together in a GIF, it comes out as animation (just like a flip-book cartoon). In Josh’s original script, this is the “saveMovie()” function. For this, you’ll need to download a program called ImageMagick. You can download it at the link. This allows us to write GIF files from R using this function. Go ahead and do that now.
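To make the flip-book idea concrete, here is a minimal sketch of the loop on its own, using base R graphics and made-up data rather than “plot.flight()”. The columns “x” and “z” and all the position values are assumptions for illustration; only “type” and “time” appear in the code in this post.

```r
## toy data: 2 pitch types x 18 time steps; x and z are made-up positions
toy <- data.frame(
  type = rep(c("FC", "FF"), each = 18),
  time = rep(seq_len(18), times = 2),
  x    = c(seq(-1, 0.5, length.out = 18), seq(-1, 0, length.out = 18)),
  z    = c(seq(6, 2.5, length.out = 18), seq(6, 3, length.out = 18))
)

frames_drawn <- 0
pdf(NULL)  # draw to a null device so no files are written
for (i in unique(toy$time)) {
  frame <- toy[toy$time == i, ]   # rows for this time step only
  plot(frame$x, frame$z, col = c(1, 2), pch = 19,
       xlim = c(-2, 2), ylim = c(0, 7),
       xlab = "horizontal position", ylab = "height")
  frames_drawn <- frames_drawn + 1
}
invisible(dev.off())
frames_drawn   # one plot per time step
```

Each pass through the loop draws one complete plot. Stitching those plots together in order is what produces the animation.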
Okay, so now we’re ready to create a Mariano Rivera movie. For this, we’ll use the code from the “saveMovie()” function with a for loop to indicate each frame for each time interval in our data frame. Here is an example following directly from the code above:
####now use the saveMovie stuff
# Create gif
saveMovie( {
  for(i in unique(mariano$time)) {
    plot.flight(mariano[mariano$time==i,], col=c(1,2), strikezone=T)
    text(3.7, 8, "Cutter", col=1, cex=1.3)
    text(3.7, 7.6, "Four Seam", col=2, cex=1.3)
  }
},
movie.name='mariano.gif', interval=.5)
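The “interval=” value is the time in seconds that each frame stays on screen, so it controls how long the whole GIF runs. A quick bit of arithmetic, assuming the 18 frames described above, shows how to pick it. The 3-second target is just an example value.

```r
n_frames <- 18      # one frame per time step, as described above
interval <- 0.5     # seconds per frame, as in the saveMovie() call

n_frames * interval           # total length of one loop, in seconds

## interval needed for a loop of a chosen length (3 seconds is just an example)
target_seconds <- 3
target_seconds / n_frames
```

If “saveMovie()” is not found in your installed version of the animation package, check the package help pages for “saveGIF()”, which appears to be the newer name for the same function. Treat that as an assumption to verify against your own installation.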
For fun, Josh also provided some code for A.J. Burnett. I have the code and the animation below comparing his knuckle-curve with his four-seam fastball (remember, these are Gameday pitch types).
Thanks to Josh for providing this and posting it up here. This is some great work and hopefully others out there can put this function to good use!