Example
[This article was first published on pitchR/x, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Here is a little example of what I do. While learning R isn’t easy, it can be very powerful and efficient once you get your feet wet. I intend for this example to whet your appetite. This should take you less than 20 minutes. By the end, you will have made this graph:
Pretty, isn’t it?
Go here: http://joelefkowitz.com/pitcher_card.php?pid=136880 and click “Download excel file.” This is Roy Halladay’s data from 2011. Open the file in Excel, click “Save As”, and save it in CSV format so that the file name looks like “halladay.csv”. This will make it easier to import into R.
Go here, and download R. Then open R. To read in the file, we need to change our working directory. We can do this with the setwd() command. Mine looks like this:
setwd("C:/Users/Josh/baseball_stuff/PITCHRX")
Now read in the data. Type:
pitcher = read.csv("halladay.csv")
This reads in the file and assigns it to an object called pitcher. To get a feel for the object, first type in
head(pitcher)
This shows you the first few rows of the object so that you know the import wasn’t screwed up. Now type
str(pitcher)
While str() means convert to string in Python, in R it means structure. This will give you a feel for each column in the object, which we can see is of type data.frame (like a spreadsheet in Excel). Looks like everything went well, awesome.
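If you want to try these commands before downloading anything, here is a minimal sketch that uses a tiny made-up file as a stand-in for halladay.csv. The column names pitch_type, px, and pz are the ones used later in this post; the pitch codes and numbers are invented for illustration, and the name pitcher_demo is hypothetical. It also shows that read.csv() accepts a full path, so changing the working directory is optional.

```r
# Made-up stand-in for halladay.csv, written to a temporary folder
demo_path = file.path(tempdir(), "halladay_demo.csv")
demo = data.frame(
  pitch_type = c("FF", "CU", "FC", "CH"),  # placeholder pitch codes
  px = c(-0.5, 0.3, 1.1, -0.2),            # invented horizontal locations
  pz = c(2.4, 1.8, 3.0, 2.1)               # invented vertical locations
)
write.csv(demo, demo_path, row.names = FALSE)

# read.csv() accepts a full path, so setwd() is optional
pitcher_demo = read.csv(demo_path)

head(pitcher_demo, 2)  # first two rows only
str(pitcher_demo)      # one line per column: name, type, first few values
class(pitcher_demo)    # "data.frame"
```

With the real file, the same three checks apply to pitcher; only the number of rows and columns will differ.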
Now I want a graph that shows me Halladay’s pitch locations. I want it to be pretty, and to be split up by pitch type. I also want smoothing and labeled axes. And to top it off, limited dimensions. We need the ggplot2 package. To install it, type
install.packages("ggplot2")
Load it by typing
library("ggplot2")
Now we can use it. But first, eliminate pitches that we don’t care about, by typing:
pitcher = subset(pitcher, !(pitch_type %in% c("IN", "")))
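To see what that filter does, here is a small sketch on an invented data frame (the name toy and its values are made up). The %in% operator checks each pitch_type against the listed values, and the ! flips the result, so only rows whose type is neither "IN" nor blank are kept.

```r
# Invented data: two rows we want to drop ("IN" and a blank type)
toy = data.frame(
  pitch_type = c("FF", "IN", "", "CU", "FF"),
  px = c(0.1, 2.5, 0.0, -0.4, 0.6),
  stringsAsFactors = FALSE
)

# TRUE where the type is one we want to throw away
toy$pitch_type %in% c("IN", "")   # FALSE TRUE TRUE FALSE FALSE

# Keep only the rows where that test is FALSE
kept = subset(toy, !(pitch_type %in% c("IN", "")))
kept$pitch_type                   # "FF" "CU" "FF"
```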
Now plot away.
ggplot(data = pitcher) +
  stat_density_2d(geom = "tile", aes(x = px, y = pz, fill = after_stat(density)), contour = FALSE) +
  facet_wrap(~pitch_type) +
  scale_x_continuous("horizontal pitch location") +
  scale_y_continuous("vertical pitch location") +
  coord_cartesian(xlim = c(-2, 2), ylim = c(1, 4))
Boom, you just made one high-quality graph in less than 20 minutes. Of course I haven’t explained why what we just did works, and it’s pretty complicated, but that’s why you’ll keep reading my website (I hope). We will go over more things like this in the future, but I just wanted to post something quick and powerful.
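In the meantime, here is a rough, commented sketch of what each line of the plot contributes. It runs on simulated data rather than Halladay’s, so the names sim, p, and out_file and all the numbers are assumptions for illustration only. It uses after_stat(density), which is the current ggplot2 spelling of the older ..density.. notation, and it assumes a reasonably recent ggplot2 is installed. The last lines save the picture to a temporary file with ggsave().

```r
library(ggplot2)

# Simulated pitch locations for two made-up pitch types
set.seed(1)
sim = data.frame(
  pitch_type = rep(c("FF", "CU"), each = 200),
  px = rnorm(400, mean = 0, sd = 0.7),
  pz = rnorm(400, mean = 2.5, sd = 0.6)
)

p = ggplot(data = sim) +
  # Estimate a smooth 2-D density of (px, pz) and draw it as colored tiles
  stat_density_2d(geom = "tile",
                  aes(x = px, y = pz, fill = after_stat(density)),
                  contour = FALSE) +
  # One panel per pitch type
  facet_wrap(~pitch_type) +
  # Axis labels
  scale_x_continuous("horizontal pitch location") +
  scale_y_continuous("vertical pitch location") +
  # Zoom in on a fixed window without dropping any data
  coord_cartesian(xlim = c(-2, 2), ylim = c(1, 4))

# Save the plot; the path and size here are arbitrary choices
out_file = file.path(tempdir(), "pitch_locations_demo.png")
ggsave(out_file, plot = p, width = 8, height = 4)
```

Typing p at the prompt draws the plot on screen, just like the unassigned version above.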