[This article was first published on The Prince of Slides, and kindly contributed to R-bloggers].
A while back, I posted an article using the smoothScatter function in R, which builds a color representation of density for scatter plots. When I first found the function, I was extremely excited because it’s a very easy and automated way to make a heat map! Unfortunately, the more I messed with the function, the more annoying it became. But that’s not to say it doesn’t produce very pretty pictures.
I’ve had a lot of inquiries regarding this function lately, as Harry Pavlidis at THT, Dave Allen at Fangraphs, and Chris Quick at Bay City Ball have implemented it in recent articles elsewhere based on my original code. However, there is a problem with the function: it automatically chooses the range of the data to be plotted.
Now, it absolutely should pick how far out to extrapolate a kernel smoother (it’s generally not a good idea to ever go outside the bounds of the data). However, the ability to control the plotting is a bit wonky. The function chooses the axes in a way that is often off-center or not comparable across data sets with different ranges, and centered, comparable axes are a key requirement for plotting Pitch F/X data. I’ve tried using the xlim and ylim options, but this unfortunately makes things worse. If you use these within smoothScatter, it just leaves a bunch of white background beyond where the function chooses to smooth the data. See below for the problems we can run into:
Whitespace:
Off-center:
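Both problems can be reproduced with a few lines of R. This is only a minimal sketch: the data are simulated stand-ins for pitch locations (not real Pitch F/X data), and the names px, pz and reds are made up for illustration.

```r
# Simulated stand-in for pitch locations (assumption: not real Pitch F/X data)
set.seed(1)
px <- rnorm(2000, mean = 0, sd = 0.8)    # horizontal location
pz <- rnorm(2000, mean = 2.5, sd = 0.8)  # vertical location

# A dark-to-bright palette; the first color is used for the lowest density
reds <- colorRampPalette(c("black", "darkred", "red", "orange", "yellow"))

# Default call: smoothScatter picks the plotted range from the data itself,
# so the axes differ from one data set to the next
smoothScatter(px, pz, colramp = reds, nrpoints = 0)

# Fixed axes: comparable across data sets, but the smoothed image stops
# short of the limits and the rest of the plot region stays white
smoothScatter(px, pz, colramp = reds, nrpoints = 0,
              xlim = c(-4, 4), ylim = c(-1, 6))
```

Setting nrpoints = 0 just suppresses the individual outlier points so that only the smoothed density is drawn.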
Chris and others have inquired about this, and I found a few fixes…none of which are great, but I don’t think there are any other options.
Option 1:
Create a color palette in which the color representing the lowest density is white.
For this, when you indicate the colors to be used for smoothing in colramp = colorRampPalette(c("col1", "col2", ...)), your first entry should be "white". Choose carefully, as a white background usually works best with a single color or group of similar colors (i.e. Red Only, Blue Only, Red/Orange). This works okay, but I don’t think it looks quite as nice as having a darker background. A darker background really makes things ‘pop’. Below I have a Bruce Froemming “Called Strike” and “Called Ball” pitch density map by location, using this white background with an all-red palette:
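As a rough sketch of Option 1, the palette below starts at white so that the zero-density regions blend into the white left over when xlim and ylim are set by hand. The simulated data and the names px, pz and white_red are assumptions for illustration only.

```r
# Simulated stand-in for pitch locations (assumption: not real data)
set.seed(1)
px <- rnorm(2000, mean = 0, sd = 0.8)
pz <- rnorm(2000, mean = 2.5, sd = 0.8)

# First entry is white, so the lowest density matches the empty background
white_red <- colorRampPalette(c("white", "red", "darkred"))

# Custom axes no longer show a visible edge where the smoothing stops
smoothScatter(px, pz, colramp = white_red, nrpoints = 0,
              xlim = c(-4, 4), ylim = c(-1, 6))
```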
Option 2:
Use par(bg=””) just before smoothScatter.
This option works as long as you indicate the color for “bg” (means background) to be the first color in your smoothing palette. This way, you can set your axes the way you want, and everything that is white from before will just be filled in essentially as zero density. Unfortunately, this also colors the background beyond your axes and into your plot title and axis labels. This is certainly not optimal, but if you use the right colors it may not turn out too bad. Notice how dark things look even with a Red palette:
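Option 2 might look something like the sketch below. The data are simulated and the names (px, pz, dark_red, lowest, old_par) are hypothetical. Lightening the axis and label colors is my own addition here, since black text disappears against a dark background.

```r
# Simulated stand-in for pitch locations (assumption: not real data)
set.seed(1)
px <- rnorm(2000, mean = 0, sd = 0.8)
pz <- rnorm(2000, mean = 2.5, sd = 0.8)

dark_red <- colorRampPalette(c("black", "darkred", "red", "orange"))
lowest <- dark_red(256)[1]  # the color used for the lowest density

# Set the device background to the lowest-density color, and lighten
# the text so it stays readable; keep the old settings to restore later
old_par <- par(bg = lowest, fg = "white", col.axis = "white",
               col.lab = "white", col.main = "white")

smoothScatter(px, pz, colramp = dark_red, nrpoints = 0,
              xlim = c(-4, 4), ylim = c(-1, 6))

par(old_par)  # restore the previous graphics settings
```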
Option 3:
Use rect() to draw filled rectangles in the ranges where the function does not fill when you custom-set your axes.
This is the most flexible option. Unfortunately, it involves some guess-and-check to be sure you don’t overlap your rectangles on top of areas where there is some pitch density. This isn’t as easy as it sounds, and sometimes is impossible (especially if there is some density of pitches near the edges of what smoothScatter chose to plot). For this method, we use the “rect()” function and indicate “col=” within the function to tell it what color to fill the rectangle with, as well as “border=” to make the border the same color. See if you can tell where the rectangles begin at the edges of the plot below. In some places you can see evidence of a line that overlapped where I would rather it didn’t:
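Here is a hedged sketch of the rect() approach. The data are simulated, all names are hypothetical, and the 0.3 padding around the data range is an eyeballed guess at where the smoothed image ends. That guess is exactly the part that needs adjusting by hand for real data.

```r
# Simulated stand-in for pitch locations (assumption: not real data)
set.seed(1)
px <- rnorm(2000, mean = 0, sd = 0.8)
pz <- rnorm(2000, mean = 2.5, sd = 0.8)

pal <- colorRampPalette(c("black", "darkred", "red", "orange", "yellow"))
bg_col <- pal(256)[1]  # lowest-density color, used to fill the margins

smoothScatter(px, pz, colramp = pal, nrpoints = 0, bandwidth = 0.2,
              xlim = c(-5, 5), ylim = c(-2, 7))

usr <- par("usr")  # plot region as c(xmin, xmax, ymin, ymax)

# Guessed edges of the smoothed image (assumption: tune by eye)
left   <- min(px) - 0.3
right  <- max(px) + 0.3
bottom <- min(pz) - 0.3
top    <- max(pz) + 0.3

# Four filled strips around the smoothed image, border matching the fill
rect(usr[1], usr[3], left,   usr[4], col = bg_col, border = bg_col)  # left
rect(right,  usr[3], usr[2], usr[4], col = bg_col, border = bg_col)  # right
rect(left,   usr[3], right,  bottom, col = bg_col, border = bg_col)  # bottom
rect(left,   top,    right,  usr[4], col = bg_col, border = bg_col)  # top
box()  # redraw the frame that the rectangles painted over
```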
Finally, an extra suggestion: use the “bandwidth=” option in your smoothScatter plots. I had not bothered with this on my first run with the function, and without it the function uses an automatically chosen bandwidth when it calls the “bkde2D” function (from the KernSmooth package). For the data I’ve worked with, 0.20 or 0.25 works relatively well. Of course, what the optimal smoothing really is depends on your data and what you want out of the plot.
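To see what the bandwidth does, it helps to put the automatic choice next to a fixed one. A small sketch on simulated data (px and pz are made-up names, and 0.25 is simply one of the values mentioned above, not a recommendation for your data):

```r
# Simulated stand-in for pitch locations (assumption: not real data)
set.seed(1)
px <- rnorm(2000, mean = 0, sd = 0.8)
pz <- rnorm(2000, mean = 2.5, sd = 0.8)

old_par <- par(mfrow = c(1, 2))  # two panels side by side

# Left: bandwidth chosen automatically
smoothScatter(px, pz, nrpoints = 0, main = "automatic bandwidth")

# Right: bandwidth fixed by hand; larger values give a smoother map
smoothScatter(px, pz, nrpoints = 0, bandwidth = 0.25,
              main = "bandwidth = 0.25")

par(old_par)  # back to a single panel
```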
That’s all I’ve got for now. I wanted to get this up to help people out a little bit, but I have to get back to my work (they expect me to finish this dissertation at some point, I guess). I really think this function makes some of the best looking heat maps out there. I just wish there was a little more customization possible with it. Good luck!
And for good measure, here is my original color scheme that I really love. Just not sure I like the background of everything to be so dark:
Addition: See the comment section for another suggestion by Dave Armstrong. His solution is far easier. I had tried this before, but ran into problems when I forgot to include “add=T” to the parameters within smoothScatter. There are still distinct edges to the image, though, and I’m going to try and see if I can fix things up within the function myself. (Don’t expect too much from me on that part!)
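The comment itself is not reproduced here, so the following is only my guess at how “add=T” can be put to use, and it may not match Dave’s suggestion exactly: set up an empty plot with the axes you want, fill the plot region with the lowest-density color, then add the smoothed image on top. The data are simulated and the names are hypothetical.

```r
# Simulated stand-in for pitch locations (assumption: not real data)
set.seed(1)
px <- rnorm(2000, mean = 0, sd = 0.8)
pz <- rnorm(2000, mean = 2.5, sd = 0.8)

pal <- colorRampPalette(c("black", "darkred", "red", "orange", "yellow"))

# Empty plot that fixes the axes exactly where we want them
plot(NA, xlim = c(-4, 4), ylim = c(-1, 6), xlab = "px", ylab = "pz")

# Fill the whole plot region with the lowest-density color
usr <- par("usr")
rect(usr[1], usr[3], usr[2], usr[4], col = pal(256)[1], border = NA)

# Draw the smoothed density on top of the existing plot
smoothScatter(px, pz, colramp = pal, nrpoints = 0, add = TRUE)
box()
```

Unlike Option 2, only the plot region is filled, so the title and axis labels keep their normal background.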
Addition 2: Dave beat me to the punch and fixed up some inner workings of the function. I want to thank him for his help. This is why using R for research and analysis is great: there is a huge support system everywhere! And there is always something new to learn.