Roll Your Own Stats and Geoms in ggplot2 (Part 1: Splines!)
A huge change is coming to ggplot2, and you can get a preview of it over at Hadley's GitHub repo. I've been keenly interested in this, as I will be fixing, finishing & porting coord_proj to it once it's done.
Hadley & Winston have rebuilt ggplot2 on an entirely new object-oriented system called ggproto. With ggproto it's now possible to extend ggplot2 easily from within your own packages (since source() is so last century), often with very little effort.
Before attempting to port coord_proj, I wanted to work through adding a Geom and a Stat, since I thought it would be cool to be able to have interpolated line charts (which also helps answer some recurring StackOverflow "spline"/ggplot2 questions), and I also prefer KernSmooth::bkde over the built-in density function (which geom_density and stat_density both use).
To that end, I've made a new GitHub-installable package called ggalt (h/t to @jayjacobs for a better package name than the one I originally came up with), where I'll be adding new Geoms, Stats, Coords (et al.) as I craft them. For now, let me introduce both geom_xspline() and geom_bkde() to show how easy it is to incorporate new functionality into ggplot2.
While not a requirement, I think it's going to be a good idea to make both a paired Geom and Stat when adding those types of functionality to ggplot2. I found it easier to work with custom parameters this way, and it also makes things feel a bit more like the way ggplot2 itself works. For the interpolated line geom/stat I used R's graphics::xspline function. Here's all it took to give ggplot2 lines some curves (you can find the commented version on GitHub):
```r
library(ggplot2)

# User-callable geom: a thin wrapper around layer() that uses the "xspline"
# stat by default and forwards the three spline parameters to it
geom_xspline <- function(mapping = NULL, data = NULL, stat = "xspline",
                         position = "identity", show.legend = NA,
                         inherit.aes = TRUE, na.rm = TRUE,
                         spline_shape = -0.25, open = TRUE, rep_ends = TRUE, ...) {
  layer(
    geom = GeomXspline, mapping = mapping, data = data, stat = stat,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, spline_shape = spline_shape, open = open,
                  rep_ends = rep_ends, ...)
  )
}

# Geom "shim": everything is inherited from GeomLine, which does the drawing
GeomXspline <- ggproto("GeomXspline", GeomLine,
  required_aes = c("x", "y"),
  default_aes = aes(colour = "black", size = 0.5, linetype = 1, alpha = NA)
)

# User-callable stat: draws with geom_line by default
stat_xspline <- function(mapping = NULL, data = NULL, geom = "line",
                         position = "identity", show.legend = NA,
                         inherit.aes = TRUE, spline_shape = -0.25,
                         open = TRUE, rep_ends = TRUE, ...) {
  layer(
    stat = StatXspline, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(spline_shape = spline_shape, open = open,
                  rep_ends = rep_ends, ...)
  )
}

# The computation engine: ggplot2 calls compute_group() once per group
StatXspline <- ggproto("StatXspline", Stat,
  required_aes = c("x", "y"),
  compute_group = function(self, data, scales, spline_shape = -0.25,
                           open = TRUE, rep_ends = TRUE) {
    # xspline() wants a plot set up on an open device, so use a throwaway png
    tf <- tempfile(fileext = ".png")
    png(tf)
    plot.new()
    tmp <- xspline(data$x, data$y, shape = spline_shape, open = open,
                   repEnds = rep_ends, draw = FALSE, border = NA, col = NA)
    invisible(dev.off())
    unlink(tf)
    # Hand back the (many more) interpolated points for GeomLine to connect
    data.frame(x = tmp$x, y = tmp$y)
  }
)
```
If that seems like a lot of code, it really isn't. What we have there is:
- two functions that handle the Geom aspects, and
- two functions that handle the Stat aspects.
Let's look at the Stat functions first, though you can also just read the handy vignette.
Adding Stats
In this particular case, we have it easy. We get to use geom_line/GeomLine as the base geom for the layer, since all we're doing is generating more points for it to draw line segments between. We create the user-facing interface to our new Stat with stat_xspline and add three new parameters with default values:
- spline_shape (default -0.25)
- open (default TRUE)
- rep_ends (default TRUE)
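These three parameters map directly onto the shape, open and repEnds arguments of graphics::xspline(). To see what each knob does outside of ggplot2, here is a minimal sketch built around a hypothetical helper, xspline_points() (not part of ggalt or ggplot2). It assumes a null pdf device (pdf(NULL)) is an acceptable stand-in for the temporary png used in compute_group, and it calls plot.window() so the data range becomes the user coordinate system; both choices are mine, not the author's.

```r
# Hypothetical helper (not from ggalt): return the x-spline through (x, y)
# as a data frame, using the same three knobs stat_xspline() exposes.
xspline_points <- function(x, y, spline_shape = -0.25, open = TRUE,
                           rep_ends = TRUE) {
  pdf(NULL)            # null device: nothing is written to disk
  on.exit(dev.off())   # close the device however we leave the function
  plot.new()           # xspline() needs a plot set up on the device
  plot.window(xlim = range(x), ylim = range(y))
  pts <- xspline(x, y, shape = spline_shape, open = open,
                 repEnds = rep_ends, draw = FALSE)
  data.frame(x = pts$x, y = pts$y)
}

set.seed(1492)
x <- 1:10
y <- sample(15:30, 10)

smooth_open <- xspline_points(x, y)                    # stat_xspline defaults
closed      <- xspline_points(x, y, open = FALSE)      # wraps back to the first point
straight    <- xspline_points(x, y, spline_shape = 0)  # shape 0: sharp corners
```

Negative shape values make the curve pass through the control points, which is why -0.25 works for an interpolated line chart. For closed splines (open = FALSE), xspline() ignores repEnds, so rep_ends only matters for open curves.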
"Added three new parameters to what?" you ask? GeomLine/geom_line default to StatIdentity/stat_identity, and if you look at the source code, that Stat just returns the data in the form it came in. We're going to take these three new parameters, pass them to xspline, and return entirely new values for ggplot2/grid to draw for us, so we tell layer() to call our new computation engine by passing it StatXspline as the stat. By using GeomLine/geom_line as the geom parameter, all we have to do is make sure we pass back the proper values. We do that in compute_group, since ggplot2 will split the incoming data into groups (via the group aesthetic) for us. We take each group and run it through xspline with the parameters the user specified. If I didn't have to use the hack (opening a throwaway png device and calling plot.new()) to work around what seem to be errant plot-device issues in xspline, the call would be one line.
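To make the StatIdentity-versus-compute_group contrast concrete, here is a rough sketch on made-up data. It calls StatIdentity's compute_layer() method directly to show the pass-through, then imitates the per-group dispatch by hand with split() and lapply(). The manual loop only illustrates the idea; it is not how ggplot2 implements grouping internally.

```r
library(ggplot2)

# Made-up data: two groups of five control points each
df <- data.frame(
  x = rep(1:5, 2),
  y = c(1, 3, 2, 5, 4, 2, 6, 4, 8, 6),
  group = rep(1:2, each = 5)
)

# StatIdentity, the default stat behind geom_line(), hands the data back as-is
passthrough <- StatIdentity$compute_layer(df, list(), NULL)

# Hand-rolled imitation of per-group computation: split by group, spline each
# piece independently, then bind the pieces back together
pdf(NULL)   # throwaway device so xspline() has a plot to work with
plot.new()
plot.window(xlim = range(df$x), ylim = range(df$y))
per_group <- lapply(split(df, df$group), function(d) {
  pts <- xspline(d$x, d$y, shape = -0.25, open = TRUE, repEnds = TRUE,
                 draw = FALSE)
  data.frame(x = pts$x, y = pts$y, group = d$group[1])
})
invisible(dev.off())
splined <- do.call(rbind, per_group)
```

Each group comes back with far more rows than it went in with, and no group's curve "leaks" into another's, which is exactly why compute_group is the right hook for this Stat.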
Adding Geoms
We pair up the Stat with a very basic Geom "shim" so we can use them interchangeably. It's the same idiom: a ggproto "object" plus the user-callable function. In this case, it's super-lightweight, since we're really having geom_line do all the work for us. In a [very] future post, I'll cover more complex Geoms that require use of the underlying grid graphics system, but I suspect most of your own additions will be able to use the lightweight idiom here (and that's covered in the vignette).
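As a minimal sketch of just how lightweight the shim idiom can be, here is a hypothetical GeomShimDemo that inherits from GeomLine without overriding anything. It is wired into a plot with the same layer() constructor that geom_xspline() and stat_xspline() call, and inspected with ggplot2's layer_data(), which computes the layer's data without drawing anything.

```r
library(ggplot2)

# Hypothetical shim: inherit everything from GeomLine, override nothing
GeomShimDemo <- ggproto("GeomShimDemo", GeomLine)

df <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 3, 5))

# Build a layer with the shim as the geom and the identity stat
p <- ggplot(df, aes(x, y)) +
  layer(geom = GeomShimDemo, stat = "identity", position = "identity",
        params = list(na.rm = FALSE))

# Computed data for the first layer; no device or drawing needed
built <- layer_data(p)
```

Because GeomShimDemo's class chain includes GeomLine, all of the drawing, NA handling and legend keys come along for free; GeomXspline in the code above only restates the required and default aesthetics on top of that.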
Putting Our New Functions To Work
With our new additions to ggplot2, we can compare the output of geom_smooth to geom_xspline with some test data:
```r
set.seed(1492)
dat <- data.frame(x = c(1:10, 1:10, 1:10),
                  y = c(sample(15:30, 10), 2 * sample(15:30, 10), 3 * sample(15:30, 10)),
                  group = factor(c(rep(1, 10), rep(2, 10), rep(3, 10))))

ggplot(dat, aes(x, y, group = group, color = factor(group))) +
  geom_point(color = "black") +
  geom_smooth(se = FALSE, linetype = "dashed", size = 0.5) +
  geom_xspline(size = 0.5)
```
The GitHub page has more examples for the function, but you don't have to be envious of smooth D3 curves anymore.
I realize this particular addition is not extremely helpful, but the next one is. We'll look at adding a new, more accurate density Stat/Geom in the next installment, and then discuss the "on-steroids" roxygen2 comments you'll end up using for your creations in part 3.