Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I have a dashboard at work that plots the number of contracts my office handles over time, and I wanted to add a trendline to show growth. However, trendline on the entire dataset skews low because our fiscal year just restarted. It took a little trial and error to figure out how to exclude the most recent year’s count so the trendline is more accurate for this purpose, so I thought I’d share my solution in a short post.
Here’s the code with a dummy dataset, with both the original trendline and the leave-one-out version:
x <- data.frame( x = seq(1:10), y = c(1,1,2,3,5,8,13,21,34,7) ) x %>% arrange(desc(x)) %>% ggplot(aes(x = x, y = y)) + geom_col() + geom_smooth(method = "lm", se = F, color = "red") + geom_smooth(method = "lm", data = function(x) { slice(x, -1) }, se = F, fullrange = TRUE)
The magic happens by defining an anonymous function as the data argument for the geom_smooth function. I’m using the slice function to drop the first element of the data frame provided to geom_smooth.
There are two other important tips here. First, be sure and sort the data in the tidyverse pipe before passing it to ggplot — for my purpose, this was in descending date order because I wanted to drop the most recent entry. The other tip is an aesthetic one, which is to use the fullrange = TRUE argument in order to plot the trendline into the current date column to provide a rough prediction for the current date period.
NOTE: I’ve seen some commentary that the decision to omit elements as described here should be explicitly disclosed. However, I think it’s fairly apparent in this specific application that the extension of the smoother is predictive in nature, rather than a true representation of the last data element. I could probably shade or apply an alpha to the most recent data point using a similar technique to make this even more clear, but the purpose here was to demonstrate the leave-one-out technique.
What do you think of this solution?
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.