Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In a recent project, I wanted to plot data from different variables along the same time axis. The difficulty was that I wanted some of these variables as point plots and others as box plots.
Because I work with the tidyverse, I wanted to produce these plots with ggplot2. Faceting was the obvious first step, but it took me quite a while to figure out how best to combine facets containing point plots (where I have one value per time point) with facets containing box plots (where I have multiple values per time point).
The reason this isn’t trivial is that box plots require groups or factors on the x-axis, while points can be plotted over a continuous range of x-values. If your alarm bells are ringing right now, you are absolutely right: before you try to combine plots with different x-axis properties, you should think long and hard about whether this is an accurate representation of the data and whether it’s a good idea to do so! Here, I had multiple values per time point for one variable and I wanted to make the median and variation explicit, while also showing the continuous changes of other variables over the same range of time.
So, I am writing this short tutorial in the hope that it saves the next person trying to do something similar from spending an entire morning on Stack Overflow. 😉
For this demonstration, I am creating some fake data:
    library(tidyverse)

    dates <- seq(as.POSIXct("2017-10-01 07:00"),
                 as.POSIXct("2017-10-01 10:30"),
                 by = 180) # 180 seconds == 3 minutes

    fake_data <- data.frame(time = dates,
                            var1_1 = runif(length(dates)),
                            var1_2 = runif(length(dates)),
                            var1_3 = runif(length(dates)),
                            var2 = runif(length(dates))) %>%
      sample_frac(size = 0.33)

    head(fake_data)
    ##                   time    var1_1    var1_2    var1_3       var2
    ## 8  2017-10-01 07:21:00 0.2359625 0.6121708 0.4114921 0.03327728
    ## 27 2017-10-01 08:18:00 0.5592436 0.3834683 0.8025474 0.44557932
    ## 29 2017-10-01 08:24:00 0.7667775 0.4636693 0.7642972 0.97718507
    ## 18 2017-10-01 07:51:00 0.2819686 0.3995273 0.9127757 0.42115579
    ## 1  2017-10-01 07:00:00 0.5940754 0.1599054 0.7287677 0.91953437
    ## 71 2017-10-01 10:30:00 0.2159290 0.2853349 0.7817291 0.57598897
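Because runif() and sample_frac() draw random numbers, the values will differ on every run. Here is a minimal sketch of one way to make the fake data reproducible. The helper name make_fake_data(), the seed value and the explicit UTC time zone are my assumptions for illustration and are not part of the original code.

```r
library(dplyr)

# Same 3-minute grid as above; tz = "UTC" is an assumption so the result
# does not depend on the local time zone.
dates <- seq(as.POSIXct("2017-10-01 07:00", tz = "UTC"),
             as.POSIXct("2017-10-01 10:30", tz = "UTC"),
             by = 180) # 180 seconds == 3 minutes

# Hypothetical helper: fixing the seed first makes runif() and
# sample_frac() return the same values on every call.
make_fake_data <- function(seed = 123) {
  set.seed(seed) # arbitrary seed value
  data.frame(time = dates,
             var1_1 = runif(length(dates)),
             var1_2 = runif(length(dates)),
             var1_3 = runif(length(dates)),
             var2 = runif(length(dates))) %>%
    sample_frac(size = 0.33)
}

fake_data_a <- make_fake_data()
fake_data_b <- make_fake_data()
identical(fake_data_a, fake_data_b) # both calls give the same sample
```

With a fixed seed, anyone rerunning the code should get the same rows, which makes it easier to compare plots.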
Here, variable 1 (var1) has three measurements per time point, while variable 2 (var2) has one.
First, for plotting with ggplot2, we want our data in a tidy long format. I also add another column for faceting that groups the variables from var1 together.
    fake_data_long <- fake_data %>%
      gather(x, y, var1_1:var2) %>%
      mutate(facet = ifelse(x %in% c("var1_1", "var1_2", "var1_3"), "var1", x))

    head(fake_data_long)
    ##                  time      x         y facet
    ## 1 2017-10-01 07:21:00 var1_1 0.2359625  var1
    ## 2 2017-10-01 08:18:00 var1_1 0.5592436  var1
    ## 3 2017-10-01 08:24:00 var1_1 0.7667775  var1
    ## 4 2017-10-01 07:51:00 var1_1 0.2819686  var1
    ## 5 2017-10-01 07:00:00 var1_1 0.5940754  var1
    ## 6 2017-10-01 10:30:00 var1_1 0.2159290  var1
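In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). If you are on a newer version, the same reshaping could be sketched as follows. The small toy data frame stands in for fake_data and its values are made up; startsWith() is simply a shorter way of matching the three var1 columns.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for fake_data with two time points
toy <- data.frame(
  time   = as.POSIXct("2017-10-01 07:00", tz = "UTC") + c(0, 180),
  var1_1 = c(0.1, 0.2),
  var1_2 = c(0.3, 0.4),
  var1_3 = c(0.5, 0.6),
  var2   = c(0.7, 0.8)
)

toy_long <- toy %>%
  # one row per (time, variable) pair; names go to x, values to y
  pivot_longer(var1_1:var2, names_to = "x", values_to = "y") %>%
  # collapse var1_1, var1_2 and var1_3 into one facet label
  mutate(facet = ifelse(startsWith(x, "var1_"), "var1", x))

toy_long
```

Note that pivot_longer() orders the rows by original row rather than by variable, so the output will not be in the same order as with gather(). This makes no difference for plotting.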
Now, we can plot this the following way:
- facet by variable
- subset data to facets for point plots and give aesthetics in geom_point()
- subset data to facets for box plots and give aesthetics in geom_boxplot(). Here we also need to set the group aesthetic; if we don’t give it explicitly, we will get a plot with one big box instead of a box for every time point.
    fake_data_long %>%
      ggplot() +
        facet_grid(facet ~ ., scales = "free") +
        geom_point(data = subset(fake_data_long, facet == "var2"),
                   aes(x = time, y = y),
                   size = 1) +
        geom_line(data = subset(fake_data_long, facet == "var2"),
                  aes(x = time, y = y)) +
        geom_boxplot(data = subset(fake_data_long, facet == "var1"),
                     aes(x = time, y = y, group = time))
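To see what the group aesthetic does in isolation, here is a small sketch on made-up toy data (five time points with three values each; the data, seed and object names are my assumptions). It builds the same box plot with and without group = time and uses layer_data() to count how many boxes ggplot2 computes in each case.

```r
library(ggplot2)

set.seed(42) # arbitrary seed, only so the toy values are reproducible

# Made-up toy data: 5 time points, 3 measurements each
times <- seq(as.POSIXct("2017-10-01 07:00", tz = "UTC"),
             by = 180, length.out = 5)
toy <- data.frame(time = rep(times, each = 3), y = runif(15))

# Without group: the continuous time axis gives ggplot2 nothing to
# split on, so all values are pooled (ggplot2 also warns about this)
p_pooled <- ggplot(toy, aes(x = time, y = y)) +
  geom_boxplot()

# With group = time: one box per distinct time point
p_grouped <- ggplot(toy, aes(x = time, y = y, group = time)) +
  geom_boxplot()

# layer_data() returns one row per computed box
n_pooled <- nrow(suppressWarnings(layer_data(p_pooled)))
n_grouped <- nrow(layer_data(p_grouped))
c(pooled = n_pooled, grouped = n_grouped)
```

Printing p_pooled and p_grouped side by side shows the difference visually: the first collapses everything into a single box, the second draws a separate box at each time point.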
    sessionInfo()
    ## R version 3.4.2 (2017-09-28)
    ## Platform: x86_64-apple-darwin15.6.0 (64-bit)
    ## Running under: macOS High Sierra 10.13.1
    ##
    ## Matrix products: default
    ## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
    ## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
    ##
    ## locale:
    ## [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8
    ##
    ## attached base packages:
    ## [1] methods   stats     graphics  grDevices utils     datasets  base
    ##
    ## other attached packages:
    ##  [1] bindrcpp_0.2    forcats_0.2.0   stringr_1.2.0   dplyr_0.7.4
    ##  [5] purrr_0.2.4     readr_1.1.1     tidyr_0.7.2     tibble_1.3.4
    ##  [9] ggplot2_2.2.1   tidyverse_1.2.1
    ##
    ## loaded via a namespace (and not attached):
    ##  [1] tidyselect_0.2.3 reshape2_1.4.2   haven_1.1.0      lattice_0.20-35
    ##  [5] colorspace_1.3-2 htmltools_0.3.6  yaml_2.1.14      rlang_0.1.4
    ##  [9] foreign_0.8-69   glue_1.2.0       modelr_0.1.1     readxl_1.0.0
    ## [13] bindr_0.1        plyr_1.8.4       munsell_0.4.3    blogdown_0.3
    ## [17] gtable_0.2.0     cellranger_1.1.0 rvest_0.3.2      psych_1.7.8
    ## [21] evaluate_0.10.1  labeling_0.3     knitr_1.17       parallel_3.4.2
    ## [25] broom_0.4.2      Rcpp_0.12.13     scales_0.5.0     backports_1.1.1
    ## [29] jsonlite_1.5     mnormt_1.5-5     hms_0.3          digest_0.6.12
    ## [33] stringi_1.1.5    bookdown_0.5     grid_3.4.2       rprojroot_1.2
    ## [37] cli_1.0.0        tools_3.4.2      magrittr_1.5     lazyeval_0.2.1
    ## [41] crayon_1.3.4     pkgconfig_2.0.1  xml2_1.1.1       lubridate_1.7.1
    ## [45] assertthat_0.2.0 rmarkdown_1.7    httr_1.3.1       rstudioapi_0.7
    ## [49] R6_2.2.2         nlme_3.1-131     compiler_3.4.2