
Lubridate/ggplot date helpers

[This article was first published on R – scottishsnow, and kindly contributed to R-bloggers.]

This post collates a couple of functions to help with dates. I often work with daily data that spans multiple years, but want to visualise annual patterns. To do this I can extract the day of the year for each date (often loosely called the julian day). Here are a couple of ways to do this:

# Olden days (returns a zero-padded character string)
format(Sys.Date(), format = "%j")

# Tidyverse (returns a number)
library(lubridate)
yday(Sys.Date())

This site is a great resource for more date formats. Otherwise, you can view the lubridate website for guides.
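
As a quick check that the two approaches agree, the sketch below applies both to a pair of fixed dates (chosen here purely for illustration) instead of Sys.Date(). It also shows that the day of year for the same calendar date shifts by one after February in a leap year, which is worth keeping in mind when choosing axis breaks later on.

```r
library(lubridate)

# Two illustrative dates: 1 March in a non-leap year and in a leap year
d <- as.Date(c("2019-03-01", "2020-03-01"))

# Base R: "%j" gives a zero-padded string, so convert it to an integer
doy_base <- as.integer(format(d, "%j"))

# lubridate: yday() gives the day of year as a number directly
doy_lub <- yday(d)

doy_base
doy_lub
```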

Great, so far. However, most folk like their axis labels spelt out for them and prefer to see month labels on an annual axis instead of a numeric day. Let’s grab some data to demonstrate. Here’s the citation, with download and cleaning below:

Fetterer, F., K. Knowles, W. N. Meier, M. Savoie, and A. K. Windnagel. 2017, updated daily. Sea Ice Index, Version 3. [N seaice extent daily]. Boulder, Colorado USA. NSIDC: National Snow and Ice Data Center. doi: https://doi.org/10.7265/N5K072F8. [2020-04-24].

library(tidyverse)

download.file("ftp://sidads.colorado.edu/DATASETS/NOAA/G02135/north/daily/data/N_seaice_extent_daily_v3.0.csv",
              "Downloads/arctic_ice.csv")

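# Read the first row only, to use as column names
df_names = read_csv("Downloads/arctic_ice.csv",
                    col_names = FALSE,
                    n_max = 1)

# Skip the first two rows and supply the column names read above
df = read_csv("Downloads/arctic_ice.csv",
              skip = 2,
              col_names = as.character(df_names)) %>% 
  janitor::clean_names() %>% 
  # Build a Date column and blank out extents flagged as having missing data
  mutate(date = paste(year, month, day, sep = "-"),
         date = as.Date(date),
         extent = replace(extent, missing > 0, NA)) %>% 
  select(date, extent)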

wrap_width = scales::wrap_format(150)
ice_cite = wrap_width("Fetterer, F., K. Knowles, W. N. Meier, M. Savoie, and A. K. Windnagel. 2017, updated daily. Sea Ice Index, Version 3. [N seaice extent daily]. Boulder, Colorado USA. NSIDC: National Snow and Ice Data Center. doi: https://doi.org/10.7265/N5K072F8. [2020-04-24].")

We’ve now got a two column tibble containing date and sea ice extent. We can see this by plotting our data. (Something similar can be achieved with base graphics using plot(df)):
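
To make the cleaning step concrete, here is a minimal sketch on a three-row toy tibble. The values are made up for illustration and are not NSIDC data; the column names simply mirror those produced by clean_names() above. It uses lubridate's make_date() as an alternative to pasting the parts together, which assumes the year, month and day columns are numeric.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Made-up rows for illustration only; these are not NSIDC values
toy <- tibble(year = c(2020, 2020, 2020),
              month = c(1, 1, 1),
              day = c(1, 2, 3),
              extent = c(13.1, 13.2, 13.3),
              missing = c(0, 0.5, 0))

toy_clean <- toy %>%
  # Assemble a Date from its parts, then set flagged extents to NA
  mutate(date = make_date(year, month, day),
         extent = replace(extent, missing > 0, NA)) %>%
  select(date, extent)

toy_clean
```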

ggplot(df, aes(date, extent)) +
  geom_line() +
  labs(title = "N. hemisphere sea ice extent",
       x = "Year",
       y = "Extent (10^6 sq km)",
       caption = ice_cite) +
  theme(text = element_text(size = 15))

What do our data look like when we overlay years? i.e. the problem posed at the beginning of this post.

df %>% 
  mutate(year = year(date),
         date = yday(date)) %>% 
  ggplot(aes(date, extent,
             group = year,
             colour = year)) +
  geom_line() +
  scale_colour_viridis_c() +
  labs(title = "Annual fluctuation in N. hemisphere sea ice extent",
       subtitle = "Day of year on x axis",
       x = "Day of year",
       y = "Extent (10^6 sq km)",
       colour = "Year",
       caption = ice_cite) +
  theme(text = element_text(size = 15))

The above looks fine and is perfect for exploratory data analysis: we can quickly see an annual pattern. However, other viewers may wish or expect to see month labels on the x axis. We can do this by setting up a tibble with the values we’d like on the axis, then passing its columns as the breaks and labels for the axis (no doubt there is a fancier, function-based way of doing this). It’s not perfect: the labels appear at the start of each month, and because months differ in length it’s not easy to place them in the middle (one option is to use the 15th of each month). It could make sense to have variable grid line spacing, where the lines match the month breaks, but this would be awkward to implement and unexpected for viewers!

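# 2016 is a leap year, so breaks after February fall one day later than in a non-leap year
x = as.Date(c("2016-02-01",
              "2016-04-01",
              "2016-06-01",
              "2016-08-01",
              "2016-10-01"))

# One row per axis break: month label and day of year
doy = tibble(mon = month(x, label = TRUE),
             jul = yday(x))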

df %>% 
  mutate(year = year(date),
         date = yday(date)) %>% 
  ggplot(aes(date, extent,
             group = year,
             colour = year)) +
  geom_line() +
  scale_x_continuous(breaks = doy$jul, labels = doy$mon) +
  scale_colour_viridis_c() +
  labs(title = "Annual fluctuation in N. hemisphere sea ice extent",
       subtitle = "Month on x axis",
       x = "",
       y = "Extent (10^6 sq km)",
       colour = "Year",
       caption = ice_cite) +
  theme(text = element_text(size = 15))
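
The break table can also be generated rather than typed out. The sketch below is one way to do it; month_breaks() and its arguments are hypothetical names invented for this example, not part of lubridate or ggplot2. Setting day = 15 gives the mid-month placement mentioned earlier.

```r
library(lubridate)
library(tibble)

# Hypothetical helper (not from lubridate or ggplot2): builds a table of
# axis breaks and month labels for a day-of-year axis.
month_breaks <- function(ref_year = 2016, start_month = 2, by = 2, day = 1) {
  # Sequence of dates in the reference year, stepping by whole months
  x <- seq(make_date(ref_year, start_month, day),
           make_date(ref_year, 12, day),
           by = paste(by, "months"))
  tibble(mon = month(x, label = TRUE),
         jul = yday(x))
}

doy_start <- month_breaks()         # 1st of every second month
doy_mid <- month_breaks(day = 15)   # 15th of every second month
```

Either table can then be passed to scale_x_continuous() through its breaks and labels arguments, in the same way as doy$jul and doy$mon.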

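The above is OK, but as mentioned the label position is problematic. We can solve this by hacking at the ggplot theme: lengthening the x axis ticks and nudging the labels so that they sit beside each tick, within the month they name. We could also label the beginning of every month with this solution, but I haven’t here.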

df %>% 
  mutate(year = year(date),
         date = yday(date)) %>% 
  ggplot(aes(date, extent,
             group = year,
             colour = year)) +
  geom_line() +
  scale_x_continuous(breaks = doy$jul, labels = doy$mon) +
  scale_colour_viridis_c() +
  labs(title = "Annual fluctuation in N. hemisphere sea ice extent",
       subtitle = "Month on x axis",
       x = "",
       y = "Extent (10^6 sq km)",
       colour = "Year",
       caption = ice_cite) +
  theme(text = element_text(size = 15),
        axis.ticks.length.x = unit(0.5, "cm"),
        axis.text.x = element_text(vjust = 5.5,
                                   hjust = -0.2))
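
A different route to month labels, which avoids a hand-built break table, is to move every observation into a single reference year and let ggplot's date scale write the labels. This is a sketch of that alternative, not the approach used in this post, and it does not centre the labels within each month. The data here are synthetic (a cosine curve) so that the example runs on its own, and toy, toy_ref and ref_date are names made up for the example.

```r
library(dplyr)
library(tibble)
library(lubridate)
library(ggplot2)

# Synthetic daily series for illustration only; not real sea ice data
toy <- tibble(date = seq(as.Date("2018-01-01"),
                         as.Date("2020-12-31"),
                         by = "day")) %>%
  mutate(extent = 10 + 4 * cos(2 * pi * yday(date) / 365))

# Move every date into one reference year
# (2020 is a leap year, so 29 February remains a valid date)
toy_ref <- toy %>%
  mutate(year = year(date),
         ref_date = make_date(2020, month(date), mday(date)))

p <- ggplot(toy_ref, aes(ref_date, extent,
                         group = year,
                         colour = year)) +
  geom_line() +
  scale_x_date(date_breaks = "2 months", date_labels = "%b") +
  scale_colour_viridis_c() +
  labs(x = "",
       y = "Extent (synthetic)",
       colour = "Year")
```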

Finally, we can apply this idea to seasons:

season_lab = tibble(jul = yday(as.Date(c("2019-03-01",
                                         "2019-06-01",
                                         "2019-09-01",
                                         "2019-12-01"))),
                    lab = c("Spring", "Summer", "Autumn", "Winter"))

df %>% 
  mutate(year = year(date),
         date = yday(date)) %>% 
  ggplot(aes(date, extent,
             group = year,
             colour = year)) +
  geom_line() +
  scale_x_continuous(breaks = season_lab$jul, labels = season_lab$lab) +
  scale_colour_viridis_c() +
  labs(title = "Annual fluctuation in N. hemisphere sea ice extent",
       subtitle = "Season on x axis",
       x = "",
       y = "Extent (10^6 sq km)",
       colour = "Year",
       caption = ice_cite) +
  theme(text = element_text(size = 15),
        axis.ticks.length.x = unit(0.5, "cm"),
        axis.text.x = element_text(vjust = 5.5,
                                   hjust = -0.2))

We can even write a function to convert dates into seasons for fancy plotting/tables/etc.:

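season = function(in_date){
  # Day of year on which each season starts (taken from a non-leap year)
  br = yday(as.Date(c("2019-03-01",
                      "2019-06-01",
                      "2019-09-01",
                      "2019-12-01")))
  x = yday(in_date)
  # Left-closed intervals, so the first day of a season falls inside it;
  # in leap years the boundaries land one calendar day early
  x = cut(x, breaks = c(1, br, 367), right = FALSE)
  # Repeating "Winter" merges the first and last intervals into one level
  levels(x) = c("Winter", "Spring", "Summer", "Autumn", "Winter")
  x
}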

df %>% 
  mutate(year = year(date),
         sea = season(date)) %>% 
  group_by(year, sea) %>% 
  summarise(obs = n(),
            q25 = quantile(extent, 0.25, na.rm = TRUE),
            q50 = quantile(extent, 0.5, na.rm = TRUE),
            q75 = quantile(extent, 0.75, na.rm = TRUE)) %>% 
  filter(obs > 40) %>% 
  ggplot(aes(year, q50)) +
  geom_pointrange(aes(ymin = q25, ymax = q75)) +
  facet_wrap(~sea, scales = "free_y") +
  labs(title = "Seasonal change in N. hemisphere sea ice extent",
       subtitle = "Showing median and interquartile range",
       x = "Year",
       y = "Extent (10^6 sq km)",
       caption = ice_cite) +
  theme(text = element_text(size = 15),
        axis.ticks.length.x = unit(0.5, "cm"),
        axis.text.x = element_text(vjust = 5.5,
                                   hjust = -0.2))
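
Because day-of-year boundaries move by a day between leap and non-leap years, a season lookup keyed on the calendar month avoids that shift altogether. The sketch below is one way to write it, keeping the same season start dates (1 March, 1 June, 1 September, 1 December); season_by_month() is a hypothetical name used only for this example.

```r
library(lubridate)

# Hypothetical alternative: assign seasons from the calendar month,
# so the result does not depend on whether the year is a leap year.
season_by_month <- function(in_date) {
  # Season for each month, January to December
  lab <- c("Winter", "Winter",
           "Spring", "Spring", "Spring",
           "Summer", "Summer", "Summer",
           "Autumn", "Autumn", "Autumn",
           "Winter")
  factor(lab[month(in_date)],
         levels = c("Winter", "Spring", "Summer", "Autumn"))
}

# Boundary dates in a non-leap and a leap year
check <- as.Date(c("2019-02-28", "2019-03-01",
                   "2020-02-29", "2020-03-01",
                   "2019-12-01"))
season_by_month(check)
```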
