Visualize monthly precipitation anomalies

Normally, when we visualize monthly precipitation anomalies, we simply use a bar graph with negative and positive values in red and blue. However, this does not show the wider context of those anomalies: for example, what was the highest or lowest anomaly recorded in each month? In principle, we could use a boxplot to visualize the distribution of the anomalies, but in this particular case it would not fit aesthetically, so we look for an alternative. Here I present a very useful graphic form.

Packages

In this post we will use the following packages:

Package Description
tidyverse Collection of packages (visualization, manipulation): ggplot2, dplyr, purrr, etc.
readr Import data
ggthemes Themes for ggplot2
lubridate Easy manipulation of dates and times
cowplot Easy creation of multiple graphics with ggplot2
#we install the packages if necessary
if(!require("tidyverse")) install.packages("tidyverse")
if(!require("ggthemes")) install.packages("broom")
if(!require("cowplot")) install.packages("cowplot")
if(!require("lubridate")) install.packages("lubridate")

#packages
library(tidyverse) #include readr
library(ggthemes)
library(cowplot)
library(lubridate)

Preparing the data

First we import the daily precipitation of the selected weather station (download). We will use data from Santiago de Compostela (Spain) accessible through ECA&D.

Step 1: import the data

We not only import the data in CSV format, but we also make the first changes. We skip the first 21 rows, which contain information about the weather station. In addition, we convert the date column to the Date class and replace missing values (-9999) with NA. The precipitation is given in tenths of a millimetre (0.1 mm), so we must divide the values by 10. Then we select the columns DATE and RR and rename them.

data <- read_csv("RR_STAID001394.txt", skip = 21) %>%
             mutate(DATE = ymd(DATE), RR = ifelse(RR == -9999, NA, RR/10)) %>%
               select(DATE:RR) %>% 
             rename(date = DATE, pr = RR)
## 
## -- Column specification --------------------------------------------------------
## cols(
##   STAID = col_double(),
##   SOUID = col_double(),
##   DATE = col_double(),
##   RR = col_double(),
##   Q_RR = col_double()
## )
data
## # A tibble: 27,606 x 2
##    date          pr
##    <date>     <dbl>
##  1 1943-11-01   0.6
##  2 1943-11-02   0  
##  3 1943-11-03   0  
##  4 1943-11-04   0  
##  5 1943-11-05   0  
##  6 1943-11-06   0  
##  7 1943-11-07   0  
##  8 1943-11-08   0  
##  9 1943-11-09   0  
## 10 1943-11-10   0  
## # ... with 27,596 more rows

Step 2: creating monthly values

In the second step we calculate the monthly precipitation totals. To do this, a) we add the month (with labels) and the year as variables, b) we limit the period to 1950 onwards, and c) we group by year and month and sum the daily precipitation.

data <- mutate(data, mo = month(date, label = TRUE), yr = year(date)) %>%
            filter(date >= "1950-01-01") %>%
                group_by(yr, mo) %>% 
                   summarise(prs = sum(pr, na.rm = TRUE))
## `summarise()` has grouped output by 'yr'. You can override using the `.groups` argument.
data
## # A tibble: 833 x 3
## # Groups:   yr [70]
##       yr mo      prs
##    <dbl> <ord> <dbl>
##  1  1950 Jan    55.6
##  2  1950 Feb   349. 
##  3  1950 Mar    85.8
##  4  1950 Apr    33.4
##  5  1950 May   272. 
##  6  1950 Jun   111. 
##  7  1950 Jul    35.4
##  8  1950 Aug    76.4
##  9  1950 Sep    85  
## 10  1950 Oct    53  
## # ... with 823 more rows

Step 3: estimating anomalies

Now we must estimate the climatological normal of each month over the reference period 1981-2010 and join this table to our main data in order to calculate the monthly anomaly. We express the anomalies as a percentage of the normal and subtract 100 to set the average to 0; for example, a month with 150 mm against a 100 mm normal gives an anomaly of +50 %. In addition, we create a variable which indicates whether the anomaly is negative or positive, and another with the date.

pr_ref <- filter(data, yr >= 1981, yr <= 2010) %>%
                   group_by(mo) %>%
                      summarise(pr_ref = mean(prs))

data <- left_join(data, pr_ref, by = "mo")

data <- mutate(data, 
               anom = (prs*100/pr_ref)-100, 
               date = str_c(yr, as.numeric(mo), 1, sep = "-") %>% ymd(),
               sign= ifelse(anom > 0, "pos", "neg") %>% factor(c("pos", "neg")))
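
As a quick, purely illustrative check, the anomaly is just the monthly total divided by the normal, times 100, minus 100:

#illustrative worked example of the anomaly formula: 150 mm against a 100 mm normal
(150 * 100 / 100) - 100  #= +50 % anomaly
#the column we just created should reproduce this for every month
all.equal(data$anom, data$prs * 100 / data$pr_ref - 100)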

We can make a first test graph of the anomalies (the classic one); for that we filter the year 2018. In this case we use a bar graph. Remember that by default the function geom_bar() counts the observations of the x variable; here, however, we already know y, so we indicate with the argument stat = "identity" that it should use the value given in aes().

filter(data, yr == 2018) %>%
   ggplot(aes(date, anom, fill = sign)) + 
       geom_bar(stat = "identity", show.legend = FALSE) + 
    scale_x_date(date_breaks = "month", date_labels = "%b") +
    scale_y_continuous(breaks = seq(-100, 100, 20)) +
    scale_fill_manual(values = c("#99000d", "#034e7b")) +
         labs(y = "Precipitation anomaly (%)", x = "") +
          theme_hc()
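
As a side note, ggplot2 also provides geom_col(), which is equivalent to geom_bar(stat = "identity"); a minimal sketch of the same chart using it:

#same chart with geom_col(), which implies stat = "identity"
filter(data, yr == 2018) %>%
   ggplot(aes(date, anom, fill = sign)) +
       geom_col(show.legend = FALSE) +
    scale_x_date(date_breaks = "month", date_labels = "%b") +
    scale_fill_manual(values = c("#99000d", "#034e7b")) +
    labs(y = "Precipitation anomaly (%)", x = "") +
    theme_hc()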

Step 4: calculating the statistical metrics

In this last step we calculate, for each month over the entire time series, the maximum and minimum anomaly, the 25 % and 75 % quantiles, and the interquartile range.

data_norm <-     group_by(data, mo) %>%
                     summarise(mx = max(anom),
                               min = min(anom),
                               q25 = quantile(anom, .25),
                               q75 = quantile(anom, .75),
                               iqr = q75-q25)
data_norm
## # A tibble: 12 x 6
##    mo       mx    min   q25   q75   iqr
##    <ord> <dbl>  <dbl> <dbl> <dbl> <dbl>
##  1 Jan    193.  -89.6 -43.6 56.3   99.9
##  2 Feb    320.  -96.5 -51.2 77.7  129. 
##  3 Mar    381. -100   -40.6 88.2  129. 
##  4 Apr    198.  -93.6 -51.2 17.1   68.3
##  5 May    141.  -90.1 -45.2 17.0   62.2
##  6 Jun    419.  -99.3 -58.2 50.0  108. 
##  7 Jul    311.  -98.2 -77.3 27.1  104. 
##  8 Aug    264. -100   -68.2 39.8  108. 
##  9 Sep    241.  -99.2 -64.9 48.6  113. 
## 10 Oct    220.  -99.0 -54.5  4.69  59.2
## 11 Nov    137.  -98.8 -44.0 39.7   83.7
## 12 Dec    245.  -91.8 -49.8 36.0   85.8

Creating the graph

To create the anomaly graph with a custom legend, it is necessary to build the main graph and the legends separately.

Part 1

In this first part we add the different elements layer by layer: 1) the maximum-minimum range of the anomalies, 2) the interquartile range and 3) the anomalies of the year 2018. We use geom_crossbar(), which draws a box from ymin to ymax with a middle line at y; setting fatten = 0 hides that middle line so only the rectangle remains.

#range of anomalies maximum-minimum
g1.1 <- ggplot(data_norm)+
           geom_crossbar(aes(x = mo, y = 0, ymin = min, ymax = mx),
                        fatten = 0, fill = "grey90", colour = "NA")

g1.1

#adding interquartile range
g1.2 <- g1.1 + geom_crossbar(aes(x = mo, y = 0, ymin = q25, ymax = q75),
                              fatten = 0, fill = "grey70")

g1.2

#adding anomalies of the year 2018 

g1.3 <- g1.2 + geom_crossbar(data = filter(data, yr == 2018),
                aes(x = mo, y = 0, ymin = 0, ymax = anom, fill = sign),
                fatten = 0, width = 0.7, alpha = .7, colour = "NA",
                show.legend = FALSE)
g1.3

Finally we change some last style settings.

g1 <- g1.3 + geom_hline(yintercept = 0)+
               scale_fill_manual(values=c("#99000d","#034e7b"))+
               scale_y_continuous("Precipitation anomaly (%)",
                                   breaks = seq(-100, 500, 25),
                                   expand = c(0, 5))+
            labs(x = "",
                 title = "Precipitation anomaly in Santiago de Compostela 2018",
                 caption="Dominic Royé (@dr_xeo) | Data: eca.knmi.nl")+
            theme_hc()
g1

Part 2

We still need a legend. First we create it for the normals.

#legend data
legend <- filter(data_norm, mo == "Jan")

legend_lab <- gather(legend, stat, y, mx:q75) %>%
                 mutate(stat = factor(stat, stat, c("maximum",
                                                   "minimum",
                                                   "Quantile 25%",
                                                   "Quantile 75%")) %>%
                                            as.character())
## Warning: attributes are not identical across measure variables;
## they will be dropped
#legend graph
g2 <- legend %>% ggplot()+
                  geom_crossbar(aes(x = mo, y = 0, ymin = min, ymax = mx),
                                fatten = 0, fill = "grey90", colour = "NA", width = 0.2) +
                  geom_crossbar(aes(x = mo, y = 0, ymin = q25, ymax = q75),
                                fatten = 0, fill = "grey70", width = 0.2) +
                  geom_text(data = legend_lab, 
                            aes(x = mo, y = y+c(12,-8,-10,12), label = stat), 
                            fontface = "bold", size = 2) +
                   annotate("text", x = 1.18, y = 40, 
                            label = "Period 1950-2018", angle = 90, size = 3) +
              theme_void() + 
                theme(plot.margin = unit(c(0, 0, 0, 0), "cm"))

g2

Second, we create another legend for the current anomalies.

#legend data
legend2 <- filter(data, yr == 1950, mo %in% c("Jan","Feb")) %>% 
              ungroup() %>% 
            select(mo, anom, sign)

legend2[2,1] <- "Jan"

legend_lab2 <- data.frame(mo = rep("Jan", 3), 
                          anom= c(110, 3, -70), 
                          label = c("Positive anomaly", "Average", "Negative anomaly"))

#legend graph
g3 <-  ggplot() + 
         geom_bar(data = legend2,
                aes(x = mo, y = anom, fill = sign),
                   alpha = .6, colour = "NA", stat = "identity", show.legend = FALSE, width = 0.2) +
         geom_segment(aes(x = .85, y = 0, xend = 1.15, yend = 0), linetype = "dashed") +
         geom_text(data = legend_lab2, 
                   aes(x = mo, y = anom+c(10,5,-13), label = label), 
                   fontface = "bold", size = 2) +
         annotate("text", x = 1.25, y = 20, 
                  label ="Reference 1971-2010", angle = 90, size = 3) +
         scale_fill_manual(values = c("#99000d", "#034e7b")) +
        theme_void() +
         theme(plot.margin = unit(c(0, 0, 0, 0), "cm"))

g3

Part 3

Finally, we only have to combine the graph and the legends with the help of the cowplot package. The main function of cowplot is plot_grid(), which combines several graphs in a grid. In this case, however, we need more flexible functions to create a less common layout: ggdraw() sets up the basic drawing canvas, and the functions intended to operate on this canvas start with draw_*.

p <- ggdraw() +
       draw_plot(g1, x = 0, y = .3, width = 1, height = 0.6) +
       draw_plot(g2, x = 0, y = .15, width = .2, height = .15) +
       draw_plot(g3, x = 0.08, y = .15, width = .2, height = .15)

p
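
If we only needed a regular grid layout, without freely placed legends, plot_grid() on its own would be enough; a minimal sketch reusing the objects g1, g2 and g3 created above:

#simple grid layout for comparison: the two legends share one row below the main graph
plot_grid(g1,
          plot_grid(g2, g3, ncol = 2),
          ncol = 1, rel_heights = c(4, 1))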

save_plot("pr_anomaly2016_scq.png", p, dpi = 300, base_width = 12.43, base_height = 8.42)

Multiple facets

In this section we will make the same graph as in the previous one, but for several years.

Part 1

First we filter again, this time for a set of years (2016 to 2018) using the %in% operator. We also add the function facet_grid() to the ggplot call, which splits the graph into panels according to a variable. The formula used by the facet function is similar to the one used in models: variable_by_row ~ variable_by_column. When there is no column variable, we use a dot (.), as illustrated in the short sketch below.
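
A few illustrative variants of the formula, using our yr and mo variables (each would be added to a ggplot with +):

#years in rows (what we use below)
facet_grid(yr ~ .)
#years in columns
facet_grid(. ~ yr)
#a grid with years in rows and months in columns (only to illustrate the formula)
facet_grid(yr ~ mo)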

#range of anomalies maximum-minimum
g1.1 <- ggplot(data_norm)+
           geom_crossbar(aes(x = mo, y = 0, ymin = min, ymax = mx),
                        fatten = 0, fill = "grey90", colour = "NA")

g1.1

#adding the interquartile range
g1.2 <- g1.1 + geom_crossbar(aes(x = mo, y = 0, ymin = q25, ymax = q75),
                              fatten = 0, fill = "grey70")

g1.2

#adding the anomalies of the years 2016-2018

g1.3 <- g1.2 + geom_crossbar(data = filter(data, yr %in% 2016:2018),
                aes(x = mo, y = 0, ymin = 0, ymax = anom, fill = sign),
                fatten = 0, width = 0.7, alpha = .7, colour = "NA",
                show.legend = FALSE) +
               facet_grid(yr ~ .)
g1.3

Finally we change some last style settings.

g1 <- g1.3 + geom_hline(yintercept = 0)+
               scale_fill_manual(values=c("#99000d","#034e7b"))+
               scale_y_continuous("Anomalía de precipitación (%)",
                                   breaks = seq(-100, 500, 50),
                                   expand = c(0, 5))+
            labs(x = "",
                 title = "Anomalía de precipitación en Santiago de Compostela",
                 caption="Dominic Royé (@dr_xeo) | Datos: eca.knmi.nl")+
            theme_hc()
g1

We use the same legend created for the previous graph.

Part 2

Finally, we join the graph and the legends with the help of the cowplot package. The only things we must adjust here are the arguments of the draw_plot() function, so that the different parts are placed correctly.

p <- ggdraw() +
       draw_plot(g1, x = 0, y = .18, width = 1, height = 0.8) +
       draw_plot(g2, x = 0, y = .08, width = .2, height = .15) +
       draw_plot(g3, x = 0.08, y = .08, width = .2, height = .15)

p

save_plot("pr_anomaly20162018_scq.png", p, dpi = 300, base_width = 12.43, base_height = 8.42)
