Site icon R-bloggers

Displaying time series with R

[This article was first published on R on Coding Club UC3M, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by Oscar Perpiñán Lamigueiro

Introduction

A time series is a sequence of observations registered at consecutive time instants. The visualization of time series is intended to reveal changes of one or more quantitative variables through time, and to display the relationships between the variables and their evolution through time. The standard time series graph displays the time along the horizontal axis. On the other hand, time can be conceived as a grouping or conditioning variable. This solution allows several variables to be displayed together with a scatterplot, using different panels for subsets of the data (time as a conditioning variable) or using different attributes for groups of the data (time as a grouping variable). Finally, time can be used as a complementary variable that adds information to a graph where several variables are confronted.

Next sections provide a variety of examples to illustrate a set of useful techniques working with three datasets available at the data folder of the repository.

This document has been prepared with content extracted and adapted from the second edition of the book “Displaying time series, spatial and space-time data with R” published with Chapman&Hall/CRC.

The most relevant packages are: lattice, latticeExtra, and ggplot2 for static graphics; zoo and xts for reading and arranging data as time series; RColorBrewer for defining color palettes; and packages dygraphs, highcharter, and plotly, based on the htmlwidgets framework, to generate animations or interactive graphics.

Let’s begin loading the main packages and setting some graphical parameters of lattice and latticeExtra.

library(lattice)
library(ggplot2)
# latticeExtra must be loaded after ggplot2 to prevent masking of `layer`
library(latticeExtra)
library(RColorBrewer)
# lattice and latticeExtra configuration
myTheme <- custom.theme.2(
  pch=19, cex=0.7, region=rev(brewer.pal(9, 'YlOrRd')),
  symbol=brewer.pal(n=8, name="Dark2"))
myTheme$strip.background$col = myTheme$strip.shingle$col =
  myTheme$strip.border$col = 'transparent'

myArgs <- list(
  as.table=TRUE, between=list(x=0.5, y=0.2),
  xscale.components = function(...)
    modifyList(xscale.components.default(...), list(top=FALSE)),
  yscale.components = function(...)
    modifyList(yscale.components.default(...), list(right=FALSE)))

lattice.options(default.theme=myTheme, default.args=modifyList(
  lattice.options()$default.args, myArgs))

library(zoo) 

And this is the data we will use:

load('aranjuez.RData')
load('navarra.RData')
load('CO2.RData')

Time on the horizontal axis

The most frequent visualization method of a time series uses the horizontal axis to depict the time index. This section illustrates two variants of this approach to display multivariate time series: multiple time series with different scales, and variables with the same scale.

Time graph of variables with different scales

There is a variety of scientific research interested in the relationship among several meteorological variables. A suitable approach is to display the time evolution of all of them using a panel for each of the variables. The superposition of variables with different characteristics is not very useful (unless their values were previously rescaled), so this approach is postponed for the next section (variables with the same scale).

For the first example we will use the eight years of daily data from a meteorological station located at Aranjuez (Madrid). This multivariate time series can be displayed with the xyplot method of lattice for zoo objects with a panel for each variable.

## The layout argument arranges panels in rows
xyplot(aranjuez, layout = c(1, ncol(aranjuez))) 

The package ggplot2 provides the generic method autoplot to automate the display of certain classes with a simple command. The package zoo provides an autoplot method for the zoo class with a result similar to that obtained with xyplot.

## facet_free allows each panel to have its own range
autoplot(aranjuez) + facet_free() 

Time series of variables with the same scale

As an example of time series of variables with the same scale, we will use measurements of solar radiation from different meteorological stations.

The first attempt to display this multivariate time series makes use of the xyplot.zoo method. The objective of this graphic is to display the behavior of the collection as a whole: the series are superposed in the same panel (superpose=TRUE) without legend (auto.key=FALSE), using thin lines and partial transparency. Transparefncy softens overplotting problems and reveals density clusters because regions with more overlapping lines are darker. Next code displays the variations around the time average (avRad).

avRad <- zoo(rowMeans(navarra, na.rm = 1), index(navarra))
pNavarra <- xyplot(navarra - avRad,
        superpose = TRUE, auto.key = FALSE,
        lwd = 0.5, alpha = 0.3, col = 'midnightblue') 
pNavarra 

This result can be improved with the horizon graph. The horizon graph is useful in examining how a large number of series changes over time, and does so in a way that allows both comparisons between the individual time series and independent analysis of each series. Moreover, extraordinary behaviors and predominant patterns are easily distinguished.

This graph displays several stacked series collapsing the y-axis to free vertical space:

  • Positive and negative values share the same vertical space. Negative values are inverted and placed above the reference line. Sign is encoded using different hues (positive values in blue and negative values in red).
  • Differences in magnitude are displayed as differences in color intensity (darker colors for greater differences).
  • The color bands share the same baseline and are superposed, with darker bands in front of the lighter ones.

Because the panels share the same design structure, once this technique is understood, it is easy to establish comparisons or spot extraordinary events. This method is what Tufte described as small multiples.

Next code displays the variations of solar radiation around the time average with a horizon graph using a row for each time series. In the code we choose origin=0 and leave the argument horizonscale undefined (default). With this combination each panel has different scales and the colors in each panel represent deviations from the origin. This is depicted in the color key with the \(\Delta_i\) symbol, where the subscript i denotes the existence of multiple panels with different scales.

horizonplot(navarra - avRad,
            layout = c(1, ncol(navarra)),
            origin = 0, ## Deviations in each panel are calculated
                        ## from this value
            colorkey = TRUE,
            col.regions = brewer.pal(6, "RdBu")) 

The horizon graph is also useful in revealing the differences between a univariate time series and another reference. For example, we might be interested in the departure of the observed temperature from the long-term average, or in other words, the temperature change over time. Let’s illustrate this approach with the time series of daily average temperatures measured at the meteorological station of Aranjuez. The reference is the long-term daily average calculated with ave.

Ta <- aranjuez$TempAvg
timeIndex <- index(aranjuez)
longTa <- ave(Ta, format(timeIndex, '%j'))
diffTa <- (Ta - longTa) 

The next code uses horizonplot with the cut-and-stack method to distinguish between years.

years <- unique(format(timeIndex, '%Y'))
  
horizonplot(diffTa, cut = list(n = 8, overlap = 0),
            colorkey = TRUE, layout = c(1, 8),
            scales = list(draw = FALSE, y = list(relation = 'same')),
            origin = 0, strip.left = FALSE) +
    layer(grid.text(years[panel.number()], x  =  0, y  =  0.1, 
                    gp = gpar(cex = 0.8),
                    just = "left")) 

An alternative is a level plot displaying the time series using parts of its time index both as independent and as conditioning variable. The following code displays the differences with the day of the month on the horizontal axis and the year on the vertical axis, with a different panel for each month number. Therefore, each cell of the next figure corresponds to a certain day of the time series.

year <- function(x)as.numeric(format(x, '%Y'))
day <- function(x)as.numeric(format(x, '%d'))
month <- function(x)as.numeric(format(x, '%m')) 
myTheme <- modifyList(custom.theme(region = brewer.pal(9, 'RdBu')),
                      list(
                          strip.background = list(col = 'gray'),
                          panel.background = list(col = 'gray')))

maxZ <- max(abs(diffTa))

levelplot(diffTa ~ day(timeIndex) * year(timeIndex) | factor(month(timeIndex)),
          at = pretty(c(-maxZ, maxZ), n = 8),
          colorkey = list(height = 0.3),
          layout = c(1, 12), strip = FALSE, strip.left = TRUE,
          xlab = 'Day', ylab = 'Month', 
          par.settings = myTheme)

The ggplot version requires a data.frame with the day, year, and month arranged in different columns.

df <- data.frame(Vals = diffTa,
                 Day = day(timeIndex),
                 Year = year(timeIndex),
                 Month = month(timeIndex)) 

The values (Vals column of this data.frame) are displayed as a level plot thanks to the geom_raster function.

library(scales) 
## The packages scales is needed for the pretty_breaks function.

ggplot(data = df,
       aes(fill = Vals,
           x = Day,
           y = Year)) +
    facet_wrap(~ Month, ncol = 1, strip.position = 'left') +
    scale_y_continuous(breaks = pretty_breaks()) + 
    scale_fill_distiller(palette = 'RdBu', direction = 1) + 
    geom_raster() +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank()) 

Interactive graphics

This section describes the interactive alternatives of the static figures included in the previous sections with several packages: dygraphs, highcharter, and plotly. These packages are R interfaces to JavaScript libraries based on the htmlwidgets package.

Dygraphs

The dygraphs package is an interface to the dygraphs JavaScript library, and provides facilities for charting time-series. It works automatically with xts time series objects, or with objects than can be coerced to this class. The result is an interactive graph, where values are displayed according to the mouse position over the time series. Regions can be selected to zoom into a time period.

library(dygraphs)

dyTemp <- dygraph(aranjuez[, c("TempMin", "TempAvg", "TempMax")],
       main = "Temperature in Aranjuez",
       ylab = "ºC")

widgetframe::frameWidget(dyTemp)


You can customize dygraphs by piping additional commands onto the original graphic. The function dyOptions provides several choices for the graphic, and the function dyHighlight configures options for data series mouse-over highlighting. For example, with the next code the semi-transparency value of the non-selected lines is reduced and the width of the selected line is increased.

dyTemp %>%
  dyHighlight(highlightSeriesBackgroundAlpha = 0.2,
    highlightSeriesOpts = list(strokeWidth = 2)) %>%
  widgetframe::frameWidget()


An alternative approach to depict the upper and lower variables of this time series is with a shaded region. The dySeries function accepts a character vector of length 3 that specifies a set of input column names to use as the lower, value, and upper for a series with a shaded region around it.

dygraph(aranjuez[, c("TempMin", "TempAvg", "TempMax")],
 main = "Temperature in Aranjuez",
 ylab = "ºC") %>%
   dySeries(c("TempMin", "TempAvg", "TempMax"),
      label = "Temperature") %>%
  widgetframe::frameWidget()


Highcharter

The highcharter package is an interface to the highcharts JavaScript library, with a wide spectrum of graphics solutions. Displaying time series with this package can be achieved with the combination of the generic highchart function and several calls to the hc_add_series_xts function through the pipe %>% operator. Once again, the result is an interactive graph with selection and zoom capabilities.

library(highcharter)
library(xts)

aranjuezXTS <- as.xts(aranjuez)

highchart() %>%
   hc_add_series(name = 'TempMax',
           aranjuezXTS[, "TempMax"]) %>%
   hc_add_series(name = 'TempMin',
           aranjuezXTS[, "TempMin"]) %>%
   hc_add_series(name = 'TempAvg',
           aranjuezXTS[, "TempAvg"]) %>%
  widgetframe::frameWidget()

plotly

The plotly package is an interface to the plotly JavaScript library, also with a wide spectrum of graphics solutions. This package does not provide any function specifically focused on time series. Thus, the time series object has to be transformed into a data.frame including a column for the time index. If the data.frame is in wide format (one column per variable), each variable will be represented with a call to the add_lines function. However, if the data.frame is in long format (a column for values, and a column for variable names) only one call to add_lines is required. The next code follows this approach using the combination of fortify, to convert the zoo object into a data.frame, and melt, to transform from wide to long format.

aranjuezDF <- fortify(aranjuez[,
            c("TempMax",
              "TempAvg",
              "TempMin")],
           melt = TRUE)

summary(aranjuezDF) 
##      Index                Series         Value        
##  Min.   :2004-01-01   TempMax:2898   Min.   :-12.980  
##  1st Qu.:2005-12-29   TempAvg:2898   1st Qu.:  7.107  
##  Median :2008-01-09   TempMin:2898   Median : 13.560  
##  Mean   :2008-01-03                  Mean   : 14.617  
##  3rd Qu.:2010-01-03                  3rd Qu.: 21.670  
##  Max.   :2011-12-31                  Max.   : 41.910  
##                                      NA's   :10

Next code produces an interactive graphic with the generic function plot_ly connected with add_lines through the pipe operator, %>%.

library(plotly)

plot_ly(aranjuezDF) %>%
   add_lines(x = ~ Index,
       y = ~ Value,
       color = ~ Series) %>%
  widgetframe::frameWidget()

Time as a conditioning or grouping variable

Previously we learned to display the time evolution of multiple time series with different scales. But, what if instead of displaying the time evolution we want to represent the relation between the variables? This section follows this approach: first, a scatterplot matrix using groups is defined according to the time as a grouping variable; next, an enhanced scatterplot with time as a conditioning variable is produced using the small multiples technique.

Scatterplot matrix: time as a grouping variable

The scatterplot matrices are based on the technique of small multiples: small, thumbnail-sized representations of multiple images displayed all at once, which allows the reader to immediately, and in parallel, compare the inter-frame differences. A scatterplot matrix is a display of all pairwise bivariate scatterplots arranged in a \(p \times p\) matrix for \(p\) variables. Each subplot shows the relation between the pair of variables at the intersection of the row and column indicated by the variable names in the diagonal panels.

This graphical tool is implemented in the splom function. The following code displays the relation between the set of meteorological variables using a sequential palette from the ColorBrewer catalog (RbBu, with black added to complete a twelve-color palette) to encode the month. The order of colors of this palette is chosen in order to display summer months with intense colors and to distinguish between the first and second half of the year with red and blue, respectively.

aranjuezDF <- as.data.frame(aranjuez)
aranjuezDF$Month <- format(index(aranjuez), '%m') 
## Red-Blue palette with black added (12 colors)
colors <- c(brewer.pal(n = 11, 'RdBu'), '#000000')
## Rearrange according to months (darkest for summer)
colors <- colors[c(6:1, 12:7)] 
splom(~ aranjuezDF[1:10], ## Do not include "Month"
     groups = aranjuezDF$Month,
     auto.key = list(space = 'right', 
         title = 'Month', cex.title = 1),
     pscale = 0, varname.cex = 0.7, xlab = '',
     par.settings = custom.theme(symbol = colors,
             pch = 19),
     cex = 0.3, alpha = 0.1)

The ggplot2 version of this graphic is produced thanks to the ggpairs function provided by the GGally package.

library(GGally)

ggpairs(aranjuezDF,
 columns = 1:10, ## Do not include "Month"
 upper = list(continuous = "points"),
 mapping = aes(colour = Month, alpha = 0.1)) 

Scatterplot with time as a conditioning variable

Previous graphics use colors to encode months. Instead, we will now display separate scatterplots with a panel for each month. In addition, the statistic type (average, maximum, minimum) is included as an additional conditioning variable.

First, the dataset must be reshaped from the wide format (one column for each variable) to the long format (only one column for the temperature values with one row for each observation). This task is easily accomplished with the melt function included in the reshape2 package.

library(reshape2)

aranjuezRshp <- melt(aranjuezDF,
                     measure.vars = c('TempMax',
                                      'TempAvg',
                                      'TempMin'),
                     variable.name = 'Statistic',
                     value.name = 'Temperature')

This matrix of panels can be displayed with ggplot using facet_grid. Next code uses partial transparency to cope with overplotting, small horizontal and vertical segments (geom_rug) to display points density on both variables, and a smooth line in each panel.

ggplot(data = aranjuezRshp, aes(Radiation, Temperature)) +
   facet_grid(Statistic ~ Month) +
   geom_point(col = 'skyblue4', pch = 19, cex = 0.5, alpha = 0.3) +
   geom_rug() +
   stat_smooth(se = FALSE, method = 'loess',
     col = 'indianred1', lwd = 1.2) +
   theme_bw() 

The version with lattice needs the useOuterStrips function from the latticeExtra package, which prints the names of the conditioning variables on the top and left outer margins.

useOuterStrips(
   xyplot(Temperature ~ Radiation | Month * Statistic,
    data = aranjuezRshp,
    between = list(x = 0),
    col = 'skyblue4', pch = 19,
    cex = 0.5, alpha = 0.3)) +
   layer({
 panel.rug(..., col.line = 'indianred1',
       end = 0.05, alpha = 0.6)
 panel.loess(..., col = 'indianred1',
         lwd = 1.5, alpha = 1)
   }) 

Time as a complementary variable

In this section, time will be used as a complementary variable which adds information to a graph where several variables are confronted. We will illustrate this approach with the evolution of the relationship between Gross National Income (GNI) and carbon dioxide (\(CO_2\)) emissions for a set of countries extracted from the database of the World Bank Open Data. We will try several solutions to display the relationship between \(CO_2\) emissions and GNI over the years using time as a complementary variable.

Polylines

Our first approach is to display the entire data in a panel with a scatterplot using country names as the grouping factor. Points of each country are connected with polylines to reveal the time evolution.

## lattice version
xyplot(GNI.capita  ~ CO2.capita, data = CO2data,
xlab = "Carbon dioxide emissions (metric tons per capita)",
ylab = "GNI per capita, PPP (current international $)",
groups = Country.Name, type = 'b') 

Three improvements can be added to this graphical result:

  1. Define a better palette to enhance visual discrimination between countries.
  2. Display time information with labels to show year values.
  3. Label each polyline with the country name instead of a legend.

Choosing colors

The Country.Name categorical variable will be encoded with a qualitative palette, namely the first five colors of Set1 palette from the RColorBrewer package. Because there are more countries than colors, we have to repeat some colors to complete the number of levels of the variable Country.Name. The result is a palette with non-unique colors, and thus some countries will share the same color. This is not a problem because the curves will be labeled, and countries with the same color will be displayed at enough distance.

nCountries <- nlevels(CO2data$Country.Name)
pal <- brewer.pal(n = 5, 'Set1')
pal <- rep(pal, length = nCountries) 

Adjacent colors of this palette are chosen to be easily distinguishable. Therefore, the connection between colors and countries must be in such a way that nearby lines are encoded with adjacent colors of the palette. A simple approach is to calculate the annual average of the variable to be represented along the x-axis (CO2.capita), and extract colors from the palette according to the order of this value.

## Rank of average values of CO2 per capita
CO2mean <- aggregate(CO2.capita ~ Country.Name,
          data = CO2data, FUN = mean)
palOrdered <- pal[rank(CO2mean$CO2.capita)]  

## simpleTheme encapsulates the palette in a new theme for xyplot
myTheme <- simpleTheme(pch = 19, cex = 0.6, col = palOrdered) 
## lattice version
pCO2.capita <- xyplot(GNI.capita  ~ CO2.capita,
           data = CO2data,
           xlab = "Carbon dioxide emissions (metric tons per capita)",
           ylab = "GNI per capita, PPP (current international $)",
           groups = Country.Name,
           par.settings = myTheme,
           type = 'b')

pCO2.capita 

## ggplot2 version
gCO2.capita <- ggplot(data = CO2data,
           aes(x = CO2.capita,
           y = GNI.capita,
           color = Country.Name)) +
   geom_point() + geom_path() +
   scale_color_manual(values = palOrdered, guide = FALSE) +
   xlab('CO2 emissions (metric tons per capita)') +
   ylab('GNI per capita, PPP (current international $)') +
   theme_bw() 

Labels to show time information

This result can be improved with labels displaying the years to show the time evolution. The panel function panel.text prints the year labels with the combination of +.trellis, glayer_ and panel.text. Using glayer_ instead of glayer we ensure that the labels are printed below the lines.

## lattice version
pCO2.capita <- pCO2.capita +
   glayer_(panel.text(...,
        labels = CO2data$Year[subscripts],
          pos = 2, cex = 0.5, col = 'gray'))
pCO2.capita 

## ggplot2 version
gCO2.capita <- gCO2.capita + geom_text(aes(label = Year),
                colour = 'gray',
                size = 2.5,
                hjust = 0, vjust = 0)

Country names: positioning labels

The common solution to link each curve with the group value is to add a legend. However, a legend can be confusing with too many items. In addition, the reader must carry out a complex task: Choose the line, memorize its color, search for it in the legend, and read the country name.

A better approach is to label each line using nearby text with the same color encoding. A suitable method is to place the labels avoiding the overlapping between labels and lines. The package directlabels includes a wide repertory of positioning methods to cope with overlapping. The main function, direct.label, is able to determine a suitable method for each plot, although the user can choose a different method from the collection or even define a custom method. For the pCO2.capita object, the best results are obtained with extreme.grid.

library(directlabels)

## lattice version
direct.label(pCO2.capita,
      method = 'extreme.grid') 

## ggplot2 version
direct.label(gCO2.capita, method = 'extreme.grid') 

Interactive graphics: animation

This section describes how to display the data through animation with interactive functionalities with a solution that resembles the motion chart product published by Gapminder.

Gapminder is an independent foundation based in Stockholm, Sweden. Its mission is “to debunk devastating myths about the world by offering free access to a fact-based world view.” They provide free online tools, data, and videos “to better understand the changing world.” The initial development of Gapminder was the Trendalyzer software, used by Hans Rosling in several sequences of his documentary “The Joy of Stats.”

The information visualization technique used by Trendalyzer is an interactive bubble chart. By default it shows five variables: two numeric variables on the vertical and horizontal axes, bubble size and color, and a time variable that may be manipulated with a slider. The software uses brushing and linking techniques for displaying the numeric value of a highlighted country.

We will mimic the Trendalyzer/Motion Chart solution with the package plotly, using traveling bubbles of different colors and with radius proportional to the values of the variable CO2.PPP. Previously, you should watch the magistral video “200 Countries, 200 Years, 4 Minutes”.

The plotly package has already been used previously to create an interactive graphic representing time in the x-axis. In this section this package produces an animation piping the result of the plot_ly and add_markers functions through the animation_slider function.

Variables CO2.capita and GNI.capita are represented in the x-axis and y-axis, respectively.

p <- plot_ly(CO2data,
      x = ~CO2.capita,
      y = ~GNI.capita,
      sizes = c(10, 100),
      marker = list(opacity = 0.7,
            sizemode = 'diameter')) 

CO2.PPP is encoded with the circle sizes, while Country.Name is represented both with colours and with labels.

p <- add_markers(p,
      size = ~CO2.PPP,
      color = ~Country.Name,
      text = ~Country.Name, hoverinfo = "text",
      ids = ~Country.Name,
      frame = ~Year,
      showlegend = FALSE) 

Finally, the animation is created with animation_opts, to customize the frame and transition times, and with animation_slider to define the slider.

p <- animation_opts(p,
         frame = 1000,
         transition = 800,
         redraw = FALSE)

p <- animation_slider(p,
           currentvalue = list(prefix = "Year "))

widgetframe::frameWidget(p)

To leave a comment for the author, please follow the link and comment on their blog: R on Coding Club UC3M.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.