Charting time series data with dygraphs in R and Python
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This past week, the good people at RStudio announced on Twitter the release of htmlwidgets for R, a project in collaboration with rCharts wizards Ramnath Vaidyanathan and Kenton Russell. The packages showcased are incredible; I was particularly intrigued by the dygraphs package, which creates interactive time-series charts.
Aside from maps, time series line charts are the chart type I use most in my teaching, as I often discuss how the characteristics of places evolve over time. As such, I took dygraphs for a spin to create charts relevant to a couple of topics I address in World Regional Geography.
The first chart shows quarterly unemployment rates in Egypt over the past decade. I obtained the data from Quandl, a massive repository of publicly available datasets, by connecting to its API. Quandl aims to be as user-friendly as possible: on the page for the dataset you want, a single click gives you the command to access the data in R (or any number of other languages and formats). You can also specify the format in which Quandl returns the data; dygraphs in R accepts data in xts format, which Quandl can return directly.
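An xts object pairs each observation with a time index and keeps the observations in time order, which is what lets dygraphs draw the series along a date axis. The idea can be sketched with the Python standard library alone; to_time_series() and the sample rows below are hypothetical placeholders, not Quandl's API or real unemployment figures.

```python
from datetime import date


def to_time_series(rows):
    """Convert (ISO date string, value) rows into (date, value) pairs sorted
    by date: a bare-bones stand-in for an R xts object, which stores
    observations together with an ordered time index."""
    parsed = [(date.fromisoformat(d), v) for d, v in rows]
    return sorted(parsed, key=lambda pair: pair[0])


# Hypothetical quarterly rows in arbitrary order (placeholder values, not real data)
rows = [("2011-04-01", 13.0), ("2010-10-01", 9.0), ("2011-01-01", 12.0)]
series = to_time_series(rows)
for when, value in series:
    print(when, value)
```

Sorting on the date rather than trusting the input order mirrors what xts does when given an order.by index.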
Once I have the data, I can create the plot.
library(dygraphs)
library(Quandl)

# Fetch quarterly unemployment for Egypt from Quandl as an xts object
egypt <- Quandl("ILOSTAT/UNE_DEAP_RT_SEX_T_Q_EGY", type = "xts")

# Relabel the series, add a range selector, and set the line colors
dygraph(egypt, main = "Quarterly unemployment in Egypt, 2003-2013") %>%
  dySeries("V1", label = "Unemployment (%)") %>%
  dyRangeSelector(strokeColor = "darkred", fillColor = "darkred") %>%
  dyOptions(colors = c("darkred", "darkred"))
The dygraphs package supports piping with the %>% operator from the magrittr package, which keeps the code tidy and readable. I’ve changed the legend label and the line colors, though I noticed that I had to supply a length-2 vector of colors to change the line’s color, even though the chart has only one line. I’ve also added a range selector at the bottom of the plot. The chart itself is highly interactive: click and drag on the plot to zoom in to a specific section, and double-click to return to the original view. In class, I can use this chart to show the major spike in unemployment after the fall of the Mubarak regime in early 2011. Unemployment, especially among youth, remains a critical issue in Egypt; the World Bank estimates that 38.9 percent of Egyptians aged 15-24 were unemployed in 2013.
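In magrittr, x %>% f(y) is evaluated as f(x, y): the value on the left becomes the first argument of the next call, which is why each dy*() function receives the chart built so far. The same idea can be sketched in Python with functools.reduce; pipe(), set_label(), and set_colors() are hypothetical helpers for illustration, not part of dygraphs or PyDyGraphs.

```python
from functools import partial, reduce


def pipe(value, *steps):
    """Thread `value` through each step in order, like R's x %>% f() %>% g().
    Each step receives the previous result as its only positional argument."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Hypothetical stand-ins for chart-building functions (not a real plotting API)
def set_label(chart, label):
    return {**chart, "label": label}


def set_colors(chart, colors):
    return {**chart, "colors": list(colors)}


chart = pipe(
    {"title": "Quarterly unemployment"},
    partial(set_label, label="Unemployment (%)"),
    partial(set_colors, colors=["darkred"]),
)
print(chart)
```

Each helper returns a new dictionary instead of mutating its input, so every stage of the pipe can be inspected on its own.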
The World Bank API is also an excellent resource for demographic and economic data; I’ve written about using it here. Quandl also provides access to World Bank data, but I’ll include an example below that involves connecting directly to the World Bank’s API to show how to get data from a different format into dygraphs.
The WDI package allows R users to connect directly to the World Bank’s API and download data for selected countries, indicators, and years from the World Bank’s Open Data Catalog. The code below fetches export volume index data, which shows how the relative size of countries’ export economies has changed over time. I discuss this heavily in class in the context of China, so I’ll get data to compare China and the United States.
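library(WDI)
library(tidyr)
library(xts)
library(dplyr)

# Download the export volume index (2000 = 100) for China and the United States
df <- WDI(country = c("CN", "US"), indicator = "TX.QTY.MRCH.XD.WD",
          start = 1980, end = 2013, extra = FALSE)
df$exports <- df$TX.QTY.MRCH.XD.WD

# Reshape from tidy (one row per country-year) to wide (one column per country)
df1 <- df %>%
  select(country, year, exports) %>%
  mutate(country = gsub("United States", "USA", country)) %>%
  spread(key = country, value = exports) %>%
  # "%Y" alone would fill in today's month and day, so build January 1 explicitly
  mutate(date = as.Date(paste0(year, "-01-01"))) %>%
  select(-year)

# Convert to xts using only the numeric columns, indexed by date
xtdata <- xts(select(df1, -date), order.by = df1$date)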
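The R pipeline above does two things worth seeing in isolation: it spreads tidy country-year rows into one column per country, and it turns each bare year into a full date for the time index. The sketch below reproduces both steps with the Python standard library, placing each annual value on January 1 of its year. Note that R's as.Date() with format "%Y" alone fills in the current month and day rather than January 1, so pasting "-01-01" onto the year is the safer construction in R. The function name, country labels, and values here are hypothetical placeholders, not World Bank data.

```python
from collections import defaultdict
from datetime import date


def tidy_to_wide(rows):
    """Reshape tidy (country, year, value) rows into one row per year with
    one column per country, like tidyr's spread(). Missing values become None."""
    countries = sorted({country for country, _, _ in rows})
    by_year = defaultdict(dict)
    for country, year, value in rows:
        by_year[year][country] = value
    header = ["date"] + countries
    table = [
        # Place each annual observation on January 1 of its year
        [date(year, 1, 1)] + [by_year[year].get(c) for c in countries]
        for year in sorted(by_year)
    ]
    return header, table


# Hypothetical country labels and placeholder index values (not real data)
tidy = [
    ("country_a", 2000, 100.0), ("country_b", 2000, 100.0),
    ("country_a", 2001, 110.0), ("country_b", 2001, 95.0),
    ("country_a", 2002, 130.0),
]
header, table = tidy_to_wide(tidy)
print(header)
for row in table:
    print(row)
```

Filling gaps with None rather than dropping the year keeps every row aligned on the same time index, much as spread() fills missing country-years with NA.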
The WDI package returns data in “tidy” format, in which each row represents a country-year. I’ve reshaped the data frame to “wide” format, so that each column represents a country, and then converted it to an xts object for plotting with dygraphs. Now I can create the plot:
dygraph(xtdata, main = "Export volume index, 1980-2013 (2000 = 100)") %>%
  dyHighlight(highlightSeriesOpts = list(strokeWidth = 3)) %>%
  dyOptions(colors = c("red", "navy"))
I’ve configured this plot so that a data series is highlighted when the user hovers over it with the cursor. The chart shows one perspective on the massive growth of China’s export economy in recent years: its index is over 700 in 2013, which means that China’s export volume that year was more than seven times its 2000 level. By contrast, export volume in the United States increased only about 50 percent over the same period.
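Reading the index is simple arithmetic: with 2000 = 100, dividing a year's index value by 100 gives that year's volume as a multiple of the 2000 volume, and subtracting 1 and multiplying by 100 gives the percent change. A minimal sketch with hypothetical helper names; the inputs 700 and 150 correspond to "seven times" and "about 50 percent" in the paragraph above.

```python
def multiple_of_base(index_value, base_value=100.0):
    """Volume in a given year as a multiple of base-year volume, for an index
    normalized so that the base year equals base_value (2000 = 100 here)."""
    return index_value / base_value


def percent_change_from_base(index_value, base_value=100.0):
    """Percent change in volume relative to the base year."""
    return (multiple_of_base(index_value, base_value) - 1.0) * 100.0


print(multiple_of_base(700))          # 7.0
print(percent_change_from_base(150))  # 50.0
```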
Dygraphs charts can also be produced in my other language of choice, Python, using the PyDyGraphs module, available from GitHub. The module lets you create interactive dygraphs charts in an IPython Notebook session, and it is also very user-friendly: download the pydygraphs.py module from GitHub and run the code below in your IPython Notebook to produce a plot of population aging over time in Italy, Spain, and the Netherlands. The module is not as mature as the R package, however; I was unable to display the plot in nbviewer, although admittedly my knowledge of the Notebook’s architecture is limited.
import datetime

import pandas as pd
import wbdata as wb
import pydygraphs

## First, fetch the data from the World Bank API
over65ind = {"SP.POP.65UP.TO.ZS": "pctover65"}
countries = ["IT", "NL", "ES"]
# pd.datetime is deprecated in recent pandas, so use the standard library's datetime;
# data_date matches the wbdata release used here (check your version's signature)
df = wb.get_dataframe(over65ind, country=countries,
                      data_date=(datetime.datetime(1960, 1, 1),
                                 datetime.datetime(2012, 1, 1)))
df = df.reset_index()

## Now, reshape to wide format: one row per year, one column per country
wide = pd.pivot_table(df, values='pctover65', index='date',
                      columns='country').reset_index()
wide['date'] = wide['date'].astype(float)

## Finally, create the interactive plot!
fig = pydygraphs.figure()
xaxis = 'date'
fig.plotDataFrame(wide, xaxis)
fig.title("Percent of population over age 65")
fig.xlabel("Year")
fig.ylabel("Percent over 65")
fig.show()
Thanks are due, as always, to the RStudio team, Ramnath, and Kenton for their amazing work!