Interactive flow visualization in R

[This article was first published on Kyle Walker, and kindly contributed to R-bloggers]

Exploring flows between origins and destinations visually is a common task, but can be difficult to get right. In R, there are many tutorials on the web that show how to produce static flow maps (see here, here, here, and here, among others).

Over the past couple of years, R developers have built infrastructure that bridges R and JavaScript through the htmlwidgets package, so interactive web visualizations can be generated straight from R. Here, I'd like to demonstrate a few examples of exploratory interactive flow graphics that use this infrastructure.

To start, let’s make a random dataset that links countries with US states.

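library(dplyr)

# Note: R 3.6.0 changed the default algorithm behind sample(), so on newer
# versions of R the draws (and the output below) will generally differ.
# set.seed(1983, sample.kind = "Rounding") restores the older behavior.
set.seed(1983)

# 100 random trips, each pairing an origin country with a destination US state
df <- tibble(origins = sample(c('Portugal', 'Romania', 'Nigeria', 'Peru'), 
                              size = 100, replace = TRUE), 
             destinations = sample(c('Texas', 'New Jersey', 'Colorado', 'Minnesota'), 
                                   size = 100, replace = TRUE))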

head(df)
## # A tibble: 6 × 2
##    origins destinations
##      <chr>        <chr>
## 1  Romania    Minnesota
## 2 Portugal        Texas
## 3 Portugal    Minnesota
## 4  Nigeria    Minnesota
## 5     Peru     Colorado
## 6 Portugal     Colorado

We can use dplyr to get counts of the unique origin-destination pairs as follows:

df2 <- df %>%
  group_by(origins, destinations) %>%
  summarize(counts = n()) %>%
  ungroup() %>%
  arrange(desc(counts))

df2
## # A tibble: 16 × 3
##     origins destinations counts
##       <chr>        <chr>  <int>
## 1  Portugal     Colorado      9
## 2   Romania   New Jersey      9
## 3   Romania    Minnesota      8
## 4   Nigeria     Colorado      7
## 5      Peru     Colorado      7
## 6      Peru    Minnesota      7
## 7  Portugal    Minnesota      7
## 8  Portugal        Texas      7
## 9      Peru   New Jersey      6
## 10  Romania        Texas      6
## 11  Nigeria    Minnesota      5
## 12  Nigeria   New Jersey      5
## 13     Peru        Texas      5
## 14  Romania     Colorado      5
## 15 Portugal   New Jersey      4
## 16  Nigeria        Texas      3
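If you want to sanity-check counts like these without dplyr, base R's `table()` produces the same origin-destination cross-tabulation. This is a minimal sketch: the small `trips` data frame and the `od` result are hypothetical names, and the toy data is made up for illustration rather than taken from the seeded sample above.

```r
# Hypothetical toy data standing in for `df` above (not the seeded sample)
trips <- data.frame(
  origins      = c("Peru", "Peru", "Nigeria", "Peru", "Nigeria"),
  destinations = c("Texas", "Texas", "Colorado", "Colorado", "Colorado"),
  stringsAsFactors = FALSE
)

# Cross-tabulate origin-destination pairs, then flatten to one row per pair
od <- as.data.frame(table(trips$origins, trips$destinations),
                    stringsAsFactors = FALSE)
names(od) <- c("origins", "destinations", "counts")

# table() also returns pairs that never occur (count 0); drop them so the
# result matches group_by() + summarize(), then sort by descending count
od <- od[od$counts > 0, ]
od <- od[order(-od$counts), ]
rownames(od) <- NULL
od
```

The main difference from the dplyr pipeline is that `table()` enumerates every possible combination, which is why the zero-count filter is needed.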

Now, we’ll want to plot the connections. While maps are often a first choice for visualizing geographic flows, they are not the only option. For example, with a little data formatting, the networkD3 package allows for network visualizations like the following:

library(networkD3)

# Node table: one row per place, with a zero-based id (D3 indexes from 0)
name_vec <- c(unique(df2$origins), unique(df2$destinations))

nodes <- data.frame(name = name_vec, id = seq_along(name_vec) - 1,
                    stringsAsFactors = FALSE)

# Link table: look up the numeric id of each origin and destination
links <- df2 %>%
  left_join(nodes, by = c('origins' = 'name')) %>%
  rename(origin_id = id) %>%
  left_join(nodes, by = c('destinations' = 'name')) %>%
  rename(dest_id = id)

forceNetwork(Links = links, Nodes = nodes, Source = 'origin_id', Target = 'dest_id', 
             Value = 'counts', NodeID = 'name', Group = 'id', zoom = TRUE)
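The "little data formatting" exists because networkD3 hands the link table to D3, which refers to nodes by their zero-based position in the node table. As a hedged alternative to the two joins, base R's `match()` can produce those IDs directly. The `pairs` data frame below is a hypothetical stand-in for `df2`.

```r
# Hypothetical origin-destination counts, shaped like df2 above
pairs <- data.frame(
  origins      = c("Portugal", "Romania", "Portugal"),
  destinations = c("Colorado", "New Jersey", "Texas"),
  counts       = c(9L, 9L, 7L),
  stringsAsFactors = FALSE
)

# One node per unique place; origins first, then destinations
name_vec <- c(unique(pairs$origins), unique(pairs$destinations))
nodes <- data.frame(name = name_vec, id = seq_along(name_vec) - 1L,
                    stringsAsFactors = FALSE)

# match() returns 1-based positions; subtract 1 for D3's 0-based indexing
links <- transform(pairs,
                   origin_id = match(origins, nodes$name) - 1L,
                   dest_id   = match(destinations, nodes$name) - 1L)
links
```

This assumes no place name appears as both an origin and a destination, which holds here because origins are countries and destinations are US states; otherwise `name_vec` would contain duplicates and `match()` would only find the first.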

Use the scroll wheel on your mouse to zoom in; the width of each link is proportional to the size of the flow. Since every flow here runs in one direction, from an origin country to a destination state, a Sankey diagram might be a more appropriate visualization. It is also available in the networkD3 package:

sankeyNetwork(Links = links, Nodes = nodes, Source = 'origin_id', Target = 'dest_id', 
              Value = 'counts', NodeID = 'name', fontSize = 16)

The parsetR package by Kenton Russell, available on GitHub, offers a similar representation in the form of a parallel sets diagram.

library(parsetR) # devtools::install_github("timelyportfolio/parsetR")

parset(df2, dimensions = c('origins', 'destinations'), 
       value = htmlwidgets::JS("function(d){return d.counts}"), 
       tension = 0.5)

Now, let's create a couple of interactive flow maps. To do this, we need some sense of where the places are located in geographic space, which requires spatial data; we'll use the rnaturalearth package for this, available on GitHub.

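library(rnaturalearth) # now on CRAN; originally devtools::install_github('ropenscilabs/rnaturalearth')

# returnclass = 'sp' requests sp objects, which the code below relies on
# (newer versions of rnaturalearth default to sf and may deprecate sp)
countries <- ne_countries(returnclass = 'sp')

# ne_states() draws on high-resolution data from the rnaturalearthhires package
states <- ne_states(iso_a2 = 'US', returnclass = 'sp')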

The states data have long/lat information already, but the countries data do not, so we'll compute a representative point for each country polygon with the coordinates() function from the sp package.

library(sp) # provides coordinates()

# One representative (label) point per country polygon
xy <- coordinates(countries)

countries$longitude <- xy[, 1]

countries$latitude <- xy[, 2]

countries_xy <- countries@data %>%
  select(admin, longitude, latitude)

states_xy <- states@data %>%
  select(name, longitude, latitude)
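To give a sense of what a "representative point" for a polygon can be, here is a sketch of the planar area-weighted centroid of a single ring, computed with the shoelace formula. This is an illustration only: `polygon_centroid()` is a hypothetical helper, it treats longitude/latitude as flat x/y coordinates, it ignores holes and multi-part countries, and it is not necessarily how sp computes its label points.

```r
# Area-weighted centroid of a simple (non-self-intersecting) polygon ring.
# x, y: vertex coordinates in order; the ring is closed implicitly.
polygon_centroid <- function(x, y) {
  x2 <- c(x[-1], x[1])          # next vertex, wrapping around
  y2 <- c(y[-1], y[1])
  cross <- x * y2 - x2 * y      # shoelace cross products
  a <- sum(cross) / 2           # signed area (negative if clockwise)
  c(x = sum((x + x2) * cross) / (6 * a),
    y = sum((y + y2) * cross) / (6 * a))
}

# Hypothetical example: a 4 x 2 rectangle with corners at (0, 0) and (4, 2)
polygon_centroid(c(0, 4, 4, 0), c(0, 0, 2, 2))
```

Because the signed area appears in both numerator and denominator, the result is the same whether the ring is listed clockwise or counterclockwise.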

Now that we have the XY data, we can join it to our existing data frame.

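# Both lookup tables have longitude/latitude columns, so the joins add
# .x (origin country) and .y (destination state) suffixes
df3 <- df2 %>%
  left_join(countries_xy, by = c('origins' = 'admin')) %>%
  left_join(states_xy, by = c('destinations' = 'name'))

# The state coordinates may not arrive as numbers (e.g. as factors), so
# convert via character to recover the actual values
df3$longitude.y <- as.numeric(as.character(df3$longitude.y))

df3$latitude.y <- as.numeric(as.character(df3$latitude.y))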

head(df3)
## # A tibble: 6 × 7
##    origins destinations counts longitude.x latitude.x longitude.y
##      <chr>        <chr>  <int>       <dbl>      <dbl>       <dbl>
## 1 Portugal     Colorado      9   -8.055766  39.634050   -105.5430
## 2  Romania   New Jersey      9   24.943252  45.857101    -74.4653
## 3  Romania    Minnesota      8   24.943252  45.857101    -93.3640
## 4  Nigeria     Colorado      7    7.995128   9.548318   -105.5430
## 5     Peru     Colorado      7  -74.391806  -9.191563   -105.5430
## 6     Peru    Minnesota      7  -74.391806  -9.191563    -93.3640
## # ... with 1 more variables: latitude.y <dbl>
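The `as.numeric(as.character())` step above matters if the joined coordinate columns come back as factors. Calling `as.numeric()` directly on a factor returns its internal level codes rather than the values it displays. A minimal base-R illustration, using a few destination longitudes from the table (the `lon_f` name is hypothetical):

```r
# Longitudes stored as a factor, as can happen with older spatial data imports
lon_f <- factor(c("-105.5430", "-74.4653", "-93.3640"))

as.numeric(lon_f)                # internal level codes, NOT longitudes
as.numeric(as.character(lon_f))  # the actual longitude values
```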

Looks good. Now we can use the gcIntermediate() function in the geosphere package to calculate great circles, the shortest paths over the Earth's surface, between each origin and destination.

library(geosphere)

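# Points are given as longitude, latitude; sp = TRUE returns spatial lines,
# and addStartEnd = TRUE includes the origin and destination themselves
flows <- gcIntermediate(df3[, c('longitude.x', 'latitude.x')],
                        df3[, c('longitude.y', 'latitude.y')],
                        sp = TRUE, addStartEnd = TRUE)

# Attach the attributes used for styling and labels on the map
flows$counts <- df3$counts

flows$origins <- df3$origins

flows$destinations <- df3$destinations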
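gcIntermediate() handles the spherical geometry for us. To make the idea concrete, here is a sketch of one standard way to compute points along a great circle: convert both endpoints to 3D unit vectors and interpolate along the arc between them (spherical linear interpolation). This is not geosphere's own code, and the helper names are hypothetical; the point count mirrors what I understand to be gcIntermediate's default of `n = 50` intermediate points, plus the two endpoints added by `addStartEnd = TRUE`. The coordinates are the Portugal and Peru points printed above.

```r
# Degrees lon/lat -> 3D unit vector on a sphere
to_xyz <- function(lon, lat) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  c(cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat))
}

# 3D vector -> degrees lon/lat
to_lonlat <- function(v) {
  c(lon = atan2(v[2], v[1]) * 180 / pi,
    lat = atan2(v[3], sqrt(v[1]^2 + v[2]^2)) * 180 / pi)
}

# n intermediate points plus both endpoints along the great circle p1 -> p2
# (p1, p2 are c(lon, lat)); undefined for identical or antipodal points
great_circle_points <- function(p1, p2, n = 50) {
  a <- to_xyz(p1[1], p1[2])
  b <- to_xyz(p2[1], p2[2])
  omega <- acos(max(-1, min(1, sum(a * b))))   # central angle between points
  fracs <- seq(0, 1, length.out = n + 2)
  pts <- vapply(fracs, function(frac) {
    to_lonlat((sin((1 - frac) * omega) * a + sin(frac * omega) * b) / sin(omega))
  }, numeric(2))
  t(pts)   # one row per point, columns lon and lat
}

portugal <- c(-8.055766, 39.634050)
peru     <- c(-74.391806, -9.191563)
path <- great_circle_points(portugal, peru)
head(path, 3)
```

Interpolating on the sphere rather than in lon/lat space is what makes the path follow the curved shortest route instead of a straight line on the map.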

For interactive web maps in R, the leaflet package is a great option. It allows for interactive exploration of the data, such as turning layers on and off to see specific flows more clearly.

library(leaflet)
library(RColorBrewer)

# Hover labels such as "Portugal to Colorado: 9"
hover <- paste0(flows$origins, " to ", 
                flows$destinations, ': ', 
                as.character(flows$counts))

# One color per origin country
pal <- colorFactor(brewer.pal(4, 'Set2'), flows$origins)

# Line weight scales with flow size; each origin is its own toggleable layer
leaflet() %>%
  addProviderTiles('CartoDB.Positron') %>%
  addPolylines(data = flows, weight = ~counts, label = hover, 
               group = ~origins, color = ~pal(origins)) %>%
  addLayersControl(overlayGroups = unique(flows$origins), 
                   options = layersControlOptions(collapsed = FALSE))

The Web Mercator projection used by most web maps is not ideal for visualizing great circles, however, especially over longer distances. As such, you might want to try an alternative representation of the Earth, such as a three-dimensional globe. This can be accomplished with the threejs package (available on GitHub), which doesn't even require the great circle objects we created: it draws arcs directly from the origin and destination coordinates.
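For reference, the spherical Mercator projection behind most web maps sends longitude $\lambda$ and latitude $\varphi$ (in radians) on a sphere of radius $R$ to the plane as follows, with local scale factor $k$:

```latex
$$
x = R\,\lambda, \qquad
y = R\,\ln\tan\!\left(\frac{\pi}{4} + \frac{\varphi}{2}\right), \qquad
k(\varphi) = \sec\varphi
$$
```

Because $k$ grows without bound toward the poles, long routes through high latitudes are visually stretched, and only lines of constant bearing (rhumb lines) plot as straight lines. A great circle other than a meridian or the equator therefore appears as a curve bowing toward the nearer pole.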

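library(threejs) # devtools::install_github("bwlewis/rthreejs")

# Sort by origin so colors can be assigned in blocks
df4 <- arrange(df3, origins)

# 4 origins x 4 destinations = 16 rows, so each Set2 color repeats 4 times
df4$colors <- rep(brewer.pal(4, 'Set2'), each = 4)

# Arc widths scale with flow size
weights <- 1.5 * df4$counts

# globejs() expects arc columns in the order lat1, lon1, lat2, lon2
arcs <- data.frame(lat1 = df4$latitude.x, lon1 = df4$longitude.x, 
                   lat2 = df4$latitude.y, lon2 = df4$longitude.y)

globejs(arcsLwd = weights, arcs = arcs, arcsColor = df4$colors)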
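The `rep(brewer.pal(4, 'Set2'), each = 4)` line works because this dataset happens to have exactly four destinations per origin and the rows are sorted by origin. A lookup keyed by origin name avoids both assumptions. Below is a hedged sketch: `origin_colors()` is a hypothetical helper, the palette is hard-coded to the first four ColorBrewer Set2 colors so the example runs without RColorBrewer, and the alphabetical level ordering is intended to match how, as far as I know, leaflet's `colorFactor()` orders a character domain.

```r
# Assign each origin a color by name rather than by row position
origin_colors <- function(origins, palette) {
  lv <- sort(unique(origins))   # alphabetical levels
  if (length(palette) < length(lv)) stop("palette has too few colors")
  lookup <- setNames(palette[seq_along(lv)], lv)
  unname(lookup[origins])
}

# First four ColorBrewer Set2 colors, i.e. brewer.pal(4, 'Set2')
set2 <- c("#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3")

origin_colors(c("Romania", "Nigeria", "Romania", "Peru"), set2)
```

With this helper, `df4$colors <- origin_colors(df4$origins, brewer.pal(4, 'Set2'))` would give the same colors regardless of row order or how many destinations each origin has.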

To leave a comment for the author, please follow the link and comment on their blog: Kyle Walker.
