
How to: network animation with R and the iGraph package & Meaning in data viz

[This article was first published on SoMe Lab » r-project, and kindly contributed to R-bloggers.]

This article lists the steps I take to create a network animation in R, provides example source code that you can copy and modify for your own work, and starts a discussion about programming and visualization as an interpretive approach in research. Before I start, take a look at this network animation created with R and the iGraph package. The animation is of a retweet network related to #BankTransferDay: links (displayed as lines) are retweets, and nodes (displayed as points) are user accounts. For each designated period of time (in this case, an hour), new retweets are drawn and then fade out over 24 hours.

Animating in R isn’t new. Duncan Murdoch’s animation of a 3D rgl object was posted in 2009, and in May of this year Ben Schmidt did a very slick animation of ship’s log data for voyages from 1750 to 1850. Just this month David Smith animated the campaign stops of Obama and Romney as they criss-crossed the country. There are numerous other examples on the R-bloggers site. What’s different here is that I’m intent on studying how networks change, and to do that it is helpful to visualize the process of change rather than simply the final output.

First, here is an overview of the algorithmic steps (but not all of the experimental steps) involved in animating a network using R and the iGraph package.

  1. Prepare your network data and create an iGraph graph object
  2. Run the layout command on the graph and save the coordinates
  3. In a loop
    1. Set appropriate transparency for active nodes and links
    2. Plot the network
  4. Gather up .pngs and make the movie

Here is the code, which I hope has sufficient comments to make it easy to follow my logic. I consider myself a reasonably good R coder, but R is such a rich language, and I come from a non-vector-based programming background, so any tips or feedback are appreciated. Note that the animation this code creates (embedded at the end) is different from the one in the intro. I did this because my example code is simpler, I’m not yet inclined to release my retweet network data, and having both as examples gives you an idea of how you can modify the code for your own purposes.

 

# Author: Jeff Hemsley jhemsley at uw dot edu
# Created: Nov 20 2012
# 
# This file generates a random graph with random edge dates and creates a
# set of pngs for animation
# load igraph
library(igraph)
 
# igraph has many nifty ways to generate random graphs. :-)
start.nodes <- 2
total.nodes <- 500
g <- erdos.renyi.game(start.nodes, 1/2, directed=T)
g <- barabasi.game(total.nodes, start.graph=g, out.pref=T, directed=T, out.seq=rep(2, total.nodes - start.nodes))
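# (note: on newer versions of igraph these generators are also available
# under the names sample_gnp() and sample_pa())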
 
# make a layout and set the x & y attributes of the graph vertices 
l <- layout.fruchterman.reingold(g)
V(g)$x <- l[,1]
V(g)$y <- l[,2]
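# plot.igraph's default layout picks up the x and y vertex attributes when
# they exist, so node positions stay fixed across every frame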
 
 
# since I'm using a random graph, i want to use random dates for this example
start.date <- as.POSIXct(strptime('2012-01-01 07:00:00', '%Y-%m-%d %H:%M:%S'))
end.date <- as.POSIXct(strptime('2012-01-07 07:00:00', '%Y-%m-%d %H:%M:%S'))
possible.dates <- seq.POSIXt(start.date, end.date, by="hour")
num.time.steps <- length(possible.dates) # we use later for the loop.
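# (hourly steps over six days give 6 * 24 + 1 = 145 frames)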
 
# now we need to associate dates with links and I use sample with replace=T
num.edges <- ecount(g)
E(g)$date <- sample(possible.dates, num.edges, replace=T)
E(g)$width <- 2
 
# these are some initial edge and vertex settings. 
# Note that I set a value for red, green and blue between 0 and 255,
# an alpha, or transparency value of 0 making all objects transparent
E(g)$red <- 255
E(g)$green <- 140
E(g)$blue <- 0
E(g)$alpha <- 0
# then I give a default color just so the attribute exists for later.
E(g)$color <- "black"
 
V(g)$red <- 95
V(g)$green <- 158
V(g)$blue <- 160
V(g)$alpha <- 0
V(g)$color <- "black"
 
# season to taste
V(g)$size <- 5
V(g)$frame.color <- NA
V(g)$label <- ""
 
# in this example I am using a look back of 12 frames for the fade out
# so, over 12 movie frames the links and vertices get more and more
# transparent. 
look.back.default <- 12
alpha.vec.default <- round(seq(0, 255, length=look.back.default + 1),0)
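# (this yields 0, 21, 43, ..., 255: the oldest frames in the fade-out
# window are fully transparent, the newest fully opaque)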
alpha.vec.length <- length(alpha.vec.default)
 
# workhorse loop
for (time.step in 1:num.time.steps) {
 
  # look.back needs to be clamped at the early part of the animation,
  # before a full fade-out window of frames exists (clamp at 1, not 0,
  # so we never index with zero)
  look.back <- time.step - look.back.default
  if (look.back < 1) {
    look.back <- 1
  }
 
  date.fade.index <- look.back:time.step
  date.fade.index.length <- length(date.fade.index)
 
  # we always want to set the newest edge/vertex alpha last and we 
  # we always want it to be opaque. But if look.back is greater than
  # available time steps we need to shorten the alpha vector
  alpha.vec <- alpha.vec.default
  if ((alpha.vec.length - date.fade.index.length) > 0) {
    alpha.vec <- alpha.vec[-(1:(alpha.vec.length - date.fade.index.length))]
  }
 
  # for each look.back time step we set alpha for edges/vertices
  # with those time stamps. Some time steps may have no links or
  # vertices, some have many. Do the newest last so that they don't
  # get down-graded if they show up more than once in the fade out 
  # period
  for (j in 1:length(date.fade.index)) {
    active.edges <- which(E(g)$date == possible.dates[date.fade.index[j]])
 
    if (length(active.edges) > 0) {
      E(g)[active.edges]$alpha <- alpha.vec[j]
      V(g)[from(E(g)[active.edges])]$alpha <- alpha.vec[j]
      V(g)[to(E(g)[active.edges])]$alpha <- alpha.vec[j]
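      # (on a current igraph, the endpoint vertices of active.edges can
      # also be selected with V(g)[as.vector(ends(g, active.edges))])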
    }
  }  
 
  # now make sure all edge/vertex colors are set with whatever their alphas are
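  # rgb(..., alpha=, maxColorValue=255) returns "#RRGGBBAA" hex strings,
  # so the transparency travels along with the color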
  E(g)$color <- rgb(red=E(g)$red, green=E(g)$green
                    , blue=E(g)$blue, maxColorValue=255, alpha=E(g)$alpha)
  V(g)$color <- rgb(V(g)$red, V(g)$green, V(g)$blue, V(g)$alpha, maxColorValue=255)
 
  # file names should be identical except for an incremented number, so
  # that tools like ffmpeg can pick the frames up in order
  out.file.name <- paste("/home/jeff/testplot/NetAnimation_", time.step, ".png", sep="")
  png(out.file.name, width=640, height=480)
  plot.igraph(g, edge.arrow.size=0, edge.arrow.width=0, edge.curved = .5, main="")
  dev.off()
}

If you use that code and convert the pngs to a movie, it looks something like the animation embedded below. Many other R-animation authors use ffmpeg to automate the process of turning the pngs into a movie file, or they use the animation package; either works fine, though for both of my examples I used MS MovieMaker simply because it was convenient.
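If you would rather script that last step from within R, here is a minimal, untested sketch that shells out to ffmpeg via system(). The frame rate, the -pix_fmt flag, and the output file name are my own assumptions; the input pattern just has to match the out.file.name scheme used in the loop above.

# hypothetical example: stitch NetAnimation_1.png, NetAnimation_2.png, ...
# into NetAnimation.mp4. -r sets the frames per second, and -pix_fmt
# yuv420p keeps the output playable in most players.
system(paste("ffmpeg -r 10 -i /home/jeff/testplot/NetAnimation_%d.png",
             "-pix_fmt yuv420p /home/jeff/testplot/NetAnimation.mp4"))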

What does it mean?

Statisticians have long advised plotting your data as a way to help you make sense of it. From an introductory text on statistics: “Our major problem is in organizing, summarizing, and describing these data – that is, making sense of the data” and “good descriptive statistics enable us to make sense of the data by reducing a large set of measurements to a few summary measures that provide a good, rough picture of the original measures” (Ott and Longnecker, 1993).

Since I am interested in understanding how networks change over time, animations may be useful for comparing networks at different points in time, as well as for comparing the evolution of two different networks. In an email conversation with Marilyn Ostergren, a PhD candidate and information visualization expert at the University of Washington’s iSchool, she said that a network visualization “presents the data in a form that allows you to use pattern recognition skills – probably mostly gestalt-theory-type impressions that help us to perceive relationships based on physical clustering and connecting lines. So if you see that two networks have similar clustering patterns while two others have different clustering patterns, then you could try to identify what characteristics lead to these patterns, and then create visualizations of new data sets known to embody those characteristics and see if they also show the same patterns.”

Also, in a recent blog post Markham & Lindgren put forth the idea of a Network Sensibility, wherein, through an iterative process of working with the data to create plots, researchers may gain insights about the relationships and complexity in the network that are not available by studying centrality scores or other network measurements alone. Discussing concept mapping, they say that “used systematically and iteratively, concept mapping can sponsor less linear or text-centric sensemaking.” The practice of iteratively generating images “functions as an organizational tool and might appear to focus on texts or objects [in boxes or circles], but it yields a nonlinear conceptual model that is, by design, emergent and dynamic” (p. 12).

In the first part of this post I presented the algorithmic steps and R code that I use to generate a network animation. What I haven’t done, but will do in a future post, is talk about coding – writing software – as an iterative analytic tool for understanding what’s in the data, which supports a qualitative interpretation of what’s being visualized. Of course this is tricky, because the algorithm itself is a layer of interpretation of the data that must be justified. Finally, Marilyn Ostergren points out another challenge: “If you can show differences and change, that suggests something might be going on. But it doesn’t necessarily show that something significant or meaningful is going on.” I’m left with a sense that I can use visualizations as data, but I need to do more thinking about knowledge claims based on that data. Perhaps triangulation with other methods can support such claims and make them palatable to an academic community that is used to rich text or p-values.

I’m going to end this post with one more quote from Markham & Lindgren:

“The generative power of mapping as an iterative layered process of sensemaking might be found using other methods, but visualizations serve at least two functions: First, the activity of producing multiple renderings of the context surrounding a phenomenon destabilizes both the context and the phenomenon, an essential step toward shifting to more complex accounts of contemporary culture. Second, multiple layers of visualizations can provide a systematic trace of one’s movement through various analytical categories and interpretations. Whether or not one uses visually-oriented methods for thinking, the process, when woven into the findings as well as the analysis, highlights rather than hides the multiplicity of directions possible, offering one’s outcomes as a deliberate choice among many for what constitutes the research object” (p. 11).

 

– jeff hemsley

Ott, R. L., and M. Longnecker. An Introduction to Statistical Methods and Data Analysis. Duxbury Press, Belmont, CA, 1993. See page 40.

