Posted on September 15, 2013 by Vessy in R bloggers
[This article was first published on Fun with R, and kindly contributed to R-bloggers.]
In the third part of the “how to quickly visualize networks directly from R” series, I’ll write about hive plots and the “HiveR” package. The concept of hive plots is fundamentally different from the plots produced by Cytoscape and Gephi.
Cytoscape and Gephi use a number of layout algorithms to plot networks as node-edge diagrams in the Euclidean plane. The layout algorithms determine node (and edge) positions based on various criteria, e.g., the number of direct interacting partners, the smallest number of edge crossings, or similar edge lengths across the network. As a result, the plots are sensitive to changes: even a small change in the underlying topology can lead to a different final layout. For this reason, it is hard to assess how similar (or different) two networks are based solely on their layouts/plots. Additionally, standard network layouts generally work well for small and medium-size networks, while visualization of large networks often results in “hairball” plots that lack identifiable structural patterns.
In contrast to standard network plots (i.e., layout algorithms), the goal of hive plots is to capture and expose the trends and patterns in network structure that arise from a large number of nodes and edges, rather than solely to represent network structure as a node-edge diagram. Thus, in hive plots, individual nodes and edges matter less as individual elements than as parts of a system.
Hive plots map nodes onto radially distributed linear axes, and edges between nodes are drawn as curved links that connect the axes. Nodes are assigned to an axis, and to a position along that axis (denoted as the radius), based on their qualitative or quantitative properties, e.g., network structure, node or edge annotation, or any other meaningful property of the network. Thus, using hive plots, users can create their own rules for mapping the network properties of interest to the layout. As such, hive plots give users the ability to assess network structure using the network properties they are interested in, as well as the ability to compare two networks based on the selected properties.
To demonstrate how this works, I will use the same network I used to demonstrate network visualization in Cytoscape: the weighted network of characters’ coappearances in Victor Hugo’s novel “Les Miserables” (LesMiserables.txt). I will also use the same node and edge properties: the degree of a node, the betweenness centrality of a node, the Dice similarity of two nodes, and the coappearance weight. For more information, see Network visualization – part 1 and Network visualization – part 2.
Given a network in an edge list format (data from column 1 and column 2 correspond to the interacting pairs of nodes), we can use the “edge2HPD” command to create a hive object. For example, if the list of interactions is given in the data frame denoted as “dataSet.ext,” we can create a hive object as:
hive1 <- edge2HPD(edge_df = dataSet.ext)
This function will assign all nodes to a single axis. Additionally, all nodes will be assigned the same position along the axis (the same radius), the same color, and the same node size. If the data frame contains a third column, e.g., weights, the "edge2HPD" function will also assign the values from that column to the corresponding edges. To adjust the node radius, we will use the "mineHPD" function and its "rad <- tot.edge.count" option. This option assigns to each node a radius that corresponds to its degree:
hive2 <- mineHPD(hive1, option = "rad <- tot.edge.count")
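To make the "rad <- tot.edge.count" idea concrete, here is a minimal base R sketch of what a total edge count per node looks like. The toy edge list is hypothetical; it only mimics the two-column shape of "dataSet.ext" and does not call HiveR itself.

```r
# Toy edge list (hypothetical data, same two-column shape as dataSet.ext)
edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("B", "C", "C", "D"),
  stringsAsFactors = FALSE
)

# Total edge count (degree): how often a node appears in either column
counts <- table(c(edges$from, edges$to))
node_degree <- setNames(as.integer(counts), names(counts))

print(node_degree)
```

A node that appears in many rows ends up far from the center of its axis, while a node with a single edge stays close to it.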
We'll also use the "mineHPD" function (and its "axis <- source.man.sink" option) to assign nodes to different axes. The "axis <- source.man.sink" option assumes that the edges provided in the data frame are directed, i.e., the first column in the data frame represents the "from" node and the second column represents the "to" node. This option examines the nodes and their corresponding edges to determine whether a node is a source (has only outgoing edges), a sink (has only incoming edges), or a manager (has both types of edges). For now we will ignore the fact that our network is not directed and use this function/option as follows:
hive3 <- mineHPD(hive2, option = "axis <- source.man.sink")
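The following base R sketch mirrors the source/manager/sink rule described above on a hypothetical toy edge list. The helper "classify_roles" is my own illustrative name, not a HiveR function. It also shows why the rule is problematic for undirected data: simply swapping the column order of one edge changes the roles.

```r
# Classify nodes by edge direction (illustrative helper, not part of HiveR)
classify_roles <- function(edges) {
  nodes   <- sort(unique(c(edges$from, edges$to)))
  has_out <- nodes %in% edges$from  # node starts at least one edge
  has_in  <- nodes %in% edges$to    # node ends at least one edge
  role <- ifelse(has_out & !has_in, "source",
          ifelse(has_in & !has_out, "sink", "manager"))
  setNames(role, nodes)
}

edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("B", "C", "C", "D"),
  stringsAsFactors = FALSE
)
roles <- classify_roles(edges)

# Same undirected network, but the last edge is written as D-C instead of C-D
edges_swapped <- edges
edges_swapped[4, ] <- c("D", "C")
roles_swapped <- classify_roles(edges_swapped)

print(rbind(roles, roles_swapped))
```

In the swapped version, D turns from a sink into a source and C from a manager into a sink, although the undirected network is unchanged.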
A hive plot requires that no edge starts and ends at the same node, and that no edge has zero length because its start and end nodes share the same axis and radius. We will use the "remove zero edge" option of the "mineHPD" function to remove any such edges (note that this will not influence the resulting plot).
hive4 <- mineHPD(hive3, option = "remove zero edge")
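As a rough illustration of what counts as a zero-length edge, the sketch below flags edges whose two end nodes share both axis and radius (which also covers self-loops). The node and edge tables are hypothetical, and the column names (id, axis, radius, id1, id2) are an assumption about how such tables might be laid out; the sketch does not call HiveR.

```r
# Hypothetical node table: nodes 1 and 2 share the same axis and radius
nodes <- data.frame(
  id     = 1:4,
  axis   = c(1L, 1L, 2L, 3L),
  radius = c(2, 2, 3, 1)
)

# Hypothetical edge table: the last edge is a self-loop
edges <- data.frame(
  id1 = c(1, 1, 3, 2),
  id2 = c(2, 3, 4, 2)
)

# Look up axis and radius for both ends of every edge
start <- match(edges$id1, nodes$id)
end   <- match(edges$id2, nodes$id)

# Zero length: both ends sit on the same axis at the same radius
zero_len <- nodes$axis[start] == nodes$axis[end] &
            nodes$radius[start] == nodes$radius[end]

edges_clean <- edges[!zero_len, ]
print(edges_clean)
```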
Finally, let's plot the hive plot using the "plotHive" function:
plotHive(hive4, method = "abs", bkgnd = "white", axLabs = c("source", "hub", "sink"), axLab.pos = 1)
Figure 1A shows the resulting (default) plot. We can see that most nodes are either sources or manager nodes. Unfortunately, this does not mean much for us, as our graph is undirected and the obtained visualization does not describe our data truthfully. We can try to customize the plot to see whether it highlights some of the real properties of our data. To do so, we directly access the hive object elements that correspond to node color, node size, edge color, and edge weight: "hive4$nodes$color," "hive4$nodes$size," "hive4$edges$color," and "hive4$edges$weight," respectively. We assigned node color based on node degree, node size based on the node's betweenness centrality, edge color based on Dice similarity, and edge thickness based on the coappearance weight. Figure 1B-D shows the obtained results. The customization has brought out some patterns, but it still includes the "direction" bias.
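One possible way to turn node statistics into colors and sizes is sketched below in base R. The degree and betweenness values are hypothetical, the yellow-to-red palette and the size range are arbitrary choices, and the final assignment to the hive object is shown only as a comment because it assumes the node labels are stored in a "lab" column.

```r
# Hypothetical per-node statistics for a four-node toy network
degree      <- c(A = 2, B = 2, C = 3, D = 1)
betweenness <- c(A = 0, B = 0, C = 2, D = 0)

# Color: bin degree into 5 classes and pick one color per class
n_bins     <- 5
palette_fn <- colorRampPalette(c("yellow", "red"))
node_color <- palette_fn(n_bins)[cut(degree, breaks = n_bins, labels = FALSE)]
names(node_color) <- names(degree)

# Size: rescale betweenness linearly into a chosen plotting range
rescale <- function(x, lo, hi) lo + (x - min(x)) / (max(x) - min(x)) * (hi - lo)
node_size <- rescale(betweenness, lo = 0.5, hi = 2)

print(data.frame(degree, node_color, betweenness, node_size))

# With a real hive object, the values could then be copied over, e.g.
# (assuming node names are stored in hive4$nodes$lab):
# hive4$nodes$color <- node_color[as.character(hive4$nodes$lab)]
# hive4$nodes$size  <- node_size[as.character(hive4$nodes$lab)]
```

The same pattern (bin or rescale, then assign) applies to edge color and edge weight.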
From default to customized hive plot (edge2HPD version)
HiveR also allows users to create a hive object from the adjacency matrix.
Using the "igraph" package, we can create a graph that corresponds to the data frame we used above, and we can specify that the graph is undirected. Next, we can extract an adjacency matrix from the graph. Given that all available HiveR functions assume that the underlying graphs are directed, we will create only the upper triangle of the adjacency matrix. Finally, we will use the "adj2HPD" function to create a hive object.
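The sketch below builds an upper-triangle adjacency matrix from an edge list using base R only, so that it runs without igraph. The toy edge list is hypothetical. With igraph, something like as_adjacency_matrix(g, type = "upper") should give the equivalent matrix, and the last commented line is my assumption of how the result would be passed to HiveR.

```r
# Hypothetical weighted, undirected edge list
edges <- data.frame(
  from   = c("A", "A", "B", "D"),
  to     = c("B", "C", "C", "C"),
  weight = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

nodes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# Row/column index of each end node
i <- match(edges$from, nodes)
j <- match(edges$to, nodes)

# Store every edge in the upper triangle, regardless of column order
adj[cbind(pmin(i, j), pmax(i, j))] <- edges$weight

print(adj)

# Assumed HiveR call on the resulting matrix:
# hive_adj <- adj2HPD(M = adj)
```

Note that the D-C edge ends up in the cell for row C and column D, so each undirected edge is recorded exactly once.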
Repeating the same steps as above (for "edge2HPD"), we created the following hive plot:
Customized hive plot (adj2HPD version)
We can see that it is very similar to the plot created with edge2HPD. There are more interactions between source, manager, and sink nodes than before, but it is still hard to say what that observation means for an undirected graph such as ours.
Trying to overcome this problem, I wrote a few additional options for the "mineHPD" function (see "mod.mineHPD" function).
For example, I wanted to assign weakly connected nodes to one axis, moderately connected nodes to another axis, and highly connected nodes to the third axis. I used the "axis <- deg_five_ten_more" option to do so. I decided that at this point I am not interested in node radius, so I assigned a random radius value to all nodes ("rad <- random" option). The resulting plot is shown in Figure 3A. This plot looks similar to the previous ones. However, from this plot we can say for sure that our data contains a large number of highly connected nodes that interact with each other. To further evaluate this observation, I used the "axis <- split" option. This option splits each of the 3 axes into 2 new axes (resulting in 6 axes) and provides a better visualization of the interactions between nodes that were on the same axis in the original plot. Indeed, Figure 3B shows the "strength" of the interactions between the highly connected nodes.
Next, I wanted to create a plot in which the node radius corresponds to the node's betweenness centrality. To do so, I used the option "rad <- userDefined." This option requires a source (data frame) that contains the nodes and their corresponding (betweenness centrality) values, which will be used for the radius. As before, I wanted to assign nodes to axes based on their degree. In this case, I wanted nodes with degree 1 to be assigned to axis 1, nodes with degree 2 to be assigned to axis 2, and all other nodes to be assigned to axis 3. To do so, I used the "deg_one_two_more" option. The resulting plot and its "split" version are shown in Figure 3C and Figure 3D.
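A degree-based axis assignment of this kind can be sketched with cut() in base R. The cut-offs used here (degree up to 5, 6 to 10, more than 10) are only my reading of the option name "deg_five_ten_more" and may differ from the actual implementation; the degree values are hypothetical.

```r
# Hypothetical node degrees
degree <- c(A = 1, B = 5, C = 6, D = 10, E = 11, F = 25)

# Assumed cut-offs: (-Inf, 5] -> axis 1, (5, 10] -> axis 2, (10, Inf) -> axis 3
axis <- cut(degree, breaks = c(-Inf, 5, 10, Inf), labels = FALSE)
names(axis) <- names(degree)

# Random radius, since the position along the axis is not of interest here
set.seed(42)
radius <- runif(length(degree), min = 1, max = 100)

print(data.frame(degree, axis, radius))
```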
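The rule described above (degree 1 to axis 1, degree 2 to axis 2, everything else to axis 3) combined with a user-supplied radius can be sketched as follows. This is a base R illustration with hypothetical values, not the code behind "mod.mineHPD"; the data frame "node_info" and its column names are made up for the example.

```r
# Hypothetical node degrees
degree <- c(A = 1, B = 2, C = 3, D = 7)

# Degree 1 -> axis 1, degree 2 -> axis 2, anything larger -> axis 3
axis <- pmin(degree, 3)

# Hypothetical user-supplied table of betweenness values (rows in any order)
node_info <- data.frame(
  node        = c("D", "A", "C", "B"),
  betweenness = c(12.5, 0, 3, 0.5),
  stringsAsFactors = FALSE
)

# Look up each node's betweenness by name and use it as the radius
radius <- node_info$betweenness[match(names(degree), node_info$node)]
names(radius) <- names(degree)

print(data.frame(degree, axis, radius))
```

Matching by node name rather than by row position keeps the radius correct even when the user-supplied table is ordered differently from the node list.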
Customized hive plot (new customization functions)
These new functionalities can help us identify some additional patterns in the underlying interactions, especially in undirected ones. However, adding new functionality every time we want to create a hive plot in a slightly different way is not necessarily optimal, as the ways to define node radius or axis assignment are unlimited (well, not exactly, but if properly configured, almost unlimited). To address this issue, I expanded the "edge2HPD" function (see "mod.edge2HPD" function) to include options for automatic assignment of node color, size, radius, and axis, as well as automatic assignment of edge color and weight.
Using this function, we can create a hive plot in which nodes are assigned to axes randomly (Figure 4A). Notice how the structural patterns we observed previously are lost in this plot. This directly demonstrates the significance of an appropriate node axis assignment. To explore this further, we clustered nodes based on their Dice similarity (using hierarchical clustering). We "cut" the resulting tree so that it yields six non-overlapping clusters, and assigned the nodes from each of the six clusters to one of six axes. Figure 4B captures the relationship between the interactions within and across clusters.
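A cluster-based axis assignment along these lines can be sketched with base R's hclust() and cutree(). The network here is a hypothetical toy (six disconnected triangles), the Dice similarity is computed as 2 x (shared neighbors) / (sum of the two degrees), which is one common definition and may differ from the one used for the figures, and average linkage is an arbitrary choice.

```r
# Hypothetical toy network: 18 nodes forming six separate triangles
group <- rep(1:6, each = 3)
adj <- outer(group, group, "==") * 1
diag(adj) <- 0
node_names <- paste0("n", seq_along(group))
dimnames(adj) <- list(node_names, node_names)

# Dice similarity: 2 * |shared neighbors| / (degree_a + degree_b)
shared <- adj %*% adj
deg    <- rowSums(adj)
dice   <- 2 * shared / outer(deg, deg, "+")
diag(dice) <- 1

# Hierarchical clustering on the Dice distance, cut into six clusters
tree     <- hclust(as.dist(1 - dice), method = "average")
clusters <- cutree(tree, k = 6)

# Each cluster id can then serve as the axis number of its nodes
print(split(names(clusters), clusters))
```

On this toy network every triangle ends up on its own axis; on real data the clusters will be less clean, and edges between axes then show how strongly the clusters interact.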
Customized hive plot (new edge2HPD function)
Here are the additional functions and the complete code used to create the hive plots: