Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In the third part of “how to quickly visualize networks directly from R” series, I’ll write about the hive plots and “HiveR” package. The concept of hive plots is fundamentally different from the Cytoscape and Gephi plots.
Cytoscape and Gephi use a number of layout algorithms to plot networks as node-edge diagrams in the Euclidean plane. The layout algorithms determine node (and edge) positions based on various criteria, e.g., the number of direct interacting partners, smallest number of edge crossings, or similar edge length between all nodes. Clearly, the resulting plots are sensitive to changes and even a small change in underlying topology can lead to a change in the final layout. For this reason, it is hard to assess how similar (or different) two networks are solely based on their resulting layouts/plots. Additionally, standard network layouts generally work well for visualization of small/medium size networks, while visualization of large network often results in the “hairball” network plots that lack identifiable structural patterns.
Conversely to standard network plots (i.e., layout algorithms), the goal of hive plots is to capture and expose both trends and patterns in network structure that arise from large number of nodes and edges, rather than solely representing network structure in the form of node-edge diagrams. Thus, in the hive plots individual nodes and edges are not as important as individual elements, but as parts of a system.
Hive plots map nodes onto radially distributed linear axes and edges between nodes are drawn as curved links that connect the axes. Nodes are assigned to axes and position along the axis (denoted as the radius) based on their qualitative or quantitative properties, e.g., network structure, node, edge annotation, or any other meaningful properties of the network. Thus, using hive plots, users can create their own rules for a mapping between the network properties of interest and layout. As such, hive plots give users the ability to assess network structure using network properties they are interested in, as well as the ability to compare two networks based on the selected properties.
To demonstrate how this works, I will use the same network I used to demonstrate network visualization in Cytoscape – the weighted network of characters’ coappearances in Victor Hugo’s novel “Les Miserables” (LesMiserables.txt). I will also use the same node and edge properties: the degree of a node, betweenness centrality of a node, Dice similarity of two nodes, and the coappearance weight. For more information, see Network visualization – part 1 and Network visualization – part 2).
Given a network in an edge list format (data from column 1 and column 2 correspond to the interacting pairs of nodes), we can use the “edge2HPD” command to create a hive object. For example, if the list of interactions is given in the data frame denoted as “dataSet.ext,” we can create a hive object as:
hive1 < - edge2HPD(edge_df = dataSet.ext)
This function will assign all nodes to a single axis. Additionally, all nodes will be assigned the same position along the axis (the same radius), the same color, and node size. If the data frame contained third column, e.g., weights, the "edge2HPD" function will also assign the values from that column to the corresponding edges. To adjust node radius, we will use the "mineHPD" function and
it "rad hive2 < - mineHPD(hive1, option = "rad < - tot.edge.count")
We'll also use the "mineHPD" function (and its the "axis hive3 < - mineHPD(hive2, option = "axis < - source.man.sink")
Hive plot requires that none of the edges starts and ends at the same node, not that any edges has zero length because the axis and radius of the start and end nodes are the same. We will use the "remove zero edge" option from the "mineHPD" function to remove any such edge (note that this will not influence the resulting plot).
hive4 < - mineHPD(hive3, option = "remove zero edge")
Finally, let's plot the hive plot using the "plotHive" function:
plotHive(hive4, method = "abs", bkgnd = "white", axLabs = c("source", "hub", "sink"), axLab.pos = 1)
Figure 1A shows the resulting (default) plot. We can see that most nodes are either sources or manager nodes. Unfortunately, this does not mean too much for us, as our graph is undirected and the obtained visualization does not correspond/describe our data truthfully. We can try to customize the plot to see whether or not it'll highlight some of the real properties our data has. To do so, we use the option to directly access hive object elements that correspond to node color, node size, edge color, and edge weight: "hive4$nodes$color," "hive4$nodes$size, "hive4$edges$color," and "hive4$edges$weight," respectively. We assigned node color based on the node degree, node size based on node's betweenness centrality, edge color based on Dice similarity, and edge thickness based on the weight. Figure 1B-D shows the obtained results. The customization has brought out some patterns, but it still includes the "direction" bias.
HiveR also allows users to create a hive object from the adjacency matrix.
Using the "igraph" package, we can create a graph that corresponds to the data frame we used above. We can specify that our graph is undirected. Next, we can extract an adjacency matrix from the graph. Given that all available HiveR functions assume that underlying graphs are directed, we will create only the upper triangle of the adjacency matrix. Finally, we will use the "adj2HPD" function to create a hive object
gD <- simplify(graph.data.frame(dataSet.ext, directed=FALSE))
gAdj <- get.adjacency(gD, type = "upper", edges = FALSE, names = TRUE, sparse = FALSE)
hive1 <- adj2HPD(gAdj, type = "2D")
Repeating the same steps as above (for "edge2HPD"), we created the following hive plot:
We can see that it is very similar to the plot created with edge2HPD. There are more interactions between source, manager, and sink nodes than before, but it is still hard to say what that information/observation means for the undirected graph as ours.
Trying to overcome this problem, I wrote a few additional options for the "mineHPD" function (see "mod.mineHPD" function).
For example, I wanted to assign low connected nodes to one axis, medium connected nodes to another axis, and highly connected nodes to the third axis. I used used the "axis
These new functionalities can help us identify some additional patterns in the underlying interactions, especially in undirected ones. However, adding a new functionality every time we want to create a hive plot in slightly different is not necessarily the optimal way, as the ways to define node radius or axis assignment are unlimited (well, not exactly, but if properly configured - almost unlimited). To address this issue, I expanded the "edge2HPD" function (see "mod.edge2HPD" function) to include the options for automatic node color, size radius, and axis assignment, as well as the automatic assignment of edge color and weight.
Using this function we can create a hive plot in which we assigned nodes to axes randomly (Figure 4A). You can notice how the structural patterns, we observed previously, are lost in this plot. This directly demonstrate the significance of the appropriate node axis assignment. To test this, we clustered nodes based on the Dice similarity (using hierarchical clustering). We "cut" the resulting tree in the way that results in six non-overlapping clusters. We assigned nodes from each of the six cluster to six axes. Figure 4B captures the relationship between the interactions within and across clusters.
Here are the additional functions and the complete code used for create hive plots:
library("igraph") library("plyr") library("HiveR") library("RColorBrewer") ############################################################################################ rm(list = ls()) dataSet <- read.table("lesmis.txt", header = FALSE, sep = "t") ############################################################################################ # Create a graph. Use simplify to ensure that there are no duplicated edges or self loops gD <- simplify(graph.data.frame(dataSet, directed=FALSE)) # Print number of nodes and edges # vcount(gD) # ecount(gD) # Calculate some node properties and node similarities that will be used to illustrate # different plotting abilities # Calculate degree for all nodes degAll <- degree(gD, v = V(gD), mode = "all") # Calculate betweenness for all nodes betAll <- betweenness(gD, v = V(gD), directed = FALSE) / (((vcount(gD) - 1) * (vcount(gD)-2)) / 2) betAll.norm <- (betAll - min(betAll))/(max(betAll) - min(betAll)) node.list <- data.frame(name = V(gD)$name, degree = degAll, betw = betAll.norm) # Calculate Dice similarities between all pairs of nodes dsAll <- similarity.dice(gD, vids = V(gD), mode = "all") # Calculate edge weight based on the node similarity F1 <- function(x) {data.frame(V4 = dsAll[which(V(gD)$name == as.character(x$V1)), which(V(gD)$name == as.character(x$V2))])} dataSet.ext <- ddply(dataSet, .variables=c("V1", "V2", "V3"), function(x) data.frame(F1(x))) rm(degAll, betAll, betAll.norm, F1) ############################################################################################ #Determine node/edge color based on the properties # Calculate node size # We'll interpolate node size based on the node betweenness centrality, using the "approx" function # And we will assign a node size for each node based on its betweenness centrality approxVals <- approx(c(0.5, 1.5), n = length(unique(node.list$bet))) nodes_size <- sapply(node.list$bet, function(x) approxVals$y[which(sort(unique(node.list$bet)) == x)]) node.list <- cbind(node.list, size = nodes_size) rm(approxVals, nodes_size) # Define node color # We'll interpolate node colors based on the node degree using the "colorRampPalette" function from the "grDevices" library library("grDevices") # This function returns a function corresponding to a collor palete of "bias" number of elements F2 <- colorRampPalette(c("#F5DEB3", "#FF0000"), bias = length(unique(node.list$degree)), space = "rgb", interpolate = "linear") # Now we'll create a color for each degree colCodes <- F2(length(unique(node.list$degree))) # And we will assign a color for each node based on its degree nodes_col <- sapply(node.list$degree, function(x) colCodes[which(sort(unique(node.list$degree)) == x)]) node.list <- cbind(node.list, color = nodes_col) rm(F2, colCodes, nodes_col) # Assign visual attributes to edges using the same approach as we did for nodes F2 <- colorRampPalette(c("#FFFF00", "#006400"), bias = length(unique(dataSet.ext$V4)), space = "rgb", interpolate = "linear") colCodes <- F2(length(unique(dataSet.ext$V4))) edges_col <- sapply(dataSet.ext$V4, function(x) colCodes[which(sort(unique(dataSet.ext$V4)) == x)]) dataSet.ext <- cbind(dataSet.ext, color = edges_col) rm(F2, colCodes, edges_col) ############################################################################################ # Assign nodes to axes # Randomly nodeAxis <- sample(3, nrow(node.list), replace = TRUE ) node.list <- cbind(node.list, axis = nodeAxis) rm(nodeAxis) ############################################################################################ #Create a hive plot source("mod.edge2HPD.R") hive1 <- mod.edge2HPD(edge_df = dataSet.ext[, 1:2], edge.weight = dataSet.ext[, 3], edge.color = dataSet.ext[, 5], node.color = node.list[,c("name", "color")], node.size = node.list[,c("name", "size")], node.radius = node.list[,c("name", "degree")], node.axis = node.list[,c("name", "axis")]) #sumHPD(hive1) hive2 <- mineHPD(hive1, option = "remove zero edge") plotHive(hive2, method = "abs", bkgnd = "white", axLab.pos = 1) ######################################## # Based on hierarchical cluestering d <- dist(dsAll) hc <- hclust(d) #plot(hc) nodeAxis <- cutree(hc, k = 6) node.list <- cbind(node.list, axisCl = nodeAxis) rm(nodeAxis) hive1 <- mod.edge2HPD(edge_df = dataSet.ext[, 1:2], edge.weight = dataSet.ext[, 3], edge.color = dataSet.ext[, 5], node.color = node.list[,c("name", "color")], node.size = node.list[,c("name", "size")], node.radius = node.list[,c("name", "degree")], node.axis = node.list[,c("name", "axisCl")]) #sumHPD(hive1) hive2 <- mineHPD(hive1, option = "remove zero edge") plotHive(hive2, method = "abs", bkgnd = "white", axLab.pos = 1) mod.edge2HPD <- function(edge_df = NULL, unique.rows = TRUE, axis.cols = NULL, type = "2D", desc = NULL, edge.weight = NULL, edge.color = NULL, node.color = NULL, node.size = NULL, node.radius = NULL, node.axis = NULL) { #edge.weight - a list corresponding to edge weights (same order as in edge_df) #edge.color - a lis corresponding to edge colors (same order as in edge_df) #node.color - a data frame consisting of two columns: column 1 - node labels, column 2 - node color #node.size - a data frame consisting of two columns: column 1 - node labels, column 2 - node size #node.radius - a data frame consisting of two columns: column 1 - node labels, column 2 - node radius #node.axis - a data frame consisting of two columns: column 1 - node labels, column 2 - node axis if (is.null(edge_df)){ stop("No edge data provided") } if (!is.data.frame(edge_df)){ stop("edge_df is not a data frame") } if (unique.rows) { nr.old <- nrow(edge_df) edge_df <- unique(edge_df) if (nr.old > nrow(edge_df)) cat("nt", nr.old - nrow(edge_df), "non-unique data-frame rows removed!nn") } # Get node labels lab1 <- as.character(unlist(edge_df[, 1])) lab2 <- as.character(unlist(edge_df[, 2])) # Get number of unique nodes nn <- length(unique(c(lab1, lab2))) # Define node ID id <- 1:nn # Define node label label <- unique(c(lab1, lab2)) # Create a data frame for node attributes node.attributes <- data.frame(id, label) #################################################### # Node size definition if (!is.null(node.size)) { if (is.numeric(node.size[, 2]) | is.integer(node.size[, 2])) { nSize <- c() for (i in 1:length(label)) { indx <- which(as.character(node.size[,1]) == label[i]) if (length(indx[1]) != 0) nSize = c(nSize, node.size[indx[1],2]) else { msg <- paste("No size data provided for the node ", nodes$id[n], ". Value 1 will be assigned to this node!", sep = "") warning(msg) nSize = c(nSize, 1) } } node.attributes <- cbind(node.attributes, size = nSize) rm(i, nSize, indx) }#is.numeric else{ stop("Node size is not numeric or integer.") } }#is.null if (is.null(node.size)) { warning("No data provided for the node size. All nodes will be assigned size 1!") node.attributes <- cbind(node.attributes, size = rep(1, nn)) } #################################################### # Node color definition if (!is.null(node.color)) { nCol <- c() for (i in 1:length(label)) { indx <- which(as.character(node.color[,1]) == label[i]) if (length(indx[1]) != 0) nCol = c(nCol, as.character(node.color[indx[1],2])) else { msg <- paste("No color data provided for the node ", nodes$id[n], ". Black color will be assigned to this node!", sep = "") warning(msg) nCol = c(nCol, "black") } } node.attributes <- cbind(node.attributes, color = nCol) rm(i, nCol, indx) }#is.null if (is.null(node.color)) { warning("No data provided for the node color. All nodes will be colored black!") node.attributes <- cbind(node.attributes, color = as.character(rep("black", nn))) } #################################################### # Node radius definition if (!is.null(node.radius)) { if (is.numeric(node.radius[, 2]) | is.integer(node.radius[, 2])) { nSize <- c() for (i in 1:length(label)) { indx <- which(as.character(node.radius[,1]) == label[i]) if (length(indx[1]) != 0) nSize = c(nSize, node.radius[indx[1],2]) else { msg <- paste("No raidus data provided for the node ", nodes$id[n], ". Random values will be assigned!", sep = "") warning(msg) nSize = c(nSize, sample(nn, 1)) } } node.attributes <- cbind(node.attributes, radius = nSize) rm(i, nSize, indx) }#is.numeric else{ stop("Node raidus is not integer.") } }#is.null if (is.null(node.radius)) { warning("No data provided for the node radius. All nodes will be assigned random radius values") node.attributes <- cbind(node.attributes, radius = sample(nn, nn)) } #################################################### # Node axis definition if (!is.null(node.axis)) { if (is.integer(node.axis[, 2])) { nSize <- c() for (i in 1:length(label)) { indx <- which(as.character(node.axis[,1]) == label[i]) if (length(indx[1]) != 0) nSize = c(nSize, node.axis[indx[1],2]) else { msg <- paste("No axis data provided for the node ", nodes$id[n], ". This node will be assigned to axis 1!", sep = "") warning(msg) nSize = c(nSize, 1) } } node.attributes <- cbind(node.attributes, axis = nSize) rm(i, nSize, indx) }#is.integer else{ stop("Node axis is not integer.") } }#is.null if (is.null(node.axis)) { warning("No data provided for the node axis. All nodes will be assigned to axis 1") node.attributes <- cbind(node.attributes, axis = rep(1, nn)) } ###################################################### # Create HPD object HPD <- list() # Define node attributes HPD$nodes$id <- as.integer(node.attributes$id) HPD$nodes$lab <- as.character(node.attributes$label) HPD$nodes$axis <- as.integer(node.attributes$axis) HPD$nodes$radius <- as.numeric(node.attributes$radius) HPD$nodes$size <- as.numeric(node.attributes$size) HPD$nodes$color <- as.character(node.attributes$color) #################################################### # Get number of edges ne <- nrow(edge_df) #################################################### # Edge weight definition if (!(is.null(edge.weight))) { if (length(edge.weight) != nrow(edge_df)) stop("Edge weights are not provided for all edges!") if (is.numeric(edge.weight) | is.integer(edge.weight)) edge_df <- cbind(edge_df, weight = edge.weight) else stop("Edge weight column is not numeric or integer.") } if (is.null(edge.weight)) { warning("No edge weight provided Setting default edge weight to 1") edge_df <- cbind(edge_df, weight = rep(1, ne)) } #################################################### # Edge color definition if (!(is.null(edge.color))) { if (length(edge.color) != nrow(edge_df)) stop("Edge colors are not provided for all edges!") else edge_df <- cbind(edge_df, color = as.character(edge.color)) } if (is.null(edge.color)) { warning("No edge color provided. Setting default edge color to gray") edge_df <- cbind(edge_df, color = rep("gray", ne)) } #################################################### # Set up edge list # Merge by default sorts things and changes the order of edges, so edge list has to stay paired edge.hlp <- merge(edge_df, node.attributes[, 1:2], by.x = 1, by.y = "label") edge <- merge(edge.hlp, node.attributes[1:2], by.x = 2, by.y = "label") HPD$edges$id1 <- as.integer(edge$id.x) HPD$edges$id2 <- as.integer(edge$id.y) HPD$edges$weight <- as.numeric(edge$weight) HPD$edges$color <- as.character(edge$color) HPD$nodes <- as.data.frame(HPD$nodes) HPD$edges <- as.data.frame(HPD$edges) # Add description if (is.null(desc)) { desc <- "No description provided" } HPD$desc <- desc # Define axis columns if (is.null(axis.cols)){ axis.cols <- brewer.pal(length(unique(HPD$nodes$axis)), "Set1") } HPD$axis.cols <- axis.cols HPD$nodes$axis <- as.integer(HPD$nodes$axis) HPD$nodes$size <- as.numeric(HPD$nodes$size) HPD$nodes$color <- as.character(HPD$nodes$color) HPD$nodes$lab <- as.character(HPD$nodes$lab) HPD$nodes$radius <- as.numeric(HPD$nodes$radius) HPD$nodes$id <- as.integer(HPD$nodes$id) HPD$edges$id1 <- as.integer(HPD$edges$id1) HPD$edges$id2 <- as.integer(HPD$edges$id2) HPD$edges$weight <- as.numeric(HPD$edges$weight) HPD$edges$color <- as.character(HPD$edges$color) HPD$type <- type class(HPD) <- "HivePlotData" # Check HPD object chkHPD(HPD) return (HPD) } mod.mineHPD <- function(HPD, option = "", radData = NULL) { edges <- HPD$edges nodes <- HPD$nodes nn <- length(nodes$id) ### ++++++++++++++++++++++++++++++++++++++++++++++++++++ ### if (option == "axis <- source.man.sink") { # A change that allows this function to be used for undirected graphs # Now all nodes will be assigned to an axis done <- FALSE # a check to make sure all nodes get an axis for (n in 1:nn) { id1 <- which(n ==edges$id1) id2 <- which(n ==edges$id2) if ((length(id1) == 0) & (length(id2) > 0 )) { nodes$axis[n] <- 2 done <- TRUE next } # these are sinks, as they only receive an edge # note that set operations below drop duplicate values #Change 1 starts here if (length(id1) > 0) { if (length(id2) == 0) { nodes$axis[n] <- 1 done <- TRUE next } else { #Change 1 ends here common <- union(id1, id2) source <- setdiff(id1, common) if (length(source) == 1) { nodes$axis[n] <- 1 done <- TRUE next } # these are sources if (length(common) >= 1) { nodes$axis[n] <- 3 done <- TRUE next } # these are managers } } if (!done) { msg <- paste("node ", nodes$id[n], " was not assigned to an axis", sep = "") warning(msg) } # alert the user there was a problem } # end of loop inspecting nodes nodes$axis <- as.integer(nodes$axis) } ##### end of option == "axis <- source.man.sink ### ++++++++++++++++++++++++++++++++++++++++++++++++++++ ### if (option == "rad <- random") { # This option assigns a random radius value to a node for (n in 1:nn) nodes$radius[n] <- sample(1:nn, 1) } ##### end of option == "rad <- random" ### ++++++++++++++++++++++++++++++++++++++++++++++++++++ ### if (option == "rad <- userDefined") { # This option assigns a radius value to a node # based upon user specified values. if (is.null(radData)){ stop("No edge data provided") } if (length(intersect(as.character(radData[,1]), as.character(nodes$lab))) == 0){ stop("Provided data does not contain correct node labels") } for (n in 1:nn) { indexHlp <- which(as.character(radData[,1]) == nodes$lab[n]) if (length(indexHlp) != 0) nodes$radius[n] <- radData[indexHlp[1], 2] else { msg <- paste("No data provided for the node ", nodes$id[n], ". Value 1 will be assigned to this node!", sep = "") warning(msg) nodes$radius[n] <- 1 } } } ##### end of option == "rad <- userDefined" ### ++++++++++++++++++++++++++++++++++++++++++++++++++++ ### if (option == "axis <- deg_one_two_more") { # This option assigns a node to an axis # based upon whether its degree is 1, 2, or greater than two # # degree 1 = axis 1, degree 2 = axis 2, degree >2 = axis3 done <- FALSE # a check to make sure all nodes get an axis for (n in 1:nn) { id1 <- which(n ==edges$id1) id2 <- which(n ==edges$id2) if ((length(id1) + length(id2)) == 1) { nodes$axis[n] <- 1 done <- TRUE next } if ((length(id1) + length(id2)) == 2) { nodes$axis[n] <- 2 done <- TRUE next } if ((length(id1) + length(id2)) > 2) { nodes$axis[n] <- 3 done <- TRUE next } if (!done) { msg <- paste("node ", nodes$id[n], " was not assigned to an axis", sep = "") warning(msg) } # alert the user there was a problem } # end of loop inspecting nodes nodes$axis <- as.integer(nodes$axis) } ##### end of option == "axis <- deg_1_2_more ### ++++++++++++++++++++++++++++++++++++++++++++++++++++ ### if (option == "axis <- deg_five_ten_more") { # This option assigns a node to an axis # based upon whether its degree is <=5, 6-10, or greater than 10 # # degree <=5 = axis 1, degree between 6 and 10 = axis 2, degree >10 = axis32 done <- FALSE # a check to make sure all nodes get an axis for (n in 1:nn) { id1 <- which(n ==edges$id1) id2 <- which(n ==edges$id2) if ((length(id1) + length(id2)) <= 5) { nodes$axis[n] <- 1 done <- TRUE next } if (((length(id1) + length(id2)) > 5) & ((length(id1) + length(id2)) <= 10)) { nodes$axis[n] <- 2 done <- TRUE next } if ((length(id1) + length(id2)) > 10) { nodes$axis[n] <- 3 done <- TRUE next } if (!done) { msg <- paste("node ", nodes$id[n], " was not assigned to an axis", sep = "") warning(msg) } # alert the user there was a problem } # end of loop inspecting nodes nodes$axis <- as.integer(nodes$axis) } ##### end of option == "axis <- deg_five_ten_more" ### ++++++++++++++++++++++++++++++++++++++++++++++++++++ ### if (option == "remove axis edge") { # This option removes edges which start and end on the same axis # It re-uses code from sumHPD # Create a list of edges to be drawn n1.lab <- n1.rad <- n2.lab <- n2.rad <- n1.ax <- n2.ax <- c() for (n in 1:(length(HPD$edges$id1))) { i1 <- which(HPD$edges$id1[n] == HPD$nodes$id) i2 <- which(HPD$edges$id2[n] == HPD$nodes$id) n1.lab <- c(n1.lab, HPD$nodes$lab[i1]) n2.lab <- c(n2.lab, HPD$nodes$lab[i2]) n1.rad <- c(n1.rad, HPD$nodes$radius[i1]) n2.rad <- c(n2.rad, HPD$nodes$radius[i2]) n1.ax <- c(n1.ax, HPD$nodes$axis[i1]) n2.ax <- c(n2.ax, HPD$nodes$axis[i2]) } fd <- data.frame( n1.id = HPD$edges$id1, n1.ax, n1.lab, n1.rad, n2.id = HPD$edges$id2, n2.ax, n2.lab, n2.rad, e.wt = HPD$edges$weight, e.col = HPD$edges$color) prob <- which(fd$n1.ax == fd$n2.ax) if (length(prob) == 0) cat("nt No edges were found that start and end on the same axisn") if (length(prob) > 0) { edges <- edges[-prob,] cat("nt", length(prob), "edges that start and end on the same axis were removedn") } } ##### end of option == "remove axis edge" ### ++++++++++++++++++++++++++++++++++++++++++++++++++++ ### if (option == "axis <- split") { # This option splits all axes into 2 new axes # It can be used to address the "edge on the same axis" issue # This option may increase the number of nodes - a single node from the parent axis may appear on 2 "children" axes nodesNew <- nodes nodesOld <- nodes nAxes <- unique(nodes$axis) numAxes <- length(nAxes) #Renumerate axes for (i in numAxes:1) nodesOld[which(nodesOld$axis == nAxes[i]), "axis"] <- as.integer(2*nAxes[i] - 1) #Duplicate nodes #Renumerate axes for (i in numAxes:1) nodesNew[which(nodesNew$axis == nAxes[i]), "axis"] <- as.integer(2*nAxes[i]) #Re-numerate node ids nodesNew$id <- nodesNew$id + nn #Duplicated set of nodes with correct axis and node ids nodes <- rbind(nodesOld, nodesNew) rm(nodesOld, nodesNew) #Now create duplicated set of edges and re-numerate node ids for interactions edgesNew1 <- edges edgesNew1$id1 <- edgesNew1$id1 + nn edgesNew1$id2 <- edgesNew1$id2 + nn edgesNew2 <- edges edgesNew2$id1 <- edgesNew2$id1 + nn edgesNew3 <- edges edgesNew3$id2 <- edgesNew3$id2 + nn edges <- rbind(edges, edgesNew1, edgesNew2, edgesNew3) nodesAxis <- nodes[, c("id", "axis")] edgesHlp <- merge(edges, nodesAxis, by.x = "id1", by.y = "id") edges <- merge(edgesHlp, nodesAxis, by.x = "id2", by.y = "id") edgesOK <- edges[((edges$axis.x == 1) & (edges$axis.y == 2*numAxes)) | ((edges$axis.x == 2*numAxes) & (edges$axis.y == 1)), ] edgesHlp <- edgesOK if (numAxes > 1) for (i in 1:(numAxes - 1)) { edgesOK <- edges[((edges$axis.x == 2*i) & (edges$axis.y == (2*i + 1))) | ((edges$axis.x == (2*i + 1)) & (edges$axis.y == 2*i)), ] edgesHlp <- rbind(edgesHlp, edgesOK) } for (i in 1:numAxes) { edgesOK <- edges[((edges$axis.x == (2*i - 1)) & (edges$axis.y == 2*i)) | ((edges$axis.x == 2*i) & (edges$axis.y == (2*i - 1))), ] edgesHlp <- rbind(edgesHlp, edgesOK) } edges <- edgesHlp[, 1:4] unique.ids <- unique(c(edges$id1, edges$id2)) nodes <- nodes[nodes$id %in% unique.ids, ] # Check if the new number of axes is 2 times larger than old one # if not, we need to adjust axis numbers nodesAxis.new <- sort(unique(nodes$axis)) if(length(nodesAxis.new) != 2*numAxes) for (i in 1:length(nodesAxis.new)) if (i != nodesAxis.new[i]){ nodes[which(nodes$axis == nodesAxis.new[i]), "axis"] <- i } } ##### end of option == "axis <- split" ### ++++++++++++++++++++++++++++++++++++++++++++++++++++ ### # Final assembly and checking... HPD$edges <- edges HPD$nodes <- nodes chkHPD(HPD) HPD } library("igraph") library("plyr") library("HiveR") library("RColorBrewer") ############################################################################################ rm(list = ls()) dataSet <- read.table("lesmis.txt", header = FALSE, sep = "t") ############################################################################################ # Create a graph. Use simplify to ensure that there are no duplicated edges or self loops gD <- simplify(graph.data.frame(dataSet, directed=FALSE)) # Print number of nodes and edges # vcount(gD) # ecount(gD) # Calculate some node properties and node similarities that will be used to illustrate # different plotting abilities # Calculate degree for all nodes degAll <- degree(gD, v = V(gD), mode = "all") # Calculate betweenness for all nodes betAll <- betweenness(gD, v = V(gD), directed = FALSE) / (((vcount(gD) - 1) * (vcount(gD)-2)) / 2) betAll.norm <- (betAll - min(betAll))/(max(betAll) - min(betAll)) gD <- set.vertex.attribute(gD, "degree", index = V(gD), value = degAll) gD <- set.vertex.attribute(gD, "betweenness", index = V(gD), value = betAll.norm) # Check the attributes # summary(gD) gD <- set.edge.attribute(gD, "weight", index = E(gD), value = 0) gD <- set.edge.attribute(gD, "similarity", index = E(gD), value = 0) # Calculate Dice similarities between all pairs of nodes dsAll <- similarity.dice(gD, vids = V(gD), mode = "all") # Calculate edge weight based on the node similarity F1 <- function(x) {data.frame(V4 = dsAll[which(V(gD)$name == as.character(x$V1)), which(V(gD)$name == as.character(x$V2))])} dataSet.ext <- ddply(dataSet, .variables=c("V1", "V2", "V3"), function(x) data.frame(F1(x))) for (i in 1:nrow(dataSet.ext)) { E(gD)[as.character(dataSet.ext$V1) %--% as.character(dataSet.ext$V2)]$weight <- as.numeric(dataSet.ext$V3) E(gD)[as.character(dataSet.ext$V1) %--% as.character(dataSet.ext$V2)]$similarity <- as.numeric(dataSet.ext$V4) } rm(degAll, betAll, betAll.norm, F1, dsAll, i) ############################################################################################ #Determine node/edge color based on the properties # Calculate node size # We'll interpolate node size based on the node betweenness centrality, using the "approx" function # And we will assign a node size for each node based on its betweenness centrality approxVals <- approx(c(0.5, 1.5), n = length(unique(V(gD)$betweenness))) nodes_size <- sapply(V(gD)$betweenness, function(x) approxVals$y[which(sort(unique(V(gD)$betweenness)) == x)]) rm(approxVals) # Define node color # We'll interpolate node colors based on the node degree using the "colorRampPalette" function from the "grDevices" library library("grDevices") # This function returns a function corresponding to a collor palete of "bias" number of elements F2 <- colorRampPalette(c("#F5DEB3", "#FF0000"), bias = length(unique(V(gD)$degree)), space = "rgb", interpolate = "linear") # Now we'll create a color for each degree colCodes <- F2(length(unique(V(gD)$degree))) # And we will assign a color for each node based on its degree nodes_col <- sapply(V(gD)$degree, function(x) colCodes[which(sort(unique(V(gD)$degree)) == x)]) rm(F2, colCodes) # Assign visual attributes to edges using the same approach as we did for nodes F2 <- colorRampPalette(c("#FFFF00", "#006400"), bias = length(unique(E(gD)$similarity)), space = "rgb", interpolate = "linear") colCodes <- F2(length(unique(E(gD)$similarity))) edges_col <- sapply(E(gD)$similarity, function(x) colCodes[which(sort(unique(E(gD)$similarity)) == x)]) rm(F2, colCodes) ############################################################################################ # Now the new (HiveR) part # Create a hive plot from the data frame hive1 <- edge2HPD(edge_df = dataSet.ext) #sumHPD(hive1) # Assign nodes to a radius based on their degree (number of edges they are touching) hive2 <- mineHPD(hive1, option = "rad <- tot.edge.count") # Assign nodes to axes based on their position in the edge list # (this function assumes direct graphs, so it considers the first column to be a source and second column to be a sink ) hive3 <- mineHPD(hive2, option = "axis <- source.man.sink") # Removing zero edges for better visualization hive4 <- mineHPD(hive3, option = "remove zero edge") # And finally, plotting our graph (Figure 1) plotHive(hive4, method = "abs", bkgnd = "white", axLabs = c("source", "hub", "sink"), axLab.pos = 1) ############################################################################################ # Let's do some node/edge customization # First do nodes nodes <- hive4$nodes # Change the node color and size based on node degree and betweenness values for (i in 1:nrow(nodes)) { nodes$color[i] <- nodes_col[which(nodes$lab[i] == V(gD)$name)] nodes$size[i] <- nodes_size[which(nodes$lab[i] == V(gD)$name)] } # Reassign these nodes to the hive(4) object hive4$nodes <- nodes # And plot it (Figure 2) plotHive(hive4, method = "abs", bkgnd = "white", axLab.pos = 1) # Now do the edges edges <- hive4$edges # Change the edge color based on Dice similarity for (i in 1:nrow(edges)) { index1 <- which(nodes$id == edges$id1[i]) index2 <- which(nodes$id == edges$id2[i]) edges$color[i] <- edges_col[which(E(gD)[as.character(nodes$lab[index1]) %--% as.character(nodes$lab[index2])] == E(gD))] } # Reassign these edges to the hive(4) object hive4$edges <- edges # And plot it (Figure 3) plotHive(hive4, method = "abs", bkgnd = "white", axLabs = c("source", "hub", "sink"), axLab.pos = 1) # Some edges are too thick, so we will reduce the edge weight (thickness) by 25% hive4$edges$weight <- hive4$edges$weight/4 # And plot it (Figure 5) plotHive(hive4, method = "abs", bkgnd = "white", axLabs = c("source", "hub", "sink"), axLab.pos = 1) ############################################### # Now the same using adj2HPD() instead of edge2HPD() # First, we'll create an adjacency matrix from our graph (gD) gAdj <- get.adjacency(gD, type = "upper", edges = FALSE, names = TRUE, sparse = FALSE) # Then we'll create the hive object for it hive1 <- adj2HPD(gAdj, type = "2D") # Assign nodes to a radius based on their degree (number of edges they are touching) hive2 <- mineHPD(hive1, option = "rad <- tot.edge.count") # Assign nodes to axes based on their position in the edge list hive3 <- mod.mineHPD(hive2, option = "axis <- source.man.sink") # In some cases (for undirected graphs), some nodes will not be assigned to any axes # In those cases, use the function from "mod.mineHPD.R" #source("mod.mineHPD.R") #hive3 <- mod.mineHPD(hive2, option = "axis <- source.man.sink") # Removing zero edges for better visualization hive4 <- mineHPD(hive3, option = "remove zero edge") # Node/edge customization is the same as above ################################################# # Now lets expand the available options and add some new function(alitie)s # Available in: "mod.mineHPD.R" source("mod.mineHPD.R") # Assign nodes to a radius based on the user specified values (in our case betweenness centrality) hive2 <- mod.mineHPD(hive1, option = "rad <- userDefined", radData = data.frame(nds = V(gD)$name, bc = V(gD)$betweenness)) # Assign nodes to a radius randomly hive2 <- mod.mineHPD(hive1, option = "rad <- random") # Assign nodes to axes based on their degree # Low degrees (1, 2, >2) hive3 <- mod.mineHPD(hive2, option = "axis <- deg_one_two_more") # Higer degrees (<=5, 6-10, >10) hive3 <- mod.mineHPD(hive2, option = "axis <- deg_five_ten_more") # Split axes - this function splits each of the 3 axes into 2 new axes (thus, resulting in 6 axes) # and removes edge on the same axis (but it introduces new (duplicated) nodes) hive4 <- mod.mineHPD(hive3, option = "axis <- split") #################################################
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.