Creating a network using R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
On January 10, 2016, David Bowie left this earthly realm. Last month I decided to create a network of his collaborations, and here is how to do it.
Required packages
You need jsonlite, igraph, network, data.table, plyr and base R.
Other tools
D3Plus by Alex Simoes and Dave Landry. Also Google Sheets.
My data is here.
Loading packages
# 1: define the libraries to use
libraries <- c("jsonlite", "igraph", "network", "data.table", "plyr")

# 2: this is the function to download and/or load libraries on the fly
download_and_or_load <- function(pkg) {
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg)) install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

# 3: use the function from step 2
download_and_or_load(libraries)
Building the network
To visualize a network, D3Plus needs three files: data, edges and nodes.
Data
This is the easy part. I downloaded the sheet named "data" from my spreadsheet in CSV format. Then I converted the CSV to JSON with these lines:
data <- read.csv("data.csv")
data <- toJSON(data, pretty = TRUE)
write(data, file = "bowie_data.json")
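To see what that conversion produces, here is a minimal sketch on a toy data frame. The rows are made up for illustration (the real sheet's contents may differ); only the Artist and Genre column names are borrowed from the template further down.

```r
library(jsonlite)

# Hypothetical toy rows; they stand in for the real "data" sheet
toy_data <- data.frame(
  Artist = c("David Bowie", "Queen"),
  Genre  = c("Rock", "Rock"),
  stringsAsFactors = FALSE
)

# A data frame is serialized row by row: a JSON array with one object per row
toy_json <- toJSON(toy_data, pretty = TRUE)

# Reading the JSON back gives a data frame with the same columns
toy_back <- fromJSON(toy_json)
```

Printing toy_json shows the array-of-objects layout that ends up in bowie_data.json.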
Edges
This part is a bit trickier.
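I downloaded the sheet named "collaborations" from my spreadsheet in CSV format. It is a square matrix \(M\) with one row and one column per artist, and the code below treats every entry greater than zero as a collaboration between the row artist and the column artist.
Then I arrange the matrix to fix the row names and column names: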
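bowie_collaborations <- read.csv("collaborations.csv")
# the first column holds the artist names: use it as row names, then drop it
rownames(bowie_collaborations) <- bowie_collaborations[, 1]
bowie_collaborations <- bowie_collaborations[, -1]
# the matrix is square, so the columns get the same labels as the rows
colnames(bowie_collaborations) <- rownames(bowie_collaborations)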
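The same steps can be tried on a tiny matrix first. This is only a sketch: the three-artist sheet below is hypothetical and inlined as text, so no file is needed.

```r
# Hypothetical three-artist sheet, inlined as text instead of a CSV file
toy_csv <- "artist,A,B,C
Artist A,0,1,0
Artist B,1,0,1
Artist C,0,1,0"

toy <- read.csv(text = toy_csv, stringsAsFactors = FALSE)
rownames(toy) <- toy[, 1]       # first column holds the artist names
toy <- toy[, -1]                # drop it so only the numeric entries remain
colnames(toy) <- rownames(toy)  # same labels on both axes

toy_matrix <- as.matrix(toy)
```

Passing row.names = 1 to read.csv would fold the first two steps into the read call.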
With the matrix ready I can create the network. You can try the different layouts explained in the igraph documentation. This is the code to create the network and display a static version of it:
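# rebuild the collaborations as a plain numeric matrix
bowie_gr <- matrix(unlist(bowie_collaborations), ncol = nrow(bowie_collaborations), byrow = TRUE)
rownames(bowie_gr) <- rownames(bowie_collaborations)
colnames(bowie_gr) <- colnames(bowie_collaborations)

# keep the (row, column) index pairs of the positive entries as an edge list
bowie_gr <- which(bowie_gr > 0, arr.ind = TRUE)

# build an undirected graph and keep its minimum spanning tree
# (mst and graph_from_data_frame are the current igraph names for
# minimum.spanning.tree and graph.data.frame)
bowie_gr.graph <- mst(graph_from_data_frame(bowie_gr, directed = FALSE))

# vertex names are matrix indices stored as strings: map them back to artist names
bowie_gr.names <- colnames(bowie_collaborations)[as.numeric(V(bowie_gr.graph)$name)]
bowie_gr.graph <- simplify(bowie_gr.graph, remove.multiple = TRUE, remove.loops = TRUE)

# fix the seed so the layout is reproducible
set.seed(1234)
bowie_gr.layout <- layout_with_fr(bowie_gr.graph)
plot(bowie_gr.graph, edge.arrow.size = .3, vertex.label = bowie_gr.names, layout = bowie_gr.layout)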
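The key move above is turning positive matrix entries into an edge list. Here is a minimal sketch of that idea on a hypothetical 3 x 3 matrix. It skips the minimum spanning tree step, and the names m, artists, pairs and edge_df are made up for the example.

```r
library(igraph)

# Hypothetical collaboration matrix (symmetric, zero diagonal) and its labels
artists <- c("A", "B", "C")
m <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)

# One row per positive entry: first column is the row index, second the column index
pairs <- which(m > 0, arr.ind = TRUE)
edge_df <- data.frame(from = pairs[, 1], to = pairs[, 2])

# Every collaboration shows up twice, as (i, j) and (j, i); simplify() merges the duplicates
g <- simplify(graph_from_data_frame(edge_df, directed = FALSE))

# Vertex names are the indices as strings, so map them back to artist labels
labels <- artists[as.numeric(V(g)$name)]
```

Because the vertex order follows the order in which indices first appear in the edge list, the labels have to be looked up through V(g)$name rather than assumed to be in matrix order.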
Now I save the edges (names and ids) and the network layout:
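# the edges file is written in Pajek format, despite its .csv extension
# (write_graph is the current igraph name for write.graph)
write_graph(bowie_gr.graph, "exported_edges_bowie.csv", format = "pajek")
write.csv(bowie_gr.names, "exported_names_bowie.csv")
write.csv(bowie_gr.layout, "exported_coordinates_bowie.csv")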
Finally I rearrange the edges to display names instead of numeric ids and save the result in JSON format:
# lookup table: numeric id and artist name, once as source and once as target
network_names <- read.csv("exported_names_bowie.csv")
setnames(network_names, colnames(network_names), c("source_num", "source"))
network_names$target_num <- network_names$source_num
network_names$target <- network_names$source

# read the exported edges as space-separated ids and drop the first row
network_edges <- read.csv("exported_edges_bowie.csv", sep = " ")
network_edges <- network_edges[-1, ]
setnames(network_edges, colnames(network_edges), c("source_num", "target_num"))

# replace the numeric ids with artist names
network_edges <- join(network_edges, network_names[, c("source", "source_num")], by = "source_num")
network_edges <- join(network_edges, network_names[, c("target", "target_num")], by = "target_num")
network_edges <- network_edges[, c("source", "target")]

# nest source and target as one-column data frames named "Artist"
source <- as.data.frame(network_edges$source)
colnames(source) <- "source"
target <- as.data.frame(network_edges$target)
colnames(target) <- "target"
network_edges <- data.frame(matrix(ncol = 1, nrow = nrow(network_edges)))
network_edges$source <- source
network_edges$target <- target
colnames(network_edges$source) <- "Artist"
colnames(network_edges$target) <- "Artist"

network_edges_json <- toJSON(network_edges, pretty = TRUE)
write(network_edges_json, "bowie_edges.json")
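The id-to-name replacement is the heart of this step. As a sketch, here it is on a toy edge list, using base R's match() in place of plyr's join() so that it runs without extra packages. The tables toy_names and toy_edges are hypothetical.

```r
# Hypothetical lookup table and numeric edge list
toy_names <- data.frame(id = 1:3, Artist = c("A", "B", "C"),
                        stringsAsFactors = FALSE)
toy_edges <- data.frame(source_num = c(1, 2), target_num = c(2, 3))

# match() gives, for each id, its row position in the lookup table
toy_edges$source <- toy_names$Artist[match(toy_edges$source_num, toy_names$id)]
toy_edges$target <- toy_names$Artist[match(toy_edges$target_num, toy_names$id)]

# keep only the readable columns
named_edges <- toy_edges[, c("source", "target")]
```

Like join(), match() keeps the edges in their original order; an id missing from the lookup table would show up as NA instead of being dropped.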
Nodes
This is easier than the edges part. The code to save the nodes in JSON format with names instead of numeric ids is:
# the coordinates file has a row id plus the x and y position of each vertex
network_nodes <- read.csv("exported_coordinates_bowie.csv")
setnames(network_nodes, colnames(network_nodes), c("target_num", "x", "y"))

# replace the numeric ids with artist names
network_nodes <- join(network_nodes, network_names[, c("target", "target_num")], by = "target_num")
network_nodes <- network_nodes[, c("target", "x", "y")]
setnames(network_nodes, colnames(network_nodes), c("Artist", "x", "y"))

network_nodes_json <- toJSON(network_nodes, pretty = TRUE)
write(network_nodes_json, "bowie_nodes.json")
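For comparison, the nodes table can also be assembled in memory, without the CSV round trip. This is only a sketch under that assumption: the layout matrix and labels below are made up and stand in for the layout and names computed earlier.

```r
library(jsonlite)

# Hypothetical layout: one (x, y) row per vertex
toy_layout <- matrix(c(0.0, 1.0,
                       1.5, 0.2,
                       3.0, 1.1), ncol = 2, byrow = TRUE)
toy_labels <- c("A", "B", "C")

# One row per node: the artist name plus its coordinates
toy_nodes <- data.frame(Artist = toy_labels,
                        x = toy_layout[, 1],
                        y = toy_layout[, 2],
                        stringsAsFactors = FALSE)

toy_nodes_json <- toJSON(toy_nodes, pretty = TRUE)
```

This only works when row i of the layout belongs to label i, which is why the labels have to be in the graph's vertex order.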
Put your files in a D3Plus network template
In my case I decided to use bl.ocks.org to show my network. Use this template and edit the links to data, edges and nodes to make it work.
<!doctype html>
<meta charset="utf-8">
<script src="https://d3plus.org/js/d3.js"></script>
<script src="https://d3plus.org/js/d3plus.js"></script>
<div id="network"></div>
<script>
  var visualization = d3plus.viz()
    .container("#network")
    .data("bowie_data.json")
    .edges("bowie_edges.json")
    .nodes("bowie_nodes.json")
    .type("network")
    .resize(true)
    .id(["Genre","Artist"])
    .font({"family": "Lato"})
    .size(1)
    .depth(1)
    .color("Color")
    .title("David Bowie Collaborations")
    .tooltip({"value": ["Genre","Collaboration"],"size": false})
    .legend({"size": 32})
    .draw()
</script>
<link href="https://fonts.googleapis.com/css?family=Lato:400,700" rel="stylesheet" type="text/css">
You can also use Roboto, another Google Font, or any typeface you want.
Final result
After some edge editing in Atom (just aesthetic changes to put some edges closer to similar artists), the result is here.