Hierarchical Visualizations in R and the JavaScript InfoVis Toolkit
[This article was first published on R-Chart, and kindly contributed to R-bloggers.]
I love R. It is a great language and platform for statistical work and graphing. But every technology has its limits, and other tools can meet different needs. So in this post, I will start with R and move on to the JavaScript InfoVis Toolkit. I must admit, I can't say that I know the limits of R: I am regularly corrected by the friendly and informed R community, which makes this blog better and helps me as well.
I have been regularly considering the best ways to visualize trees – especially large ones. Having a background in Oracle, I often revisit the employee hierarchy that is included as part of the HR demonstration schema.
The query to construct a result set that represents the employee hierarchy relies upon Oracle’s non-standard hierarchical query syntax. Employees are associated with other records in the employee table by a manager id. The root node of the tree is the record with a null manager id (the top manager who has no superior within the organization).
library(RODBC)
# Connect to the Oracle XE data source as the HR demo user
ch <- odbcConnect("XE", uid = "HR", pwd = "HR")
# Pair each employee with their manager, walking the hierarchy from the top.
# The WHERE clause drops the root's own row, since it has no manager to pair with.
sql <- "SELECT
replace(m.first_name,' ','_')||'_'||
replace(m.last_name,' ','_') manager_name,
replace(e.first_name,' ','_')||'_'||
replace(e.last_name,' ','_') employee_name
FROM employees e
LEFT OUTER JOIN employees m
ON m.employee_id = e.manager_id
WHERE e.manager_id IS NOT NULL
START WITH e.manager_id IS NULL
CONNECT BY PRIOR e.employee_id = e.manager_id
ORDER SIBLINGS BY e.last_name,e.first_name"
r <- sqlQuery(ch, paste(sql, collapse = " "))
close(ch)
The data returned is a simple listing that pairs up each employee with his/her manager.
head(r)
      MANAGER_NAME    EMPLOYEE_NAME
1      Steven_King Gerald_Cambrault
2 Gerald_Cambrault  Elizabeth_Bates
3 Gerald_Cambrault   Harrison_Bloom
4 Gerald_Cambrault       Tayler_Fox
5 Gerald_Cambrault    Sundita_Kumar
6 Gerald_Cambrault        Lisa_Ozer
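Before plotting, it can be worth confirming that an edge list like this really describes a single hierarchy. The following is a minimal sketch in base R, not part of the original workflow: it rebuilds the six rows shown above by hand (the `edges` data frame and the `looks_like_tree` name are made up for illustration) and checks two necessary conditions, namely one root and one manager per employee. These checks would not catch every malformed input, such as a disconnected cycle.

```r
# Hand-built copy of the six rows printed by head(r); in practice use r itself.
edges <- data.frame(
  MANAGER_NAME  = c("Steven_King", rep("Gerald_Cambrault", 5)),
  EMPLOYEE_NAME = c("Gerald_Cambrault", "Elizabeth_Bates", "Harrison_Bloom",
                    "Tayler_Fox", "Sundita_Kumar", "Lisa_Ozer"),
  stringsAsFactors = FALSE
)

# Root candidates: people who manage someone but never appear as an employee.
# (The query's WHERE clause removes the root's own row, so the root only
# ever shows up in the manager column.)
roots <- setdiff(unique(edges$MANAGER_NAME), edges$EMPLOYEE_NAME)

# In a tree, every non-root node has exactly one parent.
single_parent <- !any(duplicated(edges$EMPLOYEE_NAME))

# Necessary (not sufficient) conditions for the edge list to be a tree.
looks_like_tree <- length(roots) == 1 && single_parent

# Number of direct reports per manager, a rough hint of how cluttered
# each part of the plot will be.
reports <- table(edges$MANAGER_NAME)

print(roots)
print(looks_like_tree)
print(reports)
```

If `roots` had more than one entry, the plot would show several separate hierarchies instead of one organization chart.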
The igraph library can then be used to plot the data.
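library(igraph)
# Build a directed graph from the manager -> employee edge list
# (recent igraph releases name this function graph_from_data_frame())
g <- graph.data.frame(r, directed = TRUE)
V(g)$label <- V(g)$name
# Open an interactive Tk window where nodes can be dragged around
tkplot(g)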
The results are a bit cluttered due to the number of nodes – but can be manually manipulated when viewed using tkplot. The typical organizational chart structure can be obtained using the Reingold-Tilford layout.
Other possibilities include the circle layout…
…the Fruchterman-Reingold layout…
…and the Kamada-Kawai layout.
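The four layouts named above can also be computed explicitly and passed to a plotting call. This is a sketch, not the code used for the original figures: it assumes a recent igraph release, where the layout functions are called `layout_as_tree()` (Reingold-Tilford), `layout_in_circle()`, `layout_with_fr()` and `layout_with_kk()` (older releases used dotted names such as `layout.reingold.tilford`), and it uses a small hand-built `edges` data frame in place of the full query result.

```r
library(igraph)

# Small stand-in for the query result r (the six rows shown by head(r)).
edges <- data.frame(
  MANAGER_NAME  = c("Steven_King", rep("Gerald_Cambrault", 5)),
  EMPLOYEE_NAME = c("Gerald_Cambrault", "Elizabeth_Bates", "Harrison_Bloom",
                    "Tayler_Fox", "Sundita_Kumar", "Lisa_Ozer"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

# The tree layout needs to know which vertex sits at the top.
root_id <- which(V(g)$name == "Steven_King")

# Each layout function returns a matrix with one row of x/y coordinates
# per vertex.
layouts <- list(
  tree   = layout_as_tree(g, root = root_id),  # Reingold-Tilford
  circle = layout_in_circle(g),
  fr     = layout_with_fr(g),                  # Fruchterman-Reingold
  kk     = layout_with_kk(g)                   # Kamada-Kawai
)

# Draw the organizational-chart style version; swap in another element
# of layouts to compare.
plot(g, layout = layouts$tree, vertex.label = V(g)$name)
```

The same coordinate matrices can be passed to `tkplot()` through its `layout` argument, which keeps the interactive window while starting from a tidier arrangement.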
When trying to analyze networks of this size in a small space, animation is often a better visualization technique. The following shows a couple of the animated charts available through the JavaScript InfoVis Toolkit. These were produced using Ruby and Sinatra to return a JSON object that is rendered using JavaScript. The following is a hyperbolic tree with the root node in the center.
When a different node is clicked, it is shifted to the center and the remaining nodes are arranged around it. In this example Gerald Cambrault is selected and the graph smoothly transitions to the image that follows.
If you prefer a tree that is more like the traditional organizational chart, the space tree might be used instead.
This visualization also adjusts when a node is selected.
It looks nice and the animation is elegant and useful (good job Nicolas). The code is available on GitHub at this location. It is not the best implementation, so feel free to fork it and provide a better solution.
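The Ruby code that produced the JSON is not reproduced in this post. As a rough illustration of the same idea in R, the sketch below turns a manager/employee edge list into a nested tree and serializes it. It assumes the toolkit's tree visualizations accept nodes with `id`, `name`, `data` and `children` fields; check that against the toolkit's documentation before relying on it. The helper names `build_node` and `to_json` are hypothetical, and the hand-rolled serializer assumes names contain no characters that need JSON escaping.

```r
# Edge list in the same shape as the query result (a few rows from head(r)).
edges <- data.frame(
  MANAGER_NAME  = c("Steven_King", "Gerald_Cambrault", "Gerald_Cambrault"),
  EMPLOYEE_NAME = c("Gerald_Cambrault", "Elizabeth_Bates", "Harrison_Bloom"),
  stringsAsFactors = FALSE
)

# Recursively build a nested list: one node per person, with that person's
# direct reports as children. Assumes the edge list contains no cycles.
build_node <- function(name, edges) {
  reports <- edges$EMPLOYEE_NAME[edges$MANAGER_NAME == name]
  list(
    id       = name,
    name     = gsub("_", " ", name),  # display label with spaces restored
    data     = list(),
    children = lapply(reports, build_node, edges = edges)
  )
}

# Minimal serializer written by hand to avoid a package dependency.
# Assumes ids and names need no JSON escaping.
to_json <- function(node) {
  kids <- vapply(node$children, to_json, character(1))
  sprintf('{"id":"%s","name":"%s","data":{},"children":[%s]}',
          node$id, node$name, paste(kids, collapse = ","))
}

# The root is the one manager who never appears as an employee.
root <- setdiff(edges$MANAGER_NAME, edges$EMPLOYEE_NAME)
tree <- build_node(root, edges)
json <- to_json(tree)
cat(json, "\n")
```

For real data, a JSON package would be a safer choice than the hand-written serializer, since it handles escaping and encoding.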