Summary of community detection algorithms in igraph 0.6
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Based on Launchpad traffic and mailing list responses, Gabor and Tamas will soon be releasing igraph 0.6. In celebration, I’ll be publishing a number of helpful lists and tables I’ve put together to organize information about igraph.
In this post, we’ll cover the community detection algorithms (~i.e., clustering, partitioning, segmenting) available in 0.6 and their characteristics, such as their worst-case runtime performance and whether they support directed or weighted edges. Much of the information below is gleaned from the igraph C documentation, source algorithm publications, and three years of tracking the 0.6 trunk.
Optimal Modularity
In my mind, modularity is more of a framework than just a simple function over a graph. I don’t know how Mark feels, but the amount of work based on this idea, implicitly (~87,000) or explicitly (~1000), is staggering given the short six years since.
That said, modularity is just a framework, and, like all frameworks, has its shortcomings. If you are going to talk about modularity in a quantitative way, there are two must-read ideas on the topic:
- The performance of modularity maximization in practical contexts.
B. H. Good, Y.-A. de Montjoye and A. Clauset.
Physical Review E 81, 046106 (2010). - Resolution limit in community detection
Santo Fortunato, Marc Barthelemy - More work on tradeoffs in this limit by Fortunato:
Limits of modularity maximization in community detection
Andrea Lancichinetti, Santo Fortunato
TL;DR: The plateau of the modularity surface we care about most behaves poorly in real networks, and, depending on |V| and |E| in absolute and relative terms, we might not see “smaller” clusters.
If you’re still going to calculate the optimal modularity configuration, here are the details:
- New to version 0.6: TRUE
- Directed edges: FALSE
- Weighted edges: FALSE
- Handles multiple components: TRUE
- Runtime: B^(|V|^2)
Edge-Betweenness
OK, time to get less verbose. I’ll stick to the primary reference and igraph characteristics going forward:
- Community structure in social and biological networks
M. Girvan and M. E. J. Newman - New to version 0.6: FALSE
- Directed edges: TRUE
- Weighted edges: TRUE
- Handles multiple components: TRUE
- Runtime: |V||E|^2
Leading Eigenvector
- Finding community structure in networks using the eigenvectors of matrices
MEJ Newman
Phys Rev E 74:036104 (2006) - New to version 0.6: FALSE
- Directed edges: FALSE
- Weighted edges: FALSE
- Handles multiple components: TRUE
- Runtime: c|V|^2 + |E|
Fast-Greedy
- Finding community structure in very large networks
Aaron Clauset, M. E. J. Newman, Cristopher Moore - Finding Community Structure in Mega-scale Social Networks
Ken Wakita, Toshiyuki Tsurumi - New to version 0.6: FALSE
- Directed edges: FALSE
- Weighted edges: TRUE
- Handles multiple components: TRUE
- Runtime: |V||E| log |V|
Multi-Level
- Fast unfolding of communities in large networks
Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre - New to version 0.6: TRUE
- Directed edges: FALSE
- Weighted edges: TRUE
- Handles multiple components: TRUE
- Runtime: “linear” when |V| \approx |E|; (a quick glance at the algorithm suggests this would be quadratic for fully-connected graphs)
Walktrap
- Computing communities in large networks using random walks
Pascal Pons, Matthieu Latapy - New to version 0.6: FALSE
- Directed edges: FALSE
- Weighted edges: TRUE
- Handles multiple components: FALSE
- Runtime: |E||V|^2
Label Propagation
- Near linear time algorithm to detect community structures in large-scale networks.
Raghavan, U.N. and Albert, R. and Kumara, S.
Phys Rev E 76, 036106. (2007) - New to version 0.6: TRUE
- Directed edges: FALSE
- Weighted edges: TRUE
- Handles multiple components: FALSE
- Runtime: |V| + |E|
InfoMAP
- The map equation
M. Rosvall, D. Axelsson, C. T. Bergstrom - New to version 0.6: TRUE
- Directed edges: TRUE
- Weighted edges: TRUE
- Weighted nodes: TRUE
- Handles multiple components: FALSE
- Runtime: none given; looks worst case like |V|(|V| + |E|) based on quick reading.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.