A large repository of networkdata

[This article was first published on schochastics, and kindly contributed to R-bloggers].

There are many network repositories out there that offer a large variety of amazing free data. (See the awesome network analysis list on GitHub for an overview.) The problem is that network data comes in many formats: either in plain text as an edge list or adjacency matrix, or in one of the many dedicated network file formats (paj, dl, gexf, graphml, net, gml, …). The igraph package has an import function for many of these formats (read_graph()), but I have found it to be unreliable at times.
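For the plain-text case, both layouts map directly onto igraph constructors. The following is a minimal sketch with made-up toy data; the objects el and A are hypothetical and not part of any package.

```r
library(igraph)

# Hypothetical toy edge list: one row per undirected edge
el <- data.frame(
  from = c("a", "a", "b", "c"),
  to   = c("b", "c", "c", "d"),
  stringsAsFactors = FALSE
)
g_el <- graph_from_data_frame(el, directed = FALSE)

# The same toy graph written as a symmetric adjacency matrix
A <- matrix(
  c(0, 1, 1, 0,
    1, 0, 1, 0,
    1, 1, 0, 1,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(letters[1:4], letters[1:4])
)
g_adj <- graph_from_adjacency_matrix(A, mode = "undirected")

# Both routes should describe the same structure
vcount(g_el)
ecount(g_el)
ecount(g_adj)
```

In practice the edge list or matrix would come from a file (for example via read.csv()), and that is where format quirks tend to creep in.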

The networkdata package collates datasets from many different sources and makes the networks readily available in R. The data is very diverse, ranging from traditional social networks to animal, covert, and movie networks. In total, the package includes 979 datasets containing 2135 networks. I hope the package will be a good resource for teaching, workshops, and research whenever example data is needed. You can only get so far with the Karate network.

library(igraph)
library(networkdata)

Install

Due to the nature of the package (only data, no functions), it will not go to CRAN at any point. However, if you are looking for stable builds, the package is available via drat. With drat, you can install and upgrade non-CRAN packages directly from R using the standard install.packages() and update.packages() functions.

# install.packages("drat")
drat::addRepo("schochastics")
install.packages("networkdata")

To save one line of code in the future, you can add drat::addRepo("schochastics") to your .Rprofile.

The development version is available via GitHub.

remotes::install_github("schochastics/networkdata")

The package needs about 22 MB of disk space, since it includes a lot of data.

Overview

So far, the package includes datasets from the following repositories:

All networks are in igraph format. If you are used to working with the network format (as in sna and ergm), you can use the intergraph package to easily switch between igraph and network.
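A minimal sketch of that round trip, assuming the intergraph package (and the network package it relies on) is installed. The ring graph g is only a stand-in for any igraph object, such as one of the package's datasets.

```r
library(igraph)
library(intergraph)

# Toy igraph object standing in for a dataset from the package
g <- make_ring(5)

# igraph -> network (the class used by sna and ergm)
net <- asNetwork(g)

# network -> igraph
g2 <- asIgraph(net)

class(net)
ecount(g2)
```

Vertex and edge attributes are carried across by the conversion functions, but it is worth checking the result for any attribute your analysis depends on.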

A list of all datasets can be obtained with

data(package = "networkdata")

Alternatively, use the function show_networks() to get a list of datasets with desired properties.

head(show_networks(type = "directed"), n = 10)
##         variable_name      network_name is_collection no_of_networks
## 38             ants_1            ants_1         FALSE              1
## 39             ants_2            ants_2         FALSE              1
## 42                atp               atp          TRUE             52
## 45             bkfrac            bkfrac         FALSE              1
## 47             bkoffc            bkoffc         FALSE              1
## 49             bktecc            bktecc         FALSE              1
## 50               bott              bott         FALSE              1
## 55           cent_lit          cent_lit         FALSE              1
## 106 dnc_temporalGraph dnc_temporalGraph         FALSE              1
## 109     eies_messages     eies_messages         FALSE              1
##         nodes     edges is_directed is_weighted is_bipartite has_vattr
## 38    16.0000   200.000        TRUE       FALSE        FALSE     FALSE
## 39    13.0000   361.000        TRUE       FALSE        FALSE     FALSE
## 42   499.3462  3164.404        TRUE        TRUE        FALSE      TRUE
## 45    58.0000  3306.000        TRUE        TRUE        FALSE     FALSE
## 47    40.0000  1558.000        TRUE        TRUE        FALSE     FALSE
## 49    34.0000  1122.000        TRUE        TRUE        FALSE     FALSE
## 50    11.0000   256.000        TRUE        TRUE        FALSE      TRUE
## 55   129.0000   613.000        TRUE       FALSE        FALSE     FALSE
## 106 1891.0000 37421.000        TRUE       FALSE        FALSE     FALSE
## 109   32.0000   460.000        TRUE        TRUE        FALSE      TRUE
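The printed columns can be used to narrow the list down further. The sketch below assumes that show_networks() returns an ordinary data frame with the column names shown above. To stay runnable without the package, it rebuilds four of the printed rows by hand in a hypothetical object called nets; with the package installed, nets <- show_networks(type = "directed") would take its place.

```r
# Four rows copied from the output above (subset of columns only)
nets <- data.frame(
  variable_name = c("ants_1", "atp", "bott", "eies_messages"),
  is_collection = c(FALSE, TRUE, FALSE, FALSE),
  nodes         = c(16, 499.3462, 11, 32),
  edges         = c(200, 3164.404, 256, 460),
  is_weighted   = c(FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Keep single, weighted networks with at most 50 nodes
small_weighted <- subset(nets, is_weighted & !is_collection & nodes <= 50)
small_weighted$variable_name
```

The names in variable_name are the dataset names, so a selected network can then be loaded with data(), for example data("bott").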

If you use any of the included datasets, please make sure to cite the appropriate original source, which can be found in the help file for each network.
