Patterns in the Ivy: The Small World of Metal

[This article was first published on Bad Hessian » R, and kindly contributed to R-bloggers.]

A few months ago I started listening to Tomahawk, a band described on Wikipedia as “an experimental alternative metal/alternative rock supergroup.” Beyond the quality of their music, I found myself intrigued by the musical background of their members. In addition to Tomahawk, their other bands include acclaimed groups such as Faith No More, Helmet, the Melvins, Fantômas, and the Jesus Lizard. Mike Patton alone has been affiliated with at least fifteen bands.

Generalizing beyond Tomahawk and Mike Patton, I began wondering about the social network properties of collaborations in music. First, such a network would conceptually be a two-mode network, also known as a bipartite graph. The two modes in this network are the bands and their musicians: bands form ties to other bands through shared musicians and musicians form ties to other musicians through common bands.
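As a minimal sketch of this two-mode structure (written in Python rather than R, and using a handful of hand-entered memberships instead of the Freebase data), memberships can be stored as band-musician pairs, with band-to-band ties derived from shared musicians. The names `memberships`, `build_modes`, and `band_ties` are hypothetical helpers for illustration only.

```python
from collections import defaultdict
from itertools import combinations

# Toy memberships (band, musician) for illustration; not the article's dataset.
memberships = [
    ("Tomahawk", "Mike Patton"),
    ("Tomahawk", "Duane Denison"),
    ("Tomahawk", "John Stanier"),
    ("Faith No More", "Mike Patton"),
    ("The Jesus Lizard", "Duane Denison"),
    ("Helmet", "John Stanier"),
]


def build_modes(edges):
    """Return the two modes as lookups: band -> musicians, musician -> bands."""
    members = defaultdict(set)
    bands = defaultdict(set)
    for band, musician in edges:
        members[band].add(musician)
        bands[musician].add(band)
    return members, bands


def band_ties(members):
    """Map each pair of bands with at least one shared musician to those musicians."""
    ties = {}
    for a, b in combinations(sorted(members), 2):
        shared = members[a] & members[b]
        if shared:
            ties[(a, b)] = shared
    return ties


members, bands = build_modes(memberships)
ties = band_ties(members)
for pair, shared in ties.items():
    print(pair, sorted(shared))
```

In this toy example Tomahawk is the only link among the other three bands, which is the hub-like role discussed next.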

Second, if other collaboration networks have exhibited small world properties, so might music collaborations. Watts (1999:508-9) defined a small world network as “a large-n, sparsely connected, decentralized…graph that exhibits a characteristic path length close to that of an equivalent random graph…, yet with a clustering coefficient much greater [than that of a random graph].” Path lengths here refer to the minimal number of steps it would take for one band to reach another through shared musicians. In the context of music collaborations, path length could indicate how fast reputations, domain knowledge, and musical styles spread. Short average path lengths would spread these artifacts faster than long path lengths. Though two bands usually do not share a direct tie through a common member (a path length of one), they often have indirect ties to each other through intermediary bands. Some of these intermediaries function as hubs, connecting a disproportionate number of actors. “Supergroups,” such as Tomahawk, very well could constitute one such hub.

The other feature of small world networks, clustering, refers to the tendency of two actors to have a relationship if they both have a relationship with a common third actor. Within the context of friendship networks, clustering can occur when one person introduces two previously unacquainted friends. Clustering in two-mode networks works somewhat differently than in one-mode networks. When it comes to the formation of new bands, often one founding member will recruit musicians he or she has worked with in a previous band. In music, these clusters have the potential to carry stylistic continuity from one band to another at a capacity much greater than what a single member can offer. We should note here that though Watts (1999) used a collaboration network in his study, movie actors’ co-appearances, this network is a one-mode projection of a conceptually two-mode relationship between actors and movies. Practices like Watts’s one-mode projection of two-mode data are endemic to social network analysis, and Opsahl (2013) argues that this procedure can lead to biased conclusions regarding clustering.

So, do collaborations between musicians and their groups constitute a small world network? Luckily, the data for this question do exist at Freebase.com. Here, I focused on just metal acts, pulling from Freebase’s API only bands whose genres include the words “metal,” “death,” “thrash,” or “grind.” Why metal, you might ask? Focusing just on metal bands keeps the analyses in the Goldilocks zone of having a dataset large enough to produce interesting and potentially generalizable results, yet small enough to be computationally reasonable. The dataset on hand includes 2,663 bands and 9,936 musicians. Also, unlike some genres, metal is quite diverse and international, featuring distinct regional subgenres including Bay Area thrash, Norwegian black metal, and British heavy metal, in addition to various takes on folk metal and many others. Lastly, I’m a sucker for cute datasets that include actors with names like Pig Destroyer, Electric Wizard, Sex Machineguns, and Finntroll.

Let’s first see which bands had the most members and which musicians were part of the most bands to identify the hubs in the network.

Band Name | Members (n) | Musician Name | Bands (n)
Iron Butterfly | 46 | Gene Hoglan | 16
Hear ‘n Aid | 36 | James Murphy | 12
Michael Schenker Group | 34 | Steve DiGiorgio | 12
Black Sabbath | 29 | Jan Axel Blomberg | 11
Hawkwind | 29 | Keri Kelli | 11

We see that the top five bands and musicians all seem to be older and generally successful, giving them greater opportunities to form more ties. Though short-lived, Hear ‘n Aid was metal’s follow-up to USA for Africa’s “We Are the World.” Black Sabbath is so well-known for its rotating lineup that fans have created a Sabbath Number game in the spirit of Bacon and Erdős numbers. Interestingly, of the musicians with the most bands, three of the five were in both Death and Testament. Lastly, it’s important to consider the number of these actors’ collaborations relative to the mean. For the entire network, on average, bands had 4.916 musicians and musicians played for 1.307 bands.
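Counts like these are simple degree tallies on the two modes. The following Python sketch uses made-up memberships (the band and musician names are placeholders, not entries from the dataset) and assumes each band-musician pair appears only once.

```python
from collections import Counter

# Hypothetical toy memberships (band, musician); placeholders, not the Freebase data.
memberships = [
    ("Band A", "Ann"), ("Band A", "Bo"), ("Band A", "Cy"),
    ("Band B", "Ann"), ("Band B", "Dee"),
    ("Band C", "Ann"),
]

# Degree of each band = number of members; degree of each musician = number of bands.
band_sizes = Counter(band for band, _ in memberships)
musician_bands = Counter(musician for _, musician in memberships)

top_bands = band_sizes.most_common(1)
top_musicians = musician_bands.most_common(1)

# Network-wide means, analogous to the averages reported above.
mean_band_size = sum(band_sizes.values()) / len(band_sizes)
mean_bands_per_musician = sum(musician_bands.values()) / len(musician_bands)

print(top_bands, top_musicians, mean_band_size, mean_bands_per_musician)
```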

To calculate the clustering and path length measurements, it’s essential that we use Opsahl’s tnet package for R. This package is designed especially to generate metrics for networks that are either two-mode, weighted, or longitudinal. It’s the only package that I know of that’s immediately capable of measuring clustering as a two-mode process. Further, it includes standardized path length measurements for weighted data. For ease of interpretation, in the following analyses I generally treat the band as the primary mode.

What does clustering mean for metal collaborations? For two-mode data, the global clustering coefficient equals the number of closed four-paths divided by the total number of four-paths. Here, a four-path is either a path from musician-to-band-to-musician-to-band-to-musician or a path from band-to-musician-to-band-to-musician-to-band. These four-paths are “closed” if the first band (or musician) is the same entity as the last band (or musician). Potential four-paths are those bands with two or more members, with at least one member in common, and additional members who have played in more than one band. So how much clustering is there within metal collaborations? Approximately 10.5% of all four-paths are closed, indicating some propensity for bands to recruit musicians from other bands with additional members in common.

The subject of path lengths in two-mode networks raises some conceptualization questions surrounding tie weights. Should we conceptualize a tie existing between two bands if they have any members in common (i.e., a present or absent binary), ties as weighted phenomena based upon the number of members in common (i.e., a summation of common musicians), or ties as weighted phenomena penalized by the number of bands the common musician(s) participate in (i.e., Newman’s algorithm)? The first approach provides a simplified interpretation, as two bands are connected if they share any members. This method would be good for a Sabbath Number game. The second approach provides a more nuanced view of collaborations, as some bands are better connected to each other than others. This method would be good if we consider tie strength as counting the number of interactions between two bands. The third values efficiency, as some musicians play in many bands and tying any two bands through such busy musicians should count less than musicians who only play for those two bands. If we were to consider shared musicians between bands as conduits for knowledge and influence, this measurement would be the one to use. Through the function distance_tm(), tnet offers a way to measure each. By default it records the shortest path lengths for each pair of nodes in the giant component (i.e., the largest set of actors in a network such that every actor can reach every other actor). The giant component in the metal network contains 1,447 bands (or 54.3% of those in the network) and 5,220 musicians (or 52.5% of those in the network).
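To make the three weighting schemes concrete, here is a rough Python sketch on toy data. It assumes that Newman's weighting gives each shared musician a contribution of 1/(N - 1), where N is the number of bands that musician plays in; whether tnet computes exactly this internally is an assumption on my part, and `tie_weights` is a hypothetical helper, not a tnet function.

```python
from collections import defaultdict

# Hypothetical toy memberships (band, musician); not the article's data.
memberships = [
    ("A", "x"), ("A", "y"),
    ("B", "x"), ("B", "y"),
    ("C", "y"),
]

members = defaultdict(set)   # band -> musicians
bands_of = defaultdict(set)  # musician -> bands
for band, musician in memberships:
    members[band].add(musician)
    bands_of[musician].add(band)


def tie_weights(a, b):
    """Return the binary, summed, and Newman-style weights of the tie a-b, or None."""
    shared = members[a] & members[b]
    if not shared:
        return None
    return {
        "binary": 1,                      # any shared member counts once
        "sum": len(shared),               # number of shared members
        # Assumed Newman-style discount: busy musicians contribute less.
        # A shared musician plays in at least two bands, so N - 1 >= 1.
        "newman": sum(1.0 / (len(bands_of[m]) - 1) for m in shared),
    }


print(tie_weights("A", "B"))
print(tie_weights("A", "C"))
```

Bands A and B share two musicians, but because musician y also plays in a third band, the Newman-style weight of their tie (1.5) is lower than the simple sum (2).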

Within the metal network, on average each band is only five to six steps away from every other band. If we just consider bands to be connected or not (“binary”), then on average each band is about 5.098 steps from every other band it can reach. If we consider the number of members bands have in common and weight the edges accordingly (“sum”), typically each band is 5.355 average weighted steps away from most others. Lastly, if we consider that musicians with ties to many bands might not be the most efficient conduits within a network (“Newman”), then any given band is usually 5.896 average weighted steps away from the other bands. These distances are quite short considering we’re discussing a sparse network of 1,447 bands, where the bands on average have about five or six members and the typical musician only plays for one or two bands.
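For the binary case, the mean shortest path length within the giant component can be sketched with a plain breadth-first search. This is an illustrative Python version on a toy band-to-band adjacency list, not the distance_tm() routine itself, and it ignores the weighted variants. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def giant_component(adj):
    """Largest set of nodes in which every node can reach every other node."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component = set(bfs_distances(adj, start))
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def mean_path_length(adj):
    """Mean shortest path over all pairs in the giant component (needs >= 2 nodes)."""
    giant = giant_component(adj)
    total, pairs = 0, 0
    for source in giant:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs, len(giant)


# Toy undirected network: a chain A-B-C-D plus a separate pair E-F.
adj = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"},
    "E": {"F"}, "F": {"E"},
}
mean_steps, giant_size = mean_path_length(adj)
print(mean_steps, giant_size)
```

The disconnected pair E-F is excluded, mirroring the restriction to the giant component described above.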

These weighted paths can also be used to calculate closeness centrality. Closeness centrality is a measurement of importance within a social network based upon the inverse of the number of steps needed for an actor to reach all other actors. So who are the closest actors that can access the shortest paths that characterize small world networks? The table below ranks the top five bands and musicians by their three weighted closeness centrality scores. Returning to the initial interest that prompted this post, I’ve also included Tomahawk’s and Mike Patton’s percentile by each measure.
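A bare-bones version of closeness, taken here as the inverse of the summed unweighted distances from one node to all others it can reach, might look like the following Python sketch. It is only a binary illustration on a toy network; the weighted scores in the table come from tnet, and the `closeness` helper is hypothetical.

```python
from collections import deque


def closeness(adj, source):
    """Inverse of the total number of steps from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    # An isolated node reaches no one; report zero closeness rather than divide by zero.
    return 1.0 / total if total else 0.0


# Toy star network: one hub tied to three otherwise unconnected nodes.
star = {
    "hub": {"a", "b", "c"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"},
}
scores = {node: closeness(star, node) for node in star}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], scores)
```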

Binary Weights, Bands | Summed Weights, Bands | Newman Weights, Bands | Binary Weights, Musicians | Summed Weights, Musicians | Newman Weights, Musicians
Testament | Testament | Hear ‘n Aid | John Tempesta | Gene Hoglan | Steve DiGiorgio
Megadeth | Death | UFO | James Murphy | James Murphy | Gene Hoglan
Death | Exodus | Dokken | Nicholas Barker | Nicholas Barker | Mike Terrana
Explorers Club | Forbidden | Foreigner | Gene Hoglan | Steve DiGiorgio | John West
Brujeria | Tenet | Quiet Riot | Steve DiGiorgio | John Tempesta | Myles Kennedy
(Tomahawk = 83.4%ile) | (Tomahawk = 86.5%ile) | (Tomahawk = 90.1%ile) | (Mike Patton = 97.2%ile) | (Mike Patton = 96.0%ile) | (Mike Patton = 94.7%ile)

As with the musicians in the previous table, here we see that the musicians in the bands Testament and Death were quite central in the collaboration network. In fact, of the five musicians with the highest closeness centrality, weighted either by a binary relationship or a summation of their collaborations, all have at some point played in the thrash band Testament. On average, most musicians are 3.529 (unweighted/binary) steps from John Tempesta, one of Testament’s drummers, and most bands are 3.159 (unweighted/binary) steps away from Testament. On the subject of thrash acts, six of the thirteen bands listed as having the highest closeness centrality scores (Testament, Death, Forbidden, Exodus, Megadeth, and Tenet) are thrash metal groups. Despite the great variety of metal subgenres, thrash groups established during the 1980s appear to be the most closely connected to other bands in the network. Further, the majority of the groups and musicians listed here are American with the remainder being British, despite thriving metal scenes in Scandinavia and elsewhere in continental Europe.

Most every metal fan is familiar with the legacy of Black Sabbath as well as the other bands associated with Sabbath musicians (including Judas Priest, Dio, the Cult, Whitesnake, Quiet Riot, Deep Purple, Faith No More, Jethro Tull, Great White, Rainbow, and Blue Öyster Cult), so how could Black Sabbath be further (“less close”) from other metal acts than Testament? To demonstrate, let’s compare plots of their ego networks that include their members (blue circles) and the other bands (red squares) their members have played in.

Glancing through the list of other bands in Testament’s neighborhood, we do see ties to many thrash acts, but also notice the tremendous diversity among other subgenres. Reviewing a few notable examples, John Tempesta has played drums for noise-rock inspired alternative metal acts like White Zombie and Helmet; Gene “the Atomic Clock” Hoglan has played in Dethklok, the subject of Cartoon Network’s animated series Metalocalypse, and he has also filled in drumming duties for the popular Swedish progressive metal act, Opeth (the band that inspired this post’s title); Dave Lombardo also had international connections to German musicians through drumming for Voodoocult and has additionally played for avant garde metal act Fantômas; though he only played for one other band in the data, as the guitarist for Halford, Mike Chlasciak has a close indirect tie to Black Sabbath because Halford was formed by former Judas Priest singer, Rob Halford, who filled in vocals for Black Sabbath. Even the Trans-Siberian Orchestra, a band widely loved for their synthesizer-fueled Christmas music, is tied to Testament through guitarist Alex Skolnick. Black Sabbath’s ties, in comparison, seem mostly linked to their members’ solo projects as well as 1970s and 1980s heavy metal and glam metal acts.

Generally, we see in this plot that while Black Sabbath had many more members than Testament (29 and 19, respectively), their members participated in significantly fewer bands. On average, the members of Black Sabbath played in 1.274 other bands in the dataset while Testament’s members averaged 4.105 additional bands. Further, 37.9% of all Black Sabbath members had no ties to other bands compared to 15.8% for Testament. Altogether, the bands tied to Testament through common members total 61 while those bands tied to Black Sabbath through common members number 36.

Returning to network-level properties, while we know both the clustering coefficient and the average path length of the network, how can we systematically tell if these measurements are respectively high and low, constituting a small world? To do so requires a comparison to chance expectations. For one-mode data, the clustering coefficient of a random network equals the average number of ties divided by the number of actors, and the mean shortest path length equals the natural log of the number of actors divided by the natural log of the average number of ties (Watts 1999). Unfortunately, these analytic solutions would not be appropriate for two-mode data.
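For reference, the one-mode expectations attributed to Watts (1999) above reduce to two lines of arithmetic. The Python sketch below uses a made-up network of 1,000 actors with 10 ties each; the function name and the input values are illustrative assumptions, and, as noted, these formulas are not appropriate for the two-mode metal data.

```python
import math


def random_graph_expectations(n_actors, mean_ties):
    """One-mode random-graph benchmarks: clustering ~ k/n, path length ~ ln(n)/ln(k)."""
    clustering = mean_ties / n_actors
    path_length = math.log(n_actors) / math.log(mean_ties)
    return clustering, path_length


# Hypothetical one-mode network: 1,000 actors averaging 10 ties each.
c_rand, l_rand = random_graph_expectations(1000, 10)
print(c_rand, l_rand)
```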

An alternative approach would be to run a Conditional Uniform Graph (CUG) test. The method for these tests compares the observed measurements to equivalent figures measured on many random, simulated networks that resemble the observed network in a few key ways. Such tests are essential given that many network properties arise due to size and density effects. For the two-mode network studied here, in addition to controlling for size and density effects, it’s also a good idea to control for degree distribution. Controlling for degree distribution would account for band size (and turnover) as well as musician popularity, leaving hubs within the network. The way to control for these effects involves link reshuffling, which randomizes the data by trading musicians between bands. For example, in one such iteration Rob Halford could be a member of Metallica instead of Judas Priest and Cliff Burton would be a member of Judas Priest instead of playing for Metallica. While this method does make some heavy-handed assumptions, including a shared universe and triadic independence, it’s perhaps the best null model for two-mode networks shy of exponential random graph modeling.
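The trading of musicians between bands can be sketched as repeated pairwise swaps on the membership list. The Python version below is a simplified illustration of a degree-preserving reshuffle; it is not tnet's implementation, and details such as the number of swaps and how rejected swaps are handled are my own assumptions. The two extra memberships in the toy data are included only so the example has more than one possible swap.

```python
import random
from collections import Counter


def reshuffle_links(edges, n_swaps, seed=None):
    """Swap musicians between random pairs of memberships.

    Every band keeps its size and every musician keeps the same number of
    bands; swaps that would duplicate an existing membership are skipped.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (band1, mus1), (band2, mus2) = edges[i], edges[j]
        new1, new2 = (band1, mus2), (band2, mus1)
        if new1 in present or new2 in present:
            continue  # would create a duplicate (or change nothing)
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


# Toy memberships for illustration only.
observed = [
    ("Judas Priest", "Rob Halford"),
    ("Halford", "Rob Halford"),
    ("Metallica", "Cliff Burton"),
    ("Metallica", "James Hetfield"),
]
shuffled = reshuffle_links(observed, n_swaps=20, seed=1)
print(shuffled)
```

Because each swap exchanges one musician for another, the degree distributions of both modes are identical before and after, which is the property the null model needs.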

I ran this test by measuring the global clustering coefficient and the mean shortest path length on a thousand simulated graphs using tnet’s rg_reshuffling_tm() function in conjunction with the base package function replicate(). Should the metal network be a small world after controlling for degree effects, then the observed clustering coefficient (.105) should be much greater than at least 95% of the expected values produced by the simulations. Likewise, the mean shortest path lengths between bands (5.098 [binary weights], 5.355 [summed weights], and 5.896 [Newman weights]) should roughly equal their corresponding, expected measurements in the simulations.
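The comparison step of such a test amounts to asking what share of the simulated values fall below the observed one. A small Python sketch follows; the simulated values are hypothetical placeholders rather than output from the actual thousand simulations, and `share_below` is an illustrative helper.

```python
def share_below(observed, simulated):
    """Fraction of simulated statistics that are strictly below the observed value."""
    return sum(value < observed for value in simulated) / len(simulated)


# Hypothetical stand-ins for clustering coefficients from reshuffled networks.
simulated_clustering = [0.003, 0.004, 0.004, 0.005, 0.006]
observed_clustering = 0.105  # observed figure reported in the text

share = share_below(observed_clustering, simulated_clustering)
print(share)  # the 95% threshold in the text corresponds to share >= 0.95
```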

In terms of the global clustering coefficient, all of the simulations found far less clustering than the observed 10.5% of four-paths. At most, the simulations found that 0.6% of all four-paths were closed. Typically, this figure ran at about .4%. Given that the observed clustering coefficient is approximately 26 times greater than the average from the random data, we can safely conclude that metal collaborations satisfy this property of small world networks.

The metal network also yielded short path lengths that, on average, roughly equaled those of the random networks, satisfying the short path length property of small world networks. Depending upon the weighting method, bands in the random networks were either 4.742 (binary), 4.733 (sum), or 4.622 (Newman) steps from each other. While none of the mean shortest path lengths between bands from the simulated data were longer than those of the observed, the ratios of the observed to the expected values equal 1.075, 1.131, and 1.276, well within the ballpark of the examples Watts (1999:516) reported.
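The observed-to-expected ratios can be checked directly from the figures reported in this post. The short Python snippet below only reproduces that arithmetic, using .4% as the typical simulated clustering value.

```python
# Mean shortest path lengths between bands, as reported in the text.
observed = {"binary": 5.098, "sum": 5.355, "newman": 5.896}
expected = {"binary": 4.742, "sum": 4.733, "newman": 4.622}

# Ratio of observed to simulated path length for each weighting scheme.
path_ratios = {key: round(observed[key] / expected[key], 3) for key in observed}

# Observed clustering (10.5%) relative to the typical simulated value (.4%).
clustering_ratio = 0.105 / 0.004

print(path_ratios, clustering_ratio)
```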

Though randomized, the simulated graphs display some structural tendencies not exhibited in the data. Chiefly, by grossly underestimating clustering among collaborators, the simulated data forges ties across bands who would never have common members due to geographical or temporal restrictions. In determining path lengths, here I’ve used tnet’s default parameters for distance_tm() and calculated the shortest paths for the giant component only. (Recall that the giant component refers to the largest subgraph in a network such that each actor within the subgraph can directly or indirectly reach every other actor in the subgraph.) This restriction is reasonable given that pairs of nodes who cannot reach each other have an infinite path length and that documenting each pair of nodes in a network of 2,663 bands uses an extensive amount of memory. We do see that the giant component in the observed dataset has 1,447 bands in it, yet in 95% of the simulations the giant component’s size ranged from 1,885 to 1,954. This difference in size may have led to a slight overestimate of the average shortest path length in the largest components of the randomized data.

Though I’ve only looked at metal, it stands to reason that similar small world properties would emerge among other music genres where bands constitute the organizational form of collaboration. We should appreciate that bands are a particular form of social organization with distinct musician roles and sizes that typically range from about three to six members. I’d expect this form of organization to be less common among genres like electronic dance music, where disc jockeys are the primary creative entity, and classical music produced by symphonies.

In addition to the issue of the network’s connectivity and the size of the giant component, the analyses here face a few limitations in terms of the data. First, the data come from Freebase, which relies upon crowdsourced databases like Wikipedia and MusicBrainz. As a result, the dataset faces issues of missing data that disproportionately affect less popular bands; local and short-lived bands do not exist in the data; and some of the bands included in the dataset are either obviously or arguably “not metal.” Second, “solo” projects exist and sometimes these create self-ties in the data where an artist is both a musician and a band at once. Should the solo project bear the exact name as the musician, then the data counts it as a self-tie; if the names differ, then the musician and the band constitute two separate units. For example, Alice Cooper is simultaneously both a musician and a band tied to him/itself, yet Dio is a band that Ronnie James Dio sang for. Also, for reasons unknown, the data does not include Ozzy Osbourne’s solo career.

I’ve made both the data and the code for the analyses publicly available.
