Tractatus Logico (Phylo)sophicus
Over the Christmas holidays, I read “Maths Meets Myths: Quantitative Approaches to Ancient Narratives,” from the Springer Understanding Complex Systems collection.
The authors apply “hard” science techniques to datasets from the humanities, mostly large corpora of texts, legends and myths.
One paper in particular uses bioinformatics and phylogenetics to study the spread of a popular folk tale: Little Red Riding Hood. The story that I knew from Perrault and Grimm has patterns that are also found in African and East Asian tales.
The Tractatus Logico-Philosophicus viewed as a phylogenetic tree
Inspired by this, I've had a look at Wittgenstein's Tractatus Logico-Philosophicus (available on Project Gutenberg), which is presented as hierarchically numbered statements and sub-statements.
We start by scraping the book into a dataframe with one row per statement:
library(rvest)

page <- read_html("http://www.tractatuslogico-philosophicus.com/")
root <- page %>% html_node("#root")

# Build the data frame one row at a time, one row per <li> statement
df <- data.frame()
for (item in root %>% html_nodes("li")) {
  label <- item %>% html_attr("data-name")
  content <- item %>% html_text(trim = TRUE)
  temp <- data.frame(label, content)
  df <- rbind(df, temp)
}
We then run our cluster analysis on the distances between the rows of df, hoping that the hierarchical numbering of statements will yield something interesting.
We adopt the single method, described like so:
The single linkage method (which is closely related to the minimal spanning tree) adopts a ‘friends of friends’ clustering strategy
clusters <- hclust(dist(df), method = "single")
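To make the ‘friends of friends’ idea concrete, here is a minimal sketch on a handful of made-up statement numbers (the `labels` vector is an assumption for illustration, not the scraped data). On points along a line, the minimal spanning tree simply links each point to its neighbour, so the single-linkage merge heights should match the sorted gaps between consecutive values:

```r
# Toy statement numbers, invented for illustration (not the scraped df)
labels <- c("1", "1.1", "1.11", "1.12", "2", "2.01", "2.1")
x <- as.numeric(labels)

# Single linkage: two clusters merge at the distance between
# their two closest members
hc <- hclust(dist(x), method = "single")

# On a line, the minimal spanning tree joins consecutive points,
# so its edge weights are the gaps between the sorted values
mst_edges <- sort(diff(sort(x)))

# Merge heights and spanning-tree edge weights, side by side
print(data.frame(merge_height = hc$height, mst_edge = mst_edges))
```

The two columns agree here because the toy data is one-dimensional; this is only meant to illustrate the link with the minimal spanning tree, not to describe what happens on the full book.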
Dendrograms galore
From these clusters, we can represent the book as dendrograms, which are used in phylogenetics to represent evolutionary splits and genetic relationships in a tree.
plot(clusters, labels = clusters$labels)
d <- as.dendrogram(clusters)
plot(d, horiz = TRUE, type = "triangle")
library(ape)
plot(as.phylo(clusters), type = "fan")
The diagrams above show how our clusters have correctly grouped together the hierarchical statements of the Tractatus.
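One rough way to check this kind of grouping beyond eyeballing the plots is to cut the tree into as many groups as there are top-level propositions and compare the result with the leading digit of each label. The sketch below does this on invented labels (an assumption for illustration); I have not verified that the same cut works on the full book.

```r
# Toy statement numbers under three top-level propositions (invented)
labels <- c("1", "1.1", "1.11", "2", "2.01", "2.1", "3", "3.001", "3.01")
x <- as.numeric(labels)

hc <- hclust(dist(x), method = "single")

# Cut the tree into as many groups as there are top-level propositions
top_level <- floor(x)
groups <- cutree(hc, k = length(unique(top_level)))

# Cross-tabulate: a single non-zero cell per row means the cut
# recovers the top-level numbering
print(table(top_level, groups))
```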
Building on Mike Bostock's Tree of Life, and helped by Jason Davies' work parsing the Newick text file format (a standard for tree representations) in JavaScript, I re-implemented the above with d3-jetpack and ES6: https://bl.ocks.org/basilesimon/66db4338c15099f6e8d62f236db2ef2d.
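For readers who have not met Newick before: it writes a tree as nested parentheses ending with a semicolon. Here is a minimal sketch of how an hclust result could be turned into such a string. The `hclust_to_newick` helper is hypothetical and written for this post (topology only, no branch lengths); in practice ape's write.tree() applied to as.phylo(clusters) is probably the more robust route.

```r
# Hypothetical helper: turn an hclust object into a Newick string
# (topology only, no branch lengths)
hclust_to_newick <- function(hc) {
  # In hc$merge, a negative entry is a single observation and a
  # positive entry points to the cluster formed at that earlier row
  node <- function(i) {
    if (i < 0) return(hc$labels[-i])
    paste0("(", node(hc$merge[i, 1]), ",", node(hc$merge[i, 2]), ")")
  }
  # The last row of hc$merge is the root of the tree
  paste0(node(nrow(hc$merge)), ";")
}

# Toy data, invented for illustration
x <- c(a = 0, b = 1, c = 10)
hc <- hclust(dist(x), method = "single")
newick <- hclust_to_newick(hc)
print(newick)
```

The string groups a and b together first, with c joining at the root, which is the shape a JavaScript Newick parser would then rebuild as nested objects.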
The resulting chart is at the top of this page.
I love how simple the result looks and how little we end up knowing about the book itself. The only thing I'll leave you with is the final, chapter-seven put-down of this book about language, facts and truths of the world:
What we cannot speak about we must pass over in silence.
Precisely what I didn't do in this blog about phylogenetics and a book I never finished.