Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
After some more digging, and a suggestion by @theMexIndian, I decided to take a deeper look at the unvotes database that I wrote about some weeks ago.
This time, Amit suggested I do some hierarchical clustering of the votes. So here goes a very dirty first attempt…
Data and setup
Nothing too impressive here… (for a discussion of the package, see the original post).
    library(dplyr)
    library(magrittr)
    library(unvotes)
    library(reshape2)

    # join votes with roll-call details and issues
    votes <- un_votes %>%
      left_join(., un_roll_calls) %>%
      left_join(., un_roll_call_issues)

    # number of unique roll call votes
    length(unique(votes$rcid))
    # [1] 5275
There are more than 5k unique roll calls, so if we were to open up the dimensionality by roll call, the data set is going to be huge. I'll go ahead and do it anyway, to test a hypothesis towards the end…
‘Widen’ data…
    wide <- votes %>%
      select(rcid, country, vote) %>%
      dcast(formula = rcid + country ~ vote) %>%
      dcast(formula = country ~ rcid + yes + no + abstain)

    str(wide)
    # 'data.frame': 200 obs. of 14352 variables:
We now have a very high-dimensional data set: each variable is the vote in a roll call (for example, abstain_120, yes_120 and no_120 would be counts of abstain, yes and no votes in roll call 120). This data set is basically ones and zeros. Now to do some cleaning and get the distance matrix…
    wide[is.na(wide)] <- 0
    # keep only the vote columns; country goes into the row names
    d_wide <- as.matrix(wide[, names(wide) != "country"])
    row.names(d_wide) <- wide$country  # to name rows
    d_wide <- dist(d_wide)             # distance matrix
    hc_wide <- hclust(d_wide)          # hierarchical cluster
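To see what dist() and hclust() are doing here, this is a minimal sketch on a toy matrix. The four countries (A to D) and their votes are made up for illustration and are not taken from the unvotes data; the sketch only relies on base R defaults (Euclidean distance, complete linkage).

```r
# Toy 0/1 vote matrix: rows are hypothetical countries, columns are votes
toy <- matrix(
  c(1, 0, 1, 0,
    1, 0, 1, 1,
    0, 1, 0, 1,
    0, 1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), paste0("yes_", 1:4))
)

d_toy <- dist(toy)               # Euclidean distance between rows (default)
hc_toy <- hclust(d_toy)          # complete linkage (default)
groups <- cutree(hc_toy, k = 2)  # cut the tree into two clusters

print(groups)
```

A and B differ in a single vote, as do C and D, so each pair ends up in its own cluster. The same three calls are what the post applies to the full country-by-vote matrix.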
Let’s graph this hierarchical clustering using the ggdendro package…
    library(ggplot2)  # for theme(), element_text() and labs()
    library(ggdendro)
    library(eem)      # blog colors

    ggdendrogram(hc_wide, rotate = TRUE) +
      theme_eem() +
      theme(axis.text.y = element_text(size = 6)) +
      labs(x = "country", y = "",
           title = "Hierarchical clusters of votes \n in U.N.")
I’m going to export these clusters and upload them to my GitHub for anyone to download.
    hc_c <- cutree(hc_wide, k = 8)  # cut the tree into 8 clusters
    hc_c <- as.data.frame(hc_c, row.names = names(hc_c))
    hc_c$c <- row.names(hc_c)       # country names as a column
    cc <- hc_c %>% arrange(-hc_c)
    write.csv(as.data.frame(cc), file = "country_clusters.csv")
By issues
Now, because the previous data set was very high-dimensional, I’m going to condense the analysis to just votes on particular issues. The database has seven core issues, so I’m going to try to group by issue instead of roll call. This might let us see if there are different voting blocs from the earlier set (maybe countries vote the same, except when important issues come up).
    # Widen, by issue...
    wide_byissue <- votes %>%
      select(issue, country, vote) %>%
      dcast(formula = country ~ vote + issue)

    wide_byissue[is.na(wide_byissue)] <- 0
    # keep only the count columns; country goes into the row names
    d_wide_issue <- as.matrix(wide_byissue[, names(wide_byissue) != "country"])
    row.names(d_wide_issue) <- wide_byissue$country
    d_wide_issue <- dist(d_wide_issue)
    hc_wide_issue <- hclust(d_wide_issue)

    ggdendrogram(hc_wide_issue, rotate = TRUE) +
      theme_eem() +
      theme(axis.text.y = element_text(size = 6)) +
      labs(x = "country", y = "",
           title = "Hierarchical clusters of votes \n in U.N. (issues)")
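To make the by-issue widening more concrete, here is a small sketch of the kind of table it produces: one row per country and one column per vote/issue combination, holding counts. It uses base R's table() instead of dcast() so it runs without extra packages, and the two countries and their votes are invented for illustration.

```r
# Toy long-format votes: hypothetical countries, a couple of issues
toy_votes <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  issue   = c("Human rights", "Human rights", "Colonialism",
              "Human rights", "Human rights", "Colonialism"),
  vote    = c("yes", "yes", "no", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Count votes per country for each vote_issue combination
counts <- table(toy_votes$country,
                paste(toy_votes$vote, toy_votes$issue, sep = "_"))
counts_mat <- unclass(counts)  # plain numeric matrix with dimnames

print(counts_mat)
d_toy_issue <- dist(counts_mat)  # distances between countries on these counts
```

Because the cells are counts rather than ones and zeros, countries that cast many votes on an issue can end up far from countries that cast few, even if they vote the same way. That is something to keep in mind when reading the dendrogram.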
I’ll export this too…
    hc_c2 <- cutree(hc_wide_issue, k = 8)
    hc_c2 <- as.data.frame(hc_c2, row.names = names(hc_c2))
    hc_c2$c <- row.names(hc_c2)
    cc2 <- hc_c2 %>% arrange(-hc_c2)
    write.csv(as.data.frame(cc2), file = "country_clusters_issue.csv")
To test the earlier hypothesis, I’m going to find Mexico’s neighborhood in each clustering and see whether many of the same countries show up in both sets…
    # find cluster where Mexico lives ...
    neighborhood_mx <- hc_c %>% filter(hc_c == 3)
    neighborhood_mx_issue <- hc_c2 %>% filter(hc_c2 == 1)

    sum(neighborhood_mx_issue$c %in% neighborhood_mx$c) / length(neighborhood_mx_issue$c)
    # [1] 0.8

    # export mexico's neighborhood
    write.csv(neighborhood_mx_issue, file = "neighborhood_mx_issue.csv")
So 80% of the countries in Mexico’s by-issue neighborhood are also “close” to Mexico when the clustering is done by roll call. This is a rough first attempt (there are probably many slight errors) but there are some interesting things to be found.
In the by-issue groups, the United States and Israel are outliers that sit in a group of their own (the Palestinian conflict is probably the culprit here; as I found earlier, they agree on 77% of the votes).
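The cluster numbers for Mexico (3 and 1) are hard-coded above, and they would change if the tree were cut differently. A possible sketch of the same overlap calculation that looks up Mexico's cluster by name is shown here. The cluster memberships are made up, and neighbors() is a hypothetical helper, not something from the original analysis.

```r
# Toy cutree()-style output: named vectors of cluster labels (invented values)
cl_rollcall <- c(Mexico = 3, Chile = 3, Peru = 3, Norway = 6, Sweden = 6)
cl_issue    <- c(Mexico = 1, Chile = 1, Peru = 2, Norway = 1, Sweden = 6)

# Hypothetical helper: all countries sharing a cluster with `country`
neighbors <- function(clusters, country) {
  names(clusters)[clusters == clusters[[country]]]
}

nb_rollcall <- neighbors(cl_rollcall, "Mexico")
nb_issue    <- neighbors(cl_issue, "Mexico")

# Share of the by-issue neighborhood that is also in the roll-call neighborhood
overlap <- mean(nb_issue %in% nb_rollcall)
print(overlap)
```

Note that Mexico itself is counted in both neighborhoods, as it is in the calculation above, so the share is never zero.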
Then there are countries that seem to be very close culturally, and they show it in the votes…
    # advanced foreign policy
    hc_c2 %>% filter(hc_c2 == "6")
    # [1] "Austria" "Denmark" "Finland" "Greece" "Iceland"
    # [6] "Ireland" "Japan" "New Zealand" "Norway" "Spain"
    # [11] "Sweden"
Finally, some like-minded countries, such as Chile, Colombia, Panama, Paraguay and Peru, are in Mexico’s neighborhood (although it’s one of the largest groups).
Tweet me up if you have any questions with the data!