Conference abstract bi-grams – FOSS4GUK
I helped run a conference last week. As part of this I produced a wordcloud from the conference abstracts. Although pretty, it could have been more informative about the conference content. This blog post shows you how to make a network of conference bi-grams instead.
A bi-gram is a pair of adjacent words. In that sentence, “is a”, “a pair” and “pair of” are all bi-grams. I based this blog post on Julia Silge and David Robinson’s excellent tidytext book. As before, each abstract is stored in a separate file, so I’ve read each of those in and then turned them into a tidy bi-gram table:
library(tidyverse)
library(tidytext)
library(tidygraph)
library(ggraph)
library(extrafont)
# ----------------------------
data("stop_words")

f = list.files("~/Cloud/Michael/FOSS4G/talks/abstracts_clean/")
abstracts = lapply(f, function(i){
  read_table(paste0("~/Cloud/Michael/FOSS4G/talks/abstracts_clean/", i), col_names = F) %>%
    gather(key, word) %>%
    select(-key) %>%
    add_column(author = str_remove(i, ".txt")) %>%
    unnest_tokens(bigram, word, token = "ngrams", n = 2)
})
abstracts = do.call("rbind.data.frame", abstracts)

bigrams = abstracts %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word[stop_words$word != "open"]) %>%
  filter(!word2 %in% stop_words$word[stop_words$word != "open"]) %>%
  filter(!str_detect(word1, "[0-9]")) %>%
  filter(!str_detect(word2, "[0-9]")) %>%
  filter(!str_detect(word1, "NA")) %>%
  filter(!str_detect(word2, "NA"))

bigram_counts = bigrams %>%
  count(word1, word2, sort = TRUE)
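If the bi-gram step feels a bit abstract, here is a minimal base R sketch of the idea on a single made-up sentence. It is only an illustration of pairing adjacent words, not what unnest_tokens() does internally (that function also strips punctuation for you), and the names sentence, words and toy_bigrams are invented for this example:

```r
# Split a sentence into lower-case words, then pair each word with its neighbour.
sentence = "A bigram is a pair of words"
words = strsplit(tolower(sentence), " ")[[1]]

# head(words, -1) drops the last word, tail(words, -1) drops the first,
# so pasting them together gives every adjacent pair.
toy_bigrams = paste(head(words, -1), tail(words, -1))
toy_bigrams
```

Seven words give six bi-grams. In the real pipeline each pair is then split into word1 and word2 so that stop words and numbers can be filtered out of either position.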
Then I write the graph out to a PNG. There’s some nifty stuff on the repel line which keeps labels on the plot, and I’ve even put the text into the conference font:
png("~/Cloud/Michael/FOSS4G/talks/abstract_bigram.png", width=1200, height=850, res=110)
bigram_counts %>%
  filter(n > 1) %>%
  as_tbl_graph() %>%
  ggraph(layout = "fr") +
  geom_edge_link(width = 1.1, colour = "#f49835") +
  geom_node_point(colour = "#497fbf") +
  geom_node_text(aes(label = name), colour = "grey10", vjust = 1, hjust = 1,
                 repel = T, force = 0.1, box.padding = 0) +
  labs(title = "FOSS4GUK 2019 - Edinburgh", subtitle = "Abstract bigrams") +
  theme(text = element_text(family = "Aileron SemiBold", colour = "grey10"))
dev.off()
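To see roughly how the count table becomes a network, here is a small sketch with a toy table. The toy_counts values are made up for illustration and are not from the abstracts. The assumption being shown is that as_tbl_graph() treats the first two columns of a data frame as the two ends of each edge and keeps the remaining columns (here n) as edge attributes:

```r
library(tibble)
library(tidygraph)

# Made-up bi-gram counts, in the same shape as bigram_counts
toy_counts = tibble(
  word1 = c("open", "open", "source"),
  word2 = c("source", "data", "software"),
  n     = c(5, 3, 2)
)

# Each distinct word becomes a node, each row becomes an edge
toy_graph = as_tbl_graph(toy_counts)
toy_graph
```

Printing toy_graph shows a node table with a name column, which is what aes(label = name) picks up in geom_node_text(), and an edge table that still carries n.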