[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers]. (You can report issues about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
library(quanteda) # install with install.packages("quanteda") if needed
data(data_corpus_inaugural)
speeches <- data_corpus_inaugural$documents
row.names(speeches) <- NULL
As you can see, this dataset has each Inaugural Address in a column called “texts,” with year and President’s name as additional variables. To analyze the words in the speeches and generate a wordcloud, we’ll want to unnest the words in the texts column.
library(tidytext)
library(tidyverse)
speeches_tidy <- speeches %>%
  unnest_tokens(word, texts) %>%
  anti_join(stop_words)
## Joining, by = "word"
For our first wordcloud, let’s see which words are most common across all speeches.
library(wordcloud) # install with install.packages("wordcloud") if needed
speeches_tidy %>%
  count(word, sort = TRUE) %>%
  with(wordcloud(word, n, max.words = 50))
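One thing to note: wordcloud places words in a random order by default, so the picture changes on every run. A small sketch of how you might make the cloud reproducible and colored by frequency, assuming RColorBrewer (installed as a dependency of wordcloud) is available:

```r
library(wordcloud)  # also attaches RColorBrewer

set.seed(1234)  # word placement is random; fixing the seed makes the plot reproducible
speeches_tidy %>%
  count(word, sort = TRUE) %>%
  with(wordcloud(word, n,
                 max.words = 50,
                 random.order = FALSE,              # plot the most frequent words first, in the center
                 colors = brewer.pal(8, "Dark2")))  # bin colors by word frequency
```

`random.order = FALSE` and `colors` are standard wordcloud() arguments; the seed value is arbitrary.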