[This article was first published on R – TRinker's R Blog, and kindly contributed to R-bloggers].
I saw Simon Jackson’s recent blog post about ordering categories within facets. He proposed a way of dealing with the problem of ordering variables that are shared across facets. This problem becomes apparent in text analysis, where words are shared across facets but differ in frequency/magnitude ordering within each facet. Julia Silge and David Robinson note that this is a particularly vexing problem in a TODO comment in their tidy text book:
## TODO: make the ordering vary depending on each facet
## (not easy to fix)
Terminology
Definitions of the terms I use to describe the solution:

- category variable – the bar categories variable (terms in this case)
- count variable – the bar heights variable
- facet variable – the grouping used for faceting
Logic
I have dealt with the ordering-within-facets problem using this logic:

- Order the data rows by grouping on the facet variable and the category variable and arrange-ing on the count variable in a descending* fashion
- Ungroup
- Remake the category variable by appending the facet label(s) as a delimited suffix to the category variable (make sure this is a factor with the levels reversed) [this maintains the ordering by making shared categories unique]
- Plot as usual**
- Remove the suffix you added previously using scale_x_discrete
**I prefer the ggstance geom_barh to the ggplot2 geom_bar + coord_flip, as the former lets me set y as the terms variable and the latter doesn’t always play nicely with scales being set free.
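The steps above can be sketched end-to-end on a tiny toy data set (the facet/category/count columns below are hypothetical, not from the book; assumes dplyr and ggplot2 are installed):

```r
library(dplyr)
library(ggplot2)

# Toy data: the same categories appear in both facets,
# but with a different count ordering in each
d <- tibble(
    facet    = rep(c("a", "b"), each = 3),
    category = c("x", "y", "z", "x", "y", "z"),
    count    = c(3, 1, 2, 2, 5, 1)
)

d %>%
    group_by(facet, category) %>%                                 #1 group
    arrange(desc(count)) %>%                                      #2 order
    ungroup() %>%                                                 #3 ungroup
    mutate(category = factor(paste(category, facet, sep = "__"),  #4 unique labels
        levels = rev(paste(category, facet, sep = "__")))) %>%
    ggplot(aes(category, count)) +
        geom_col() +
        facet_wrap(~ facet, scales = "free") +
        coord_flip() +
        scale_x_discrete(labels = function(x) gsub("__.+$", "", x)) #5 strip suffix
```

Each facet now shows its own bars sorted by count, even though the category labels are shared.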
This approach adds an additional 5 lines of code (in the code below I number them as comment integers) and is, IMO, pretty easy to reason about. Here are the additional lines of code:
group_by(word1, word2) %>%
    arrange(desc(contribution)) %>%
    ungroup() %>%
    mutate(word2 = factor(paste(word2, word1, sep = "__"),
        levels = rev(paste(word2, word1, sep = "__")))) %>%

    # --ggplot here--

    scale_x_discrete(labels = function(x) gsub("__.+$", "", x))
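To see why the suffix step works, here is the trick in isolation on a couple of toy vectors (hypothetical values, not from the book): appending the facet label makes duplicated categories unique, so a single factor can carry a different order per facet, and the regex label function hides the suffix from the reader.

```r
word1 <- c("not", "not", "no", "no")        # facet variable
word2 <- c("like", "good", "good", "like")  # category variable, shared across facets

keys <- paste(word2, word1, sep = "__")
keys
#> "like__not" "good__not" "good__no" "like__no"   (all unique)

gsub("__.+$", "", keys)  # what scale_x_discrete displays
#> "like" "good" "good" "like"
```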
Example
Let’s go ahead and use Julia and David’s example to demonstrate the technique:

# Required libraries (p_load comes from the pacman package)
library(pacman)
p_load(tidyverse, tidytext, janeaustenr)

# From section 5.1: Tokenizing by n-gram
austen_bigrams <- austen_books() %>%
    unnest_tokens(bigram, text, token = "ngrams", n = 2)

# From section 5.1.1: Counting and filtering n-grams
bigrams_separated <- austen_bigrams %>%
    separate(bigram, c("word1", "word2"), sep = " ")

# From section 5.1.3: Using bigrams to provide context in sentiment analysis
AFINN <- get_sentiments("afinn")

negation_words <- c("not", "no", "never", "without")

negated_words <- bigrams_separated %>%
    filter(word1 %in% negation_words) %>%
    inner_join(AFINN, by = c(word2 = "word")) %>%
    count(word1, word2, score, sort = TRUE) %>%
    ungroup()

# Create plot
negated_words %>%
    mutate(contribution = n * score) %>%
    mutate(word2 = reorder(word2, contribution)) %>%
    group_by(word1) %>%
    top_n(10, abs(contribution)) %>%
    group_by(word1, word2) %>%                                  #1
    arrange(desc(contribution)) %>%                             #2
    ungroup() %>%                                               #3
    mutate(word2 = factor(paste(word2, word1, sep = "__"),      #4
        levels = rev(paste(word2, word1, sep = "__")))) %>%
    ggplot(aes(word2, contribution, fill = n * score > 0)) +
        geom_bar(stat = "identity", show.legend = FALSE) +
        facet_wrap(~ word1, scales = "free") +
        xlab("Words preceded by negation") +
        ylab("Sentiment score * # of occurrences") +
        theme_bw() +
        coord_flip() +
        scale_x_discrete(labels = function(x) gsub("__.+$", "", x)) #5