Coloring Under the Lines in ggplot
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I am tasked with explaining incredibly complex things to people who do not have a lot of time. Consequently, using visuals has been a life saver.
One day I was visiting a school to explain the Common European Framework of Reference for Languages (CEFR), which, in a nutshell, describes what language learners can do at different levels of proficiency AND the number of hours it takes them to progress to each level.
During the presentation I used the following table in a slide:
Image Courtesy of Keep Calm and Teach English
While that image is informative, it is, in my humble opinion, a little hard to comprehend in comparison to this one:
So how do you make the plot above? Glad you asked!
Step 1: Create the data frame
As the table above shows, there are seven levels that we want to represent (A0 to C2) and a range of hours from 0 to 1200.
library(tidyverse)
library(knitr) # To make the table look pretty on HTML
cefr_hours <- tibble(
  cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
  hours = c(0, 100, 200, 400, 600, 800, 1200)
)
kable(cefr_hours)
cefr | hours |
---|---|
A0 | 0 |
A1 | 100 |
A2 | 200 |
B1 | 400 |
B2 | 600 |
C1 | 800 |
C2 | 1200 |
Step 2: Expand the data frame
In order to color the sections between the levels, we need to create groups so that ggplot() divides the plot based on the correct levels. To do that, we’ll first double the data frame, so that every level appears twice: once as the end of one section and once as the start of the next.
cefr_hours <- tibble(
  cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
  hours = c(0, 100, 200, 400, 600, 800, 1200)
)
cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours)
kable(cefr_hours)
cefr | hours |
---|---|
A0 | 0 |
A1 | 100 |
A2 | 200 |
B1 | 400 |
B2 | 600 |
C1 | 800 |
C2 | 1200 |
A0 | 0 |
A1 | 100 |
A2 | 200 |
B1 | 400 |
B2 | 600 |
C1 | 800 |
C2 | 1200 |
Step 3: Create groups
Next, we rearrange the data frame by CEFR level (more on that below) and create a group for each pair of adjacent levels. To do so, we create a new column called group using dplyr::mutate().
cefr_hours <- tibble(
  cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
  hours = c(0, 100, 200, 400, 600, 800, 1200)
)
cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2))
kable(cefr_hours)
cefr | hours | group |
---|---|---|
A0 | 0 | 0 |
A0 | 0 | 1 |
A1 | 100 | 1 |
A1 | 100 | 2 |
A2 | 200 | 2 |
A2 | 200 | 3 |
B1 | 400 | 3 |
B1 | 400 | 4 |
B2 | 600 | 4 |
B2 | 600 | 5 |
C1 | 800 | 5 |
C1 | 800 | 6 |
C2 | 1200 | 6 |
C2 | 1200 | 7 |
If we don’t use arrange(), the two copies of each level are not adjacent, and we get the following mess.
cefr_hours <- tibble(
  cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
  hours = c(0, 100, 200, 400, 600, 800, 1200)
)
cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  mutate(group = ceiling((row_number() - 1) / 2))
kable(cefr_hours)
cefr | hours | group |
---|---|---|
A0 | 0 | 0 |
A1 | 100 | 1 |
A2 | 200 | 1 |
B1 | 400 | 2 |
B2 | 600 | 2 |
C1 | 800 | 3 |
C2 | 1200 | 3 |
A0 | 0 | 4 |
A1 | 100 | 4 |
A2 | 200 | 5 |
B1 | 400 | 5 |
B2 | 600 | 6 |
C1 | 800 | 6 |
C2 | 1200 | 7 |
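“What about ceiling()?”
Good question!
We use ceiling() to create the groups. Since we want “A1 to A2” to be one group, each group number has to be a whole number shared by two consecutive rows. The expression (row_number() - 1) / 2 yields 0, 0.5, 1, 1.5, 2, and so on, and ceiling() rounds each value up to the nearest whole number, giving 0, 1, 1, 2, 2, and so on. For more on how to use ceiling(), see its help page (?ceiling).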
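The arithmetic behind the grouping can be checked on its own. The following is a minimal sketch in base R, assuming 14 rows (the seven levels doubled); the names row_ids and groups are only illustrative and do not appear in the original code.

```r
# Row numbers of the doubled, arranged data frame (7 levels x 2 copies = 14 rows).
row_ids <- 1:14

# Halving (row - 1) gives 0, 0.5, 1, 1.5, ...; ceiling() rounds each value up.
groups <- ceiling((row_ids - 1) / 2)

print(groups)
```

Every group number from 1 to 6 lands on two consecutive rows, which is what lets one copy of a level close a section while its duplicate opens the next. Groups 0 and 7 each get a single row.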
Step 4: Remove Unnecessary Groups
Since we don’t want the first or last level to be a group unto itself, we use dplyr::filter() to remove the first and the last group: we keep only the rows where group is equal to neither min(group) nor max(group) (i.e., neither the first nor the last group).
cefr_hours <- tibble(
  cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
  hours = c(0, 100, 200, 400, 600, 800, 1200)
)
cefr_hours <- cefr_hours %>%
  bind_rows(cefr_hours) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2)) %>%
  filter(group != min(group), group != max(group))
kable(cefr_hours)
cefr | hours | group |
---|---|---|
A0 | 0 | 1 |
A1 | 100 | 1 |
A1 | 100 | 2 |
A2 | 200 | 2 |
A2 | 200 | 3 |
B1 | 400 | 3 |
B1 | 400 | 4 |
B2 | 600 | 4 |
B2 | 600 | 5 |
C1 | 800 | 5 |
C1 | 800 | 6 |
C2 | 1200 | 6 |
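Before plotting, it can be worth confirming that every remaining group has exactly two rows, since each colored section needs a start level and an end level. The following is a small sanity-check sketch, not part of the original workflow; it rebuilds the data frame with dplyr and forcats so that it runs on its own, and the names levels_tbl and group_sizes are illustrative.

```r
library(dplyr)
library(forcats)

# The seven CEFR levels and their guided-learning hours, as in Step 1.
levels_tbl <- tibble(
  cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
  hours = c(0, 100, 200, 400, 600, 800, 1200)
)

# Steps 2 to 4: double, arrange, group, and drop the first and last group.
cefr_hours <- levels_tbl %>%
  bind_rows(levels_tbl) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2)) %>%
  filter(group != min(group), group != max(group))

# Count the rows in each group; every group should have n equal to 2.
group_sizes <- count(cefr_hours, group)
print(group_sizes)
```

If any group shows a count other than 2, the usual cause is a missing arrange() step or a change in the number of levels.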
Step 5: Make the plot
From here, it is simply a matter of plugging the data into ggplot().
ggplot(data = cefr_hours,
       mapping = aes(x = cefr, y = hours, group = group, fill = group)) +
  geom_ribbon(aes(ymin = 0, ymax = hours))
But, of course, when we’re talking about ggplot(), that means we have no end of options at our disposal.
ggplot(data = cefr_hours,
       mapping = aes(x = cefr, y = hours, group = group, fill = group)) +
  geom_ribbon(aes(ymin = 0, ymax = hours)) +
  scale_color_brewer(palette = "Blues") +                 # Note: only fill is mapped, so this colour scale has no visible effect
  theme_minimal() +                                       # Set the theme
  labs(title = "Hours of Guided Learning Per Level",      # Give the plot a title
       subtitle = "Source: Cambridge English Assessment", # Give it a subtitle
       x = "",                                            # Remove the title on the x axis
       y = "") +                                          # Remove the title on the y axis
  theme(legend.position = "none",                         # Delete the legend
        axis.text.x = element_text(size = 20),            # Set the size to 20
        axis.text.y = element_text(size = 20),            # Set the size to 20
        plot.title = element_text(size = 25))             # Set the size to 25
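Because group is numeric, ggplot2 treats the fill as continuous and shades the sections with its default gradient. One possible variation, offered here as a sketch rather than as the method used for the plot above, is to map fill to factor(group) so that a ColorBrewer palette can be applied through scale_fill_brewer(). The name p_discrete is illustrative, and the resulting colors will differ from the original figure.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Rebuild the grouped data frame from Steps 1 to 4 so the sketch runs on its own.
levels_tbl <- tibble(
  cefr = as_factor(c("A0", "A1", "A2", "B1", "B2", "C1", "C2")),
  hours = c(0, 100, 200, 400, 600, 800, 1200)
)
cefr_hours <- levels_tbl %>%
  bind_rows(levels_tbl) %>%
  arrange(cefr) %>%
  mutate(group = ceiling((row_number() - 1) / 2)) %>%
  filter(group != min(group), group != max(group))

# Assumption: a discrete fill is acceptable, so group is converted to a factor.
p_discrete <- ggplot(cefr_hours,
                     aes(x = cefr, y = hours, group = group, fill = factor(group))) +
  geom_ribbon(aes(ymin = 0, ymax = hours)) +
  scale_fill_brewer(palette = "Blues") + # One Brewer color per section
  theme_minimal() +
  theme(legend.position = "none")

p_discrete
```

The six sections each receive their own shade from the "Blues" palette, running from light for A0 to A1 up to dark for C1 to C2.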
Finally, a special thanks to Jordo82 whose answer to my question enabled me to make this plot.
Happy Coding!