Sketchy waffle charts in R
Waffle charts are a common way to visualise counts or percentages of categorical data. There are already several excellent ways of creating waffle charts in R – including approaches using {ggplot2} or {waffle}. This blog post uses neither of those. Instead, it describes a somewhat back-to-basics approach of simply drawing lots of squares. This makes it a little bit easier to then create a version of a waffle chart that has a hand-drawn, sketchy effect.
Sketchy plots in R
The {ggrough} package converts plots made with {ggplot2} to rough (or sketchy) looking charts using the rough.js JavaScript library. Unfortunately, the package is in a dormant state and it doesn’t currently work with more recent versions of {ggplot2}. There are several solutions to this problem:
- Install an older version of {ggplot2}, and make a waffle chart with geom_tile() directly in {ggplot2} or using geom_waffle() from {waffle}. Then we could use {ggrough}.
- Try to fix the issues in {ggrough} and suggest the updates in a PR.
- Or we could create our own hacky solution by using several other R packages.
Specifically, a hacky solution using a combination of {roughsf} and {sf}. The {roughsf} package also wraps rough.js but takes sf (simple features) objects rather than {ggplot2} objects as inputs. It’s primarily used for creating sketchy looking maps. But {sf} doesn’t just make maps: it can make essentially any shape using points, lines, and polygons. And if you look at charts in an abstract manner, they’re also really just points, lines, and polygons.
Note: {ggrough} and {roughsf} are not the only R packages that can be used to create sketchy looking charts in R. The {roughnet} package also wraps rough.js and works specifically for visualising network data. The r-sketchy project also suggests a method for drawing sketchy looking lines.
Let’s start by getting some data to make a waffle chart of!
Data processing
Since we’re trying to make something sketchy and artistic looking, it might be nice to use some artistic data. Luckily, the #TidyTuesday Project has shared some data on the colours used in Bob Ross paintings. Let’s load it in from the #TidyTuesday GitHub repository:
bob_ross <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-02-21/bob_ross.csv")
Let’s also load the packages we’ll need for processing and plotting the data:
library(tidyverse)
library(sf)
library(roughsf)
The bob_ross data contains information on each episode such as the painting title, a list of colours used, and a link to the YouTube video of the episode. It also has some binary columns relating to colour names, and whether they were used in each painting. We’ll start by creating a lookup table of colour names and hex codes (since we’ll later colour the waffle chart using the hex codes).
If you’re mostly interested in the plotting with {sf} and {roughsf} aspects, feel free to skip ahead to the next section.
The colour names and hex colours are unfortunately not stored in the most user-friendly format. For example, the first entry of the color_hex column looks like this character string:
"['#4E1500', '#DB0000', '#FFEC00', '#102E3C', '#021E44', '#0A3410', '#FFFFFF', '#221B15']"
To create a colour lookup table, we need only the colors and color_hex columns. We can use separate_longer_delim() from {tidyr} to split these character strings of lists into multiple rows, based on the "," separator. We then need to tidy up the output by removing the square brackets, the extra quotation marks, the slightly odd \\r and \\n characters, as well as any extra white space at the start or end of the strings. We can use the str_remove_all() and str_trim() functions from {stringr} to do this. Many of the colours are used across multiple paintings but we only need to keep one row for each colour in the lookup table, i.e. the distinct() rows.
colour_lookup <- bob_ross |>
  select(colors, color_hex) |>
  separate_longer_delim(c(colors, color_hex), delim = ",") |>
  mutate(across(everything(), ~str_remove_all(., "\\[|\\]|'|\\\\r|\\\\n"))) |>
  mutate(across(everything(), ~str_trim(.))) |>
  distinct()
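If you’d like to see what the splitting and cleaning does to a single string, here’s a minimal sketch using only base R. It isn’t the code used for the lookup table: hex_string is a made-up, shortened entry, and strsplit(), gsub(), and trimws() stand in for the {tidyr} and {stringr} functions.

```r
# Made-up, shortened entry in the same shape as the `color_hex` column
hex_string <- "['#4E1500', '#DB0000', '#FFEC00']"

# Split on the comma separator (one piece per colour)
pieces <- strsplit(hex_string, ",", fixed = TRUE)[[1]]

# Remove square brackets and quotation marks, then trim white space
hex_codes <- trimws(gsub("\\[|\\]|'", "", pieces))

print(hex_codes)
```

Each element of hex_codes is then a clean hex code that can be used directly as a colour.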
We have 18 unique colour names. Now we want to calculate the percentage of colour uses that each colour corresponds to. We start by selecting only the binary columns from bob_ross that denote if a colour was used in an episode. Then we simply add up the number of 1’s in each column. This gives us the number of episodes that each colour was used in. To make processing the data easier, we convert it into long format, so that we now have two columns relating to the colour name and the number of episodes each was used in. We also replace the "_" in the colour names with a " " so that it matches the colour names in the lookup table we just created e.g. "Cadmium_Yellow" becomes "Cadmium Yellow".
count_data <- bob_ross |>
  select(c(Black_Gesso:Alizarin_Crimson)) |>
  summarise(across(Black_Gesso:Alizarin_Crimson, ~ sum(.x, na.rm = TRUE))) |>
  pivot_longer(
    cols = everything(),
    names_to = "colors",
    values_to = "n"
  ) |>
  mutate(colors = str_replace_all(colors, "_", " "))
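To make the counting step a little more concrete, here’s a rough base R equivalent on a tiny made-up data set. The toy data frame and its values are invented for illustration, and colSums() and gsub() are stand-ins for the summarise(), pivot_longer(), and str_replace_all() steps.

```r
# Made-up binary columns for three episodes (1 = colour was used)
toy <- data.frame(
  Black_Gesso = c(1, 0, 1),
  Cadmium_Yellow = c(1, 1, 1)
)

# Adding up each column gives the number of episodes per colour
counts <- colSums(toy, na.rm = TRUE)

# Long format: one row per colour, with underscores replaced by spaces
toy_count_data <- data.frame(
  colors = gsub("_", " ", names(counts)),
  n = unname(counts)
)

print(toy_count_data)
```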
We join this count data to the lookup table using a left_join() based on the colors column, and calculate the percentage of the total for each colour. To make matters a little bit more complicated, the hex colours don’t relate to unique colour names. For example, "Titanium White" and "Liquid Clear" are both represented by "#FFFFFF". Since we’ll use the hex colours for plotting, these two different colour names will be represented the same way in the plot. So we add together the percentages for these colour names i.e. group by the hex colour.
colour_count_data <- count_data |>
  left_join(colour_lookup, by = "colors") |>
  mutate(perc = round(100 * n / sum(n))) |>
  select(color_hex, perc) |>
  group_by(color_hex) |>
  summarise(perc = sum(perc)) |>
  ungroup()
One issue with this approach is that we’ve rounded the percentages. For a waffle chart, we have 100 squares and each square will take a single colour representing 1% - there are no half squares. We need a whole number in the percentage for each category (hex colour). Unfortunately, calculating exact percentages and then rounding them doesn’t always add up to 100. Depending on the rounding, you might end up with 99 or 101 squares, for example. Let’s check if this adds up to 100 by chance:
sum(colour_count_data$perc) == 100

which returns:

TRUE
This is extremely lucky!
If you’re creating your own waffle chart and these numbers don’t add up to 100, you’ll need to make a choice about how you’ll round the values.
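One possible choice is the largest remainder method: round everything down, then hand out the leftover squares to the categories with the biggest fractional parts. The sketch below is just one option and wasn’t needed for the Bob Ross data; round_to_100() is a hypothetical helper name and the counts are made up.

```r
# Hypothetical helper: whole-number percentages that always sum to 100
round_to_100 <- function(n) {
  exact <- 100 * n / sum(n)    # exact percentages
  base <- floor(exact)         # round everything down first
  leftover <- 100 - sum(base)  # squares still to hand out
  # Give one extra square to the categories with the largest remainders
  extra <- order(exact - base, decreasing = TRUE)[seq_len(leftover)]
  base[extra] <- base[extra] + 1
  base
}

counts <- c(1, 1, 1)  # made-up counts for three categories

sum(round(100 * counts / sum(counts)))  # plain rounding gives 99 squares
sum(round_to_100(counts))               # largest remainder gives 100
```

When remainders are tied, as in this example, order() favours the earlier category, so you may want to sort your data first if that matters for your chart.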
Our data currently has one row per colour but for plotting later, we’ll need one row per square in the waffle chart. We can use the uncount() function from {tidyr} to create replicates of each row, according to the number in the perc column.
When we’re using the {roughsf} package, the aesthetics will also need to be specified explicitly as columns. This means we also need to rename the color_hex column as fill.
plot_data <- colour_count_data |>
  uncount(perc) |>
  rename(fill = color_hex)
Let’s see what plot_data looks like by inspecting the first few rows:

head(plot_data)

which returns:

# A tibble: 6 × 1
  fill
  <chr>
1 #000000
2 #000000
3 #000000
4 #000000
5 #000000
6 #000000
Making {sf} objects
Since we’re going to be plotting using {roughsf}, we need to make some objects to plot using {sf}. A waffle chart that displays percentages typically shows 100 squares arranged in a 10x10 grid. So we need to make 100 squares using {sf}.
Let’s just start with 1 square at a time. Since we’re going to be doing something 100 times, it’s best to make a function. The function below, make_square(), takes three arguments: the x and y coordinates of the bottom left corner of the square, and the width of the square. It creates a matrix where the first column contains the x-coordinates of the corners of the square, and the second column the y-coordinates. Although a square only has four corners, here we have five pairs of coordinates with the last row equal to the first. This ensures the polygon is closed (and can have a fill colour). The st_polygon() function from {sf} then converts this matrix to an {sf} object.
make_square <- function(x0, y0, width = 1) {
  sf::st_polygon(
    list(
      cbind(
        c(x0, x0 + width, x0 + width, x0, x0),
        c(y0, y0, y0 + width, y0 + width, y0)
      )
    )
  )
}
Let’s check it works by creating a square that starts at x0 = 2 and y0 = 3 with a side of length 1. We can plot our square to make sure it creates the shape we expect:
sq <- make_square(x0 = 2, y0 = 3, width = 1)
plot(sq)
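If you’d rather check the numbers than the picture, the sketch below rebuilds the same corner matrix in base R and confirms that the ring is closed and that the area is what we’d expect for a square of width 1. The square_corners() helper is hypothetical (it isn’t part of {sf} or of the code above), and the area uses the standard shoelace formula rather than anything from {sf}.

```r
# Hypothetical helper: the corner matrix that make_square() passes to {sf}
square_corners <- function(x0, y0, width = 1) {
  cbind(
    x = c(x0, x0 + width, x0 + width, x0, x0),
    y = c(y0, y0, y0 + width, y0 + width, y0)
  )
}

corners <- square_corners(x0 = 2, y0 = 3, width = 1)
print(corners)

# The ring is closed if the first and last rows are the same point
is_closed <- all(corners[1, ] == corners[nrow(corners), ])

# Shoelace formula: for a square this should equal width^2
i <- seq_len(nrow(corners) - 1)
area <- abs(sum(
  corners[i, "x"] * corners[i + 1, "y"] -
    corners[i + 1, "x"] * corners[i, "y"]
)) / 2

is_closed
area
```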
It works! Now we need to run the function 100 times to create 100 squares. We’ll use map2() from {purrr} to run it 100 times. For each square we want to vary the x and y coordinates, but keep the width constant. The x coordinates will run from 1 to 10 (repeated 10 times), as will the y coordinates (repeated 10 times each). We’ll set the width to 0.8 to leave a little bit of space between each square.
poly_list <- purrr::map2(
  .x = rep(1:10, times = 10),
  .y = rep(1:10, each = 10),
  .f = ~ make_square(.x, .y, width = 0.8)
)
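The difference between times and each is what turns two short sequences into a full grid. As a quick illustration (the grid data frame here is only for inspection and isn’t used elsewhere), we can lay out the coordinates that are passed to make_square():

```r
# Bottom left corners of the 100 squares, in the order they are created
grid <- data.frame(
  square = 1:100,
  x = rep(1:10, times = 10),  # 1, 2, ..., 10, 1, 2, ..., 10, ...
  y = rep(1:10, each = 10)    # 1, 1, ..., 1, 2, 2, ..., 2, ...
)

# The first ten squares fill the bottom row from left to right
head(grid, 12)

# Every (x, y) pair appears exactly once
nrow(unique(grid[, c("x", "y")])) == 100
```

So the squares are created row by row, starting in the bottom left corner, which means the order of the rows in the data decides where each colour ends up in the chart.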
Now we can join this list of squares (polygons) to our plot_data from earlier, and convert it to an sf object, and do it all at the same time using st_sf() from {sf}! We need to specify that the geometry comes from the list of squares we’ve created. We can again check it works by plotting our new sf object:
plot_sf <- sf::st_sf(plot_data, geometry = poly_list)
plot(plot_sf)
Although the colours aren’t mapped quite right, this has correctly created our 10x10 grid of squares.
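The colours look off because the default plot() method treats fill as just another attribute and picks its own palette for it. If you want a quick non-sketchy preview with the real colours, one option is to plot only the geometry and pass the hex codes to col. The sketch below uses a made-up 2x2 grid (toy_polys and toy_sf are illustrative names) rather than the full Bob Ross data.

```r
library(sf)

# Same helper as above, repeated so this example runs on its own
make_square <- function(x0, y0, width = 1) {
  st_polygon(list(cbind(
    c(x0, x0 + width, x0 + width, x0, x0),
    c(y0, y0, y0 + width, y0 + width, y0)
  )))
}

# A made-up 2x2 grid of squares with a made-up `fill` column
toy_polys <- Map(make_square, c(1, 2, 1, 2), c(1, 1, 2, 2), width = 0.8)
toy_sf <- st_sf(
  fill = c("#000000", "#DB0000", "#FFEC00", "#FFFFFF"),
  geometry = st_sfc(toy_polys)
)

# Plot the geometry only, colouring each square by its own hex code
plot(st_geometry(toy_sf), col = toy_sf$fill)
```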
Plotting with {roughsf}
The {roughsf} package is reasonably straightforward to use as it has only one main function, roughsf(), which takes an sf object as input and outputs a plot. We can also set the width and height for the plot (in pixels):
roughsf::roughsf(
  plot_sf,
  width = 800,
  height = 800
)
If you’re using RStudio, you’ll notice that this appears in the Viewer tab rather than the Plots tab since it’s an HTML widget.
There are different types of patterns that can be used to colour in the squares including "hachure", "solid", "zigzag", "cross-hatch", "dots", "dashed", or "zigzag-line". This needs to be specified as a column in the data called fillstyle. The fillweight column will also control how thick the lines are in the fill pattern (it should have values between 0 and 1, with 1 resulting in thicker lines). After adding these columns, we can then re-run the roughsf() function above:
plot_sf$fillstyle <- "cross-hatch"
plot_sf$fillweight <- 0.8
This is starting to look quite like what I was imagining, although it still doesn’t look quite sketchy enough for my liking. The roughness and bowing arguments in the roughsf() function control the roughness and bowing (rounded-ness) of the lines in the plot. You might want to play around with these values to find something you like (larger numbers generally lead to rougher, more sketchy looking plots):
roughsf::roughsf(
  plot_sf,
  roughness = 3,
  bowing = 2,
  width = 800,
  height = 800
)
We can also add a title and caption to the plot. The font size and family can be specified using the title_font and caption_font arguments. Just like {ggplot2} plots, {roughsf} plots can be saved as objects in R. Let’s save this plot as rsf:
rsf <- roughsf::roughsf(plot_sf,
  title = "The Colours of Bob Ross Paintings",
  title_font = "48px Pristina",
  caption = "Graphic: Nicola Rennie",
  caption_font = "30px Pristina",
  roughness = 3,
  bowing = 2,
  width = 800,
  height = 800
)
rsf
You might also want to save the plot as a static image (e.g. as a PNG file). Luckily, the {roughsf} package comes with the save_roughsf() function to make this easy. Simply save the {roughsf} plot as a variable (e.g. rsf), and pass it in as the first argument to save_roughsf(). The second argument is the file name (including the file extension) you’d like to save it as.
To make the white squares in the top row stand out a little bit more, we can also make the background a light beige by setting background = "#f5f5dc" (rather than the default white background).
roughsf::save_roughsf(
  rsf = rsf,
  file = "bob_ross_waffle.png",
  background = "#f5f5dc"
)
I hope this blog post has inspired you to create your own sketchy looking waffle charts, even if this is perhaps one of the most overly complicated ways of making a waffle chart in R. As long as you can break your chart down into lines, squares, and circles, you can create pretty much anything. Don’t get boxed in by using only what already exists!