Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Back in April 2022, I participated in the #30DayChartChallenge and for the Storytelling prompt on day 29 in the Uncertainty category, I created the Goldilocks Decision Tree. I also gave a talk to R-Ladies Nairobi on the challenge and used the flowchart as a live-coding example. A summary of the talk ended up as a blog post.
The reactions were pretty positive, and the suggestions from Twitter that it should become its own R package have been floating around in my mind since then. So here it is! {ggflowchart} – the package for creating simple flowcharts using {ggplot2}.
What does {ggflowchart} do? < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">
Flowcharts can be a useful way to visualise complex processes. However, I couldn’t find an easy way to create a flowchart in R. There are a few packages for either drawing basic components of flowcharts (like {grid}), packages that are great for visualising complex network data where order doesn’t really matter (like {ggnetwork} and {igraph}), but none of them gave me the control over customisation I was used to with {ggplot2}.
{ggflowchart} tries to fill that gap. The aim of {ggflowchart} is to help R users make simple, good-looking flowcharts, with as little code as possible. It computes a layout, then uses existing {ggplot2} functions to stitch together rectangles, text, and arrows.
Installing {ggflowchart} < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">
As of 11 May 2023, {ggflowchart} is officially available on CRAN. You can install {ggflowchart} using install.packages("ggflowchart")
.
You can also install the development version from GitHub:
1 |
remotes::install_github("nrennie/ggflowchart") |
At the moment, {ggflowchart} has reasonably few dependencies (most of them common R packages you’re already likely to have installed if you’ve been working with {ggplot2}).
- dplyr,
- igraph,
- ggplot2,
- rlang,
- tibble,
- tidyr
A few examples < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">
To show you how {ggflowchart} actually works, let’s go through a couple of small examples. The examples explained here are also included in the vignettes for future reference.
1 |
library(ggflowchart) |
The simplest flowchart, which may well be all you need, takes only a tibble (or data frame) containing two columns. One column detailing where an edge in the flowchart begins, and the second column detailing where it ends. Ideally, you’ll name these columns "from"
and "to"
but if not, {ggflowchart} assumes the first two columns of your tibble relate to the start and end points of edges.
1 2 |
data <- tibble::tibble(from = c("A", "A", "A", "B", "C", "F"), to = c("B", "C", "D", "E", "F", "G")) |
To construct the flowchart, you simply pass in the data frame of edges to the ggflowchart()
function.
1 |
ggflowchart(data) |
The ggflowchart()
function does have a few additional arguments that you can use to change the appearance of the flowchart, shown below with their default values.
node_data = NULL
. An optional data frame specifying node attributes, including labels to appear in the boxes.fill = "white"
. The fill colour of the node boxes.colour = "black"
. The outline colour of the node boxes.text_colour = "black"
. The colour of the text in the node boxes.text_size = 3.88
. The size of the text in the node boxes.arrow_colour = "black"
. The colour of the arrows between the node boxes.arrow_size = 0.3
. The size of the arrows between the node boxes.family = "sans"
. The family of the text in the node boxes.x_nudge = 0.35
. The width of the node boxes.y_nudge = 0.25
. The height of the node boxes.horizontal = FALSE
. The direction of the flowchart.
Additional arguments color
, text_color
, and arrow_color
allow alternate spellings of colour.
For example, we may wish to switch to a red flowchart, with a sans serif , and slightly more square boxes.
1 2 3 4 5 6 |
ggflowchart(data, colour = "red", text_colour = "red", arrow_colour = "red", family = "serif", x_nudge = 0.25) |
The fill
colour and text_colour
can also be changed based on the name of a column in the node_data
data frame. For example:
1 2 3 4 5 6 |
node_data <- tibble::tibble( name = c("A", "B", "C", "D", "E", "F", "G"), type = c("Type 1", "Type 1", "Type 1", "Type 1", "Type 2", "Type 2", "Type 2") ) ggflowchart(data, node_data, fill = type) |
The column names to colour by can be either quoted or unquoted, e.g. ggflowchart(data, node_data, fill = "type")
will produce the same result. Column names take priority over colour names. So if you have a column in node_data
called "blue"
– it will use the values in that column rather than colouring all nodes blue.
A more complex example showing how to change the layout using scale_x_reverse()
, add titles, change the background colour, and edit the labels is included in another
vignette.
What’s coming next? < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">
{ggflowchart} is currently a work in progress, and there are already a few things on my list to add into the next release! Upcoming features will include:
- Changing
linetype
andarrow_colour
of arrows based on edge attributes. - Changing
colour
of the node outline based on node attributes. - Arrows between nodes on the same level.
- Ability to define custom layouts.
Some of these features are easier than others, and some will require a bit of thought about design choices. Changing the arrow colours and node outline colours is a little bit tricky because we may end up with three separate colour
scales here. Layouts are currently based on the tree output from {igraph} but users may wish to move some of the nodes around. Hopefully soon users could define their own x
and y
coordinates to position the boxes. Arrows between boxes on the same level, e.g. between E
and F
in the minimal example are currently a bit hit or miss…
1 2 3 |
data <- tibble::tibble(from = c("A", "A", "A", "B", "C", "F", "E"), to = c("B", "C", "D", "E", "F", "G", "F")) ggflowchart(data) |
These features are currently listed as issues on GitHub, and I will slowly work my way through them. If you have other suggestions for new features, or if you find a bug, please create an issue in the GitHub repository.
The most important part… < svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg"> < path d="M0 0h24v24H0z" fill="currentColor"> < path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z">
Of course, the most important part of any R package is the hex sticker! As a nod to the Goldilocks Decision Tree flowchart that inspired the package in the first place, the hex sticker for {ggflowchart} features three bears!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.