
Your first D3 visualisation with {r2d3} and Scooby-Doo

[This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers].


  Get the code for this blog on GitHub


What is this tutorial and who is it for?

This tutorial is aimed mainly at R users who want to learn a bit of D3, and specifically those who are interested in how you can incorporate D3 into your existing workflows in RStudio. It will gloss over a lot of the fundamentals of D3 and related topics (JavaScript, CSS, and HTML) to fast-forward the process of creating your first D3.js visualisation, so it is far from a comprehensive guide. I’ve tried to include what I think is important, but if you have no experience with any of those topics you will almost certainly be left with some questions. Hopefully, the satisfaction of creating your first plot will inspire you to break and tweak the code I have provided to learn more.

What is D3?

D3.js, or just D3 as it’s more often referred to, is a JavaScript library used for creating interactive data visualisations optimised for the web. D3 stands for Data-Driven Documents. It is commonly used by those who enjoy making creative or otherwise unusual visualisations as it offers you a great deal of freedom as well as options for interactivity such as animated transitions and plot zooming.

Why should I care?

One benefit of D3 is its aforementioned creative control. Another is that, rather than creating raster images (e.g. PNG, JPEG) like many plotting libraries, it renders your figures as SVGs (scalable vector graphics), which stay crisp no matter how far you zoom in and are generally faster to load. (Note: when there are many data points, an SVG may be slower than a raster image; learn more about which image file type to use in our blog post on image formats.) If you are an R user, you should also care because the {r2d3} package lets you easily incorporate D3 visualisations into your R workflow and use them in, for example, R Markdown reports or R Shiny dashboards.

Is learning D3 worth the effort?

The short answer is: it depends. It can be quite tricky and time-consuming to learn D3 and all the associated skills (JavaScript, HTML, CSS) if you have no previous experience. On the other hand, learning D3 can be a fun way to take your first steps into web development technologies. Then again, you may be perfectly happy with the plotting libraries already available in R, such as {ggplot2}, which are highly flexible and can be made interactive. You can even save ggplot plots as SVG with ggsave() and svglite. Therefore, I don’t think learning D3 is a necessity for data visualisation, but it can be a useful addition to your skill set and a great first step into creative coding or web development.

What is {r2d3}?

If you are still with me, let’s get into {r2d3}. {r2d3} is an R package that lets you create D3 visualisations with R. One way it helps is by translating between R objects and D3-friendly data structures. This means that you can clean your data in R and then plot it using D3, without having to do any data wrangling in JavaScript. Another cool feature is that you can create D3-rendering chunks in an R Markdown file that preview inline, so you can easily incorporate a D3 visualisation in your reports. You can also add a D3 visualisation to a Shiny app using the renderD3() and d3Output() functions. If you need help with a Shiny application, we can help.
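To get a feel for what “D3-friendly” means here, the sketch below uses plain JavaScript (no D3 or {r2d3} needed). It assumes, as the D3 code later in this post does, that each row of the data frame reaches the script as one object. The helper columnsToRows() and the sample values are hypothetical: they only illustrate the shape of the data, not how {r2d3} performs the conversion internally.

```javascript
// Hypothetical helper: turn column-oriented data (like an R data frame,
// one array per column) into row-oriented data (one object per row).
function columnsToRows(columns) {
  const names = Object.keys(columns);
  const nRows = names.length > 0 ? columns[names[0]].length : 0;
  const rows = [];
  for (let i = 0; i < nRows; i++) {
    const row = {};
    for (const name of names) {
      row[name] = columns[name][i];
    }
    rows.push(row);
  }
  return rows;
}

// Made-up values, shaped like a few columns of monsters_caught
const columns = {
  character: ["Fred", "Fred", "Velma"],
  year: [1969, 1970, 1969],
  cumulative_caught: [1, 3, 2]
};

const rows = columnsToRows(columns);
console.log(rows[0]); // { character: 'Fred', year: 1969, cumulative_caught: 1 }
```

Row objects like these are what lets the D3 code later on write accessors such as d => d.year.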

The basics

OK, let’s get set up to create our first D3 visualisation in RStudio. We’re gonna be using this fun dataset on Scooby-Doo manually aggregated by user plummeye. We are gonna make a line chart that shows the cumulative total number of monsters caught by each member of Mystery Incorporated. Then we will add some unique D3 flair to it to make an unusually painful line chart worth it.

First, you’ll need to install the {r2d3} package as usual.

install.packages("r2d3")

This allows you to write D3 in RStudio in two main ways: in a standalone D3 script (a .js file), or in a d3 code chunk inside an R Markdown document.

For this blog post, we will write our code in a separate .js file but run it from an R Markdown chunk to preview it. It is also possible to preview your code directly from the script, but running it from R Markdown will hopefully show you how easily you can include D3 visualisations in an R Markdown report.

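So, we will start by creating two files: an R Markdown file, scoobydoo.Rmd, for cleaning the data and previewing the plot, and a D3 script, scoobydoo.js, for the visualisation code.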

To ensure that the files can find each other, I recommend working in an RStudio project (File > New Project) with both files saved in the project’s top-level folder, alongside the .Rproj file.




Data cleaning in R

The cleaning steps need a few extra packages, which you can install with this line of code:

install.packages(c("dplyr", "lubridate", "r2d3", 
                   "stringr", "tidyr", "tidytuesdayR"))

In your .Rmd file, you can copy the following steps to load the necessary packages, read in the data, and clean it in preparation for our D3 visualisation. We won’t go through these steps, as this blog post assumes you already know R and some basic Tidyverse! If you don’t, we offer courses to help you get started. If you prefer to read the data in from a CSV file, you can download it manually from here.

# in scoobydoo.Rmd
library("dplyr")
library("tidyr")
library("stringr")
library("lubridate")

# load data from tidytuesday
tuesdata = tidytuesdayR::tt_load(2021, week = 29)
scoobydoo = tuesdata$scoobydoo

# wrangling data into nice shape
monsters_caught = scoobydoo %>%
  select(date_aired, starts_with("caught")) %>%
  mutate(across(starts_with("caught"), ~ as.logical(.))) %>%
  pivot_longer(cols = caught_fred:caught_not,
               names_to = "character",
               values_to = "monsters_caught") %>%
  drop_na()  %>%
  filter(!(character %in% c("caught_not", "caught_other"))) %>%
  mutate(year = year(date_aired), .keep = "unused") %>%
  group_by(character, year) %>%
  summarise(caught = sum(monsters_caught),
            .groups = "drop_last") %>%
  mutate(
    cumulative_caught = cumsum(caught),
    character = str_remove(character, "caught_"),
    character = str_to_title(character),
    character = recode(character, "Daphnie" = "Daphne")
  )

I recommend printing monsters_caught at this stage to investigate the resulting columns, as it will help you understand the D3 code later on. You will see that there are 4 columns: character, which contains the names of our Mystery Inc. members (Daphne, Fred, Scooby, Shaggy, and Velma); year, which contains years between 1969 and 2021, taken from each episode’s air date; caught, which contains how many monsters each Mystery Inc. member caught in each year; and cumulative_caught, which is the running total of monsters caught by each member.

We are going to add a final column which will contain a unique colour for each character, so that our line chart will look a bit nicer. The colours are represented by hex codes obtained from official artwork of the characters.

# setting up colors for each character
character_hex = tribble(
  ~ character, ~ color,
  "Fred", "#76a2ca",
  "Velma", "#cd7e05",
  "Scooby", "#966a00",
  "Shaggy", "#b2bb1b",
  "Daphne", "#7c68ae"
)

monsters_caught = monsters_caught %>% 
  inner_join(character_hex, by = "character")

We will also add a new chunk which includes the following code:

library("r2d3")
r2d3(data = monsters_caught,
     script = "scoobydoo.js",
     d3_version = "5")

The r2d3() function passes the monsters_caught tibble we created in R to our scoobydoo.js script, where it is available as the variable data (alongside svg, a D3 selection of the SVG container to draw into). As our script is currently empty, nothing shows up when you run this line. After we add some code to scoobydoo.js, we can go back to scoobydoo.Rmd and re-run this line to view the output. We specify D3 version 5 so that our code keeps working even if newer D3 versions introduce breaking changes.

Your first lines of D3

Okay, let’s add some code to our D3 script. First, we define some constants that set up the size of our margins, plot width and height, and some font and line sizes for later on. Defining our constants at the top makes them easy to find and change if we want to change the sizes throughout our script.

Note: Comments in JavaScript are denoted by //, and variable names are often written in camelCase.

Another important concept introduced in the code below is attributes. An SVG element has a number of properties, and these can be set as attributes. For example, here we set the width attribute of the SVG to the width of our (upcoming) plot plus the left and right margins (the white space around the plot). Finally, we set up a group (a g element) that will represent the plot inside our SVG element, and use the “transform” attribute to move it so that it starts where the left and top margins end.

// in scoobydoo.js

// set up constants used throughout script
const margin = {top: 80, right: 100, bottom: 40, left: 60}
const plotWidth = 800 - margin.left - margin.right
const plotHeight = 400 - margin.top - margin.bottom

const lineWidth = 3
const mediumText = 18
const bigText = 28

// set width and height of svg element (plot + margin)
svg.attr("width", plotWidth + margin.left + margin.right)
   .attr("height", plotHeight + margin.top + margin.bottom)
   
// create plot group and move it
let plotGroup = svg.append("g")
                   .attr("transform",
                         "translate(" + margin.left + "," + margin.top + ")")

If we run our r2d3() line in R Markdown again, the output is still empty, but if we right-click on the space below our chunk and click “Inspect Element”, we can now see that there is indeed an SVG element (everything inside the SVG tags <svg> </svg>), with the width and height that we’ve provided in the SVG attributes. Getting comfortable with using either the RStudio Developer Tools to inspect the element, or inspecting it in a browser, will help you more easily understand D3 visualisations.

Adding axes

Next, let’s create some axes. At the bottom of scoobydoo.js, add the following lines, which define two functions, xAxis and yAxis. These are D3 scales: they map our data values to pixel positions within the plot.

// x-axis values to year range in data
// x-axis goes from 0 to width of plot
let xAxis = d3.scaleLinear()
    .domain(d3.extent(data, d => { return d.year; }))
    .range([ 0, plotWidth ]);
    
// y-axis values to cumulative caught range
// y-axis goes from height of plot to 0
let yAxis = d3.scaleLinear()
    .domain(d3.extent(data, d => { return d.cumulative_caught; }))
    .range([ plotHeight, 0]);

We set the domains (the input ranges) of the x and y scales to run from the minimum to the maximum of the respective columns. d3.extent() returns these as a [min, max] pair, using an anonymous accessor function that picks out the relevant value from each row. We then set the ranges (the output ranges) to span the full plot width and plot height. Notice that the y range is defined from top to bottom (from plotHeight to 0): SVG y coordinates increase downwards, so the smallest value has to map to the bottom of the plot.
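If it helps to see what these two D3 functions compute, here is a plain-JavaScript sketch with hypothetical, simplified stand-ins named extent() and scaleLinear(). They are not D3’s implementations (for example, d3.extent() also skips missing values, which this version does not), and the rows are made up; the plot size matches the constants in scoobydoo.js.

```javascript
// Simplified stand-in for d3.extent(): [min, max] of accessor(d) over rows
function extent(rows, accessor) {
  let min = Infinity;
  let max = -Infinity;
  for (const d of rows) {
    const v = accessor(d);
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return [min, max];
}

// Simplified stand-in for d3.scaleLinear().domain(...).range(...):
// maps [d0, d1] linearly onto [r0, r1] (no clamping, like D3's default)
function scaleLinear([d0, d1], [r0, r1]) {
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Made-up rows
const rows = [
  { year: 1969, cumulative_caught: 1 },
  { year: 1995, cumulative_caught: 20 },
  { year: 2021, cumulative_caught: 41 }
];
const plotWidth = 640;
const plotHeight = 280;

const x = scaleLinear(extent(rows, d => d.year), [0, plotWidth]);
// reversed range: smallest value -> bottom of the plot, largest -> top
const y = scaleLinear(extent(rows, d => d.cumulative_caught), [plotHeight, 0]);

console.log(x(1969), x(1995), x(2021)); // 0 320 640
console.log(y(1), y(41));               // 280 0
```

The reversed y range is the whole trick: the largest cumulative count lands at pixel 0, the top of plotGroup.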

Then, let’s add these axes to the plot. We move the x-axis to the bottom of the plot and draw it with d3.axisBottom(), a built-in D3 function that creates a horizontal axis with its ticks below it; the y-axis is drawn with d3.axisLeft(). Both functions require a scale, which we created with d3.scaleLinear() as xAxis and yAxis. We also set the stroke width and font size for both axes.

// add x-axis to plot
// move x axis to bottom of plot (height)
// format tick values as integers (e.g. 2001 rather than 2,001)
// set stroke width and font size
plotGroup.append("g")
   .attr("transform", "translate(0," + plotHeight + ")")
   .call(d3.axisBottom(xAxis).tickFormat(d3.format("d")))
   .attr("stroke-width", lineWidth)
   .attr("font-size", mediumText);

// add y-axis to plot
// set stroke width and font size
plotGroup.append("g")
    .call(d3.axisLeft(yAxis))
    .attr("stroke-width", lineWidth)
    .attr("font-size", mediumText);

Adding lines

Now, we need to reformat our data slightly to be able to create a line chart with multiple lines. Each line will represent a Mystery Inc. member, so we want to create a hierarchical structure in which the data for each character is nested under a separate key.

// turns data into nested structure for multiple line chart
// d3.nest() no longer available in D3 v6 and above hence version set to 5
let nestedData = d3.nest()
    .key(d => { return d.character;})
    .entries(data);

Here, d => {return d.character} defines an anonymous function that takes a single row of our data and returns its character value. d3.nest() calls it on every row and uses the result as the key, so with key() we get a separate group for each character. entries(data) then applies this nesting to our data and returns an array with one object per character, of the form {key: "Fred", values: [...]}, where values holds all the rows for that character. You can investigate the structure of the nested data by running nestedData in the JavaScript console when in “Inspect Element” mode.
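If the nested structure is hard to picture, the following plain-JavaScript sketch builds the same kind of [{key, values}] array, grouping rows in the order their keys are first seen. The function nestByKey() and the rows are hypothetical illustrations, not D3 code. (In D3 v6 and later, d3.group(), which returns a Map, fills this role instead.)

```javascript
// Hypothetical equivalent of d3.nest().key(keyFn).entries(rows):
// returns an array of { key, values } objects, one per distinct key.
function nestByKey(rows, keyFn) {
  const groups = new Map();
  for (const d of rows) {
    // d3.nest coerces keys to strings, so this sketch does the same
    const key = String(keyFn(d));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(d);
  }
  return Array.from(groups, ([key, values]) => ({ key, values }));
}

// Made-up rows in the same shape as monsters_caught
const rows = [
  { character: "Fred", year: 1969, cumulative_caught: 1, color: "#76a2ca" },
  { character: "Velma", year: 1969, cumulative_caught: 2, color: "#cd7e05" },
  { character: "Fred", year: 1970, cumulative_caught: 3, color: "#76a2ca" }
];

const nested = nestByKey(rows, d => d.character);
console.log(nested.map(g => g.key));    // [ 'Fred', 'Velma' ]
console.log(nested[0].values.length);   // 2
// every row in a group shares the same colour, which is why
// the line code can simply use d.values[0].color
console.log(nested[0].values[0].color); // #76a2ca
```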

Then, we create the path elements for our lines. Selecting .drawn_lines, binding nestedData with data() and calling enter().append("path") creates one path per character. Each path gets a class we define ourselves, drawn_lines (the class attribute can contain any name we like), so that we can select these paths again later on. We define another anonymous function to color each line using the hex codes in our color column. Finally, we define how the path uses our data: it will be a line (d3.line) whose x position is determined by our year column and whose y position is determined by our cumulative_caught column.

let path = plotGroup.selectAll(".drawn_lines")
    .data(nestedData)
    .enter()
    .append("path")
    // set up class so only this path element can be removed
    .attr("class", "drawn_lines")
    .attr("fill", "none")
    // color of lines from hex codes in data
    .attr("stroke", d => {return d.values[0].color}) 
    .attr("stroke-width", lineWidth)
     // draw line according to data
    .attr("d", d => {
      return d3.line()
        .x(d => { return xAxis(d.year);})
        .y(d => { return yAxis(d.cumulative_caught);})
        (d.values)
    })
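Under the hood, the d attribute that d3.line() generates is an SVG path string: move to the first point, then draw straight segments to each following point. The sketch below builds a string of that form by hand with a hypothetical linePath() helper and made-up points that are already in pixels; D3’s own output may differ in number formatting, and it supports curves that this sketch ignores.

```javascript
// Hypothetical helper: build an SVG path string "M x0,y0 L x1,y1 ..."
// from data rows, using accessor functions for the x and y positions
// (the role .x() and .y() play for d3.line()).
function linePath(rows, x, y) {
  return rows
    .map((d, i) => (i === 0 ? "M" : "L") + x(d) + "," + y(d))
    .join("");
}

// Made-up rows whose positions are already in pixels
const rows = [
  { px: 0, py: 280 },
  { px: 320, py: 140 },
  { px: 640, py: 0 }
];

const d = linePath(rows, r => r.px, r => r.py);
console.log(d); // M0,280L320,140L640,0
```

In the real script, the accessors call xAxis(d.year) and yAxis(d.cumulative_caught) instead of reading precomputed pixels.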

Adding text

Now we will add a plot title. We create a text element for the title, defining where it is anchored, the x and y position of the anchor, what the text says, and its color, font size and font weight. We append the text to the whole svg rather than to plotGroup, so that the title sits in the top margin, above the tallest point of the y-axis.

// create plot title
svg.append("text")
   .attr("text-anchor", "start")
   .attr("x", margin.left)
   .attr("y", margin.top/3)
   .text("Monsters caught by Mystery Inc. members")
   .attr("fill", "black")
   .attr("font-size", bigText)
   .attr("font-weight", "bold")

Now we’ll create legend labels for each line which will identify which character each line belongs to. Here, we create another group in our plot that is going to contain text from nestedData. We set some attributes in terms of how it will look, as well as give it a custom class name_labels. We also decide where these labels will go, giving them an x position slightly after the last data point on the x axis (2021) and a y position based on the location of the final value on the y axis (where the line ends). The text and color of the label will depend on the character and color columns in the dataset.

// create legend labels i.e. character names
plotGroup.append("g")
  .selectAll("text")
  .data(nestedData)
  .enter()
  .append("text")
  // add class so name_labels can be removed in drawLines()
  .attr("class", "name_labels")
  .style("font-weight", "bold")
  // CSS font sizes need a unit
  .style("font-size", mediumText + "px")
  // set location for labels (at the end)
  .attr("x", xAxis(2021) + mediumText/2)
  .attr("y", (d, i) => yAxis(d.values[d.values.length-1].cumulative_caught) + mediumText/3)
  .attr("fill", d => {return d.values[0].color})
  .text(d => {return d.values[0].character})

Adding transitions

First, we will add a transition for the labels we just created. By wrapping our plot-creating code in functions, we can redraw parts of the plot at specific times. We start by wrapping everything in the previous chunk inside a function called drawLabels(), and add a transition that fades the labels in, from fully transparent to fully opaque, over 500 milliseconds.

function drawLabels() {
  <insert code from previous chunk in here>
  .attr("opacity", 0)
  .transition()
  .duration(500)
  .attr("opacity", 1)
}
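A transition like this one interpolates the attribute between its start and end values over the given duration. D3’s default easing is cubic in-and-out (d3.easeCubic), so the fade starts and ends slowly. The plain-JavaScript sketch below computes the opacity at a given number of milliseconds into the 500 ms fade; easeCubicInOut() re-implements the standard cubic in-out formula, and opacityAt() is a hypothetical helper, not part of D3.

```javascript
// Standard cubic in-out easing: maps progress t in [0, 1] to eased progress
function easeCubicInOut(t) {
  t *= 2;
  return (t <= 1 ? t * t * t : (t -= 2) * t * t + 2) / 2;
}

// Hypothetical helper: opacity `elapsed` ms into a fade from `from` to `to`
function opacityAt(elapsed, duration = 500, from = 0, to = 1) {
  // clamp progress so times outside the transition stay at the end values
  const t = Math.min(Math.max(elapsed / duration, 0), 1);
  return from + (to - from) * easeCubicInOut(t);
}

// opacity at a few points during the 500 ms fade
for (const ms of [0, 125, 250, 375, 500]) {
  console.log(ms, opacityAt(ms).toFixed(3));
}
```

The labels are only half visible at the 250 ms midpoint and change fastest around it, which is what gives the fade its soft start and finish.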

We are also gonna create a transition for the lines that makes them appear as if they’re being drawn from start to end. Unfortunately, the easiest way to do this involves a trick with the stroke-dasharray attribute of each line. This attribute defines the dash pattern of a line as a dash length followed by a gap length. So far, the lines on our plot are completely solid. We will give each line a dash pattern in which both the dash and the gap are as long as the line itself, so that a single dash is enough to cover the whole line. We then grow the dash from zero length to the full length of the line, which makes the line appear to be drawn over time.

To do this, we need two functions. The first, tweenDash(), is called with this set to a path element. It measures the total length of the path, l, and returns a function of t, the progress of the transition from 0 to 1, which gives the stroke-dasharray value for that point in the animation: "0,l" at the start (no visible dash) and "l,l" at the end (a dash covering the whole line).

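The other function, lineTransition(), takes a selection of paths (i.e. lines) as an argument and starts a transition on them that lasts 2500 ms, as set by duration(2500). attrTween() uses the function returned by tweenDash() to update each path’s stroke-dasharray attribute on every frame. Note that when the transition ends (.on("end", ...)), our drawLabels() function is called, so that the labels only appear once the lines have been fully drawn. Because the "end" event fires once for every path, we only call drawLabels() for the first line; otherwise the labels would be drawn once per character, stacked on top of each other.

function tweenDash() {
  // total length of this path in pixels
  let l = this.getTotalLength(),
      i = d3.interpolateString("0," + l, l + "," + l);
  // i(t) goes from "0,l" (no visible dash) to "l,l" (dash covers the line)
  return function(t) { return i(t) };
}

function lineTransition(path) {
  path.transition()
      .duration(2500)
      .attrTween("stroke-dasharray", tweenDash)
      // "end" fires once per path, so only draw the labels once
      .on("end", (d, i) => {
        if (i === 0) drawLabels();
      });
}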
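To see what the function returned by tweenDash() hands back at each frame, here is a plain-JavaScript sketch of the same idea without D3. dashArrayAt() is a hypothetical stand-in for the interpolator that d3.interpolateString() builds; it only handles this specific two-number string, and t is the (eased) progress of the transition between 0 and 1.

```javascript
// Hypothetical stand-in for d3.interpolateString("0," + l, l + "," + l):
// returns the stroke-dasharray value at progress t (0 = start, 1 = end).
function dashArrayAt(l, t) {
  const dash = 0 + (l - 0) * t; // dash grows from 0 to the full length l
  const gap = l;                // gap stays at l, hiding the rest of the line
  return dash + "," + gap;
}

const totalLength = 300; // made-up path length in pixels
for (const t of [0, 0.25, 0.5, 1]) {
  console.log(t, dashArrayAt(totalLength, t));
}
// at t = 0 the dash has zero length, so nothing is visible;
// at t = 1 a single dash covers the whole 300px line
```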

Now, wrap your line-drawing code (the code chunk starting with let path =) in a new function called drawLines(). At the top, we add two new lines that remove any previously drawn lines and labels, and at the end of our path code we chain on a call to the lineTransition() function.

function drawLines() {
  // remove previously drawn lines when re-drawing
  plotGroup.selectAll(".drawn_lines").remove()
  // remove labels e.g. "Daphne" when re-drawing
  plotGroup.selectAll(".name_labels").remove()
  
  <code which starts with 'let path =' goes here>
    .call(lineTransition)
}

Finally, add a line to call our new drawLines() function at the bottom of the script.

drawLines()

Now we have a working, animated D3 visualisation! I’ve added a button to the original blog post to redraw the plot; locally, you should see the graph animate each time you re-run your r2d3() line.

Make it resizable

You might have already noticed that your local plot has a fixed size, and that if you resize your RStudio window, your plot gets cut off. Luckily, {r2d3} provides built-in width and height variables that change with the size of the plot container. This means that we can use these variables to make our plot resize flexibly along with the window.

If you want to keep similar proportions between the margins, plot width and height, and line and text sizes, you can replace your constant-defining code at the top with the following. You can play around with the multipliers to set the relationships you want between sizes.

const margin = {top: 0.1 * width, 
                right: 0.125 * width, 
                bottom: 0.05 * width, 
                left: 0.075 * width}
const plotWidth = width - margin.left - margin.right
const plotHeight = height - margin.top - margin.bottom

const lineWidth = 0.004 * plotWidth
const mediumText = 0.03 * plotWidth
const bigText = 0.04 * plotWidth
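To check the multipliers, the sketch below evaluates the same formulas in plain JavaScript for a hypothetical 800 by 400 pixel container (in the r2d3 script itself, width and height are provided for you, and responsiveSizes() is only an illustrative helper). At that size the margins and plot area come out the same as the fixed constants used earlier, while the line and text sizes land close to the original 3, 18 and 28.

```javascript
// Hypothetical helper: compute all size constants from the container size,
// using the same multipliers as the responsive version of scoobydoo.js.
function responsiveSizes(width, height) {
  const margin = {
    top: 0.1 * width,
    right: 0.125 * width,
    bottom: 0.05 * width,
    left: 0.075 * width
  };
  const plotWidth = width - margin.left - margin.right;
  const plotHeight = height - margin.top - margin.bottom;
  return {
    margin,
    plotWidth,
    plotHeight,
    lineWidth: 0.004 * plotWidth,
    mediumText: 0.03 * plotWidth,
    bigText: 0.04 * plotWidth
  };
}

const sizes = responsiveSizes(800, 400);
// round for display, since decimal multipliers are not exact in binary
console.log("plot:", sizes.plotWidth, "x", sizes.plotHeight);
console.log("line width:", sizes.lineWidth.toFixed(2));
console.log("text sizes:", sizes.mediumText.toFixed(1), sizes.bigText.toFixed(1));
```

Because every size is derived from width, doubling the container width roughly doubles the margins, line widths and fonts too, which keeps the plot’s proportions intact.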

Now, if you re-run your plot, it should automatically resize when you change the size of the window. Notice also that, because the plot is an SVG (scalable vector graphics) element, it stays sharp as we make it bigger or smaller.


  Get the final .Rmd and .js files


Summary

We’ve now created our first D3 visualisation from scratch using the {r2d3} package in RStudio! As you can see, creating a line chart with many lines requires a lot of code and so, if you’re creating a basic plot for non-aesthetic purposes, sticking to {ggplot2} may make more sense. However, if you want your plot to be an interactive website statement piece or a creative, user-driven exploration of data or ideas, D3 may better suit your needs. As this blogpost was aimed at beginners, the end result is not particularly dramatic, but if this has inspired you to learn more, I have provided some links to some amazing D3 creators and resources below.

Further resources

If you are looking for more comprehensive materials to learn D3, I highly recommend these two video tutorials by Curran Kelleher: Data Visualization with D3.js and Data Visualization with D3, JavaScript, React. Moreover, the D3.js Graph Gallery by Yan Holtz is a good reference website to see what kind of plots you can make and how. Check out Observable for plenty of creative community-made D3 visualisations. Finally, if you need to be convinced that you can make cool stuff in D3, I highly recommend checking out Shirley Wu, Nadieh Bremer, and Amelia Wattenberger.

