Site icon R-bloggers

Creating a Summoning Salt-style speedrun plot

[This article was first published on Higher Order Functions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A videogame speedrun is a challenge to beat the game as quickly as possible. It’s time attack racing but for a videogame. There are, in my mind, two ways to make a run’s time go faster: Playing better and more smoothly (optimizations, having better luck) and playing less of the game (better routing, new glitches/skips). The history of a speedrun category then is often an exciting mix of evolutionary improvements as players level up their skills and revolutionary jumps as players find new ways to cut through the game.

Summoning Salt is a Youtube creator who creates documentaries that trace out the world record progression in a speedrun. The videos are immensely enjoyable, as Salt dishes out the history bit by bit, record by record, sometimes in a suspenseful fashion.

As a data visualization person, I’ve noticed that Summoning Salt recently started to use a new prop in the videos: A step graph of the world record times. The graph is developed throughout a video as players (represented by individual colors) lower the times with new records (points) until you get a full reveal of a timeline like the following:

Screenshot of a timeline from a Summoning Salt video.

Let’s recreate this figure in R with ggplot2.

Warp pipe: Obtaining the data

The game in question is New Super Mario Bros Wii, and the record keeper is the site speedrun.com. There is not just one speedrun category for this game, so in particular, we want the “Any%” record history (i.e., “any percent”: you don’t have play every level, and you can skip parts of the game.)

We need to get the leaderboard history data from speedrun.com. There is an official REST API for the site’s data, but it’s not straightforward how to query it to obtain the data needed for a world record progression. (Apparently, one could request the leaderboard on different dates and work backwards through time.) But that’s okay, we are not going to use the API. Instead, the statistics page for the game has a plot that is tantalizingly close to the one we want to create.

A timeline figure from speedrun.com.

This plot is interactive, and our browser is downloading the data and plotting it for us. If we snoop around the page, we can find the JSON data behind the plot. In Firefox, when I right-click on the plot and hit “Inspect”, I see the HTML code that contains the plot. Just below the plot’s div is a chunk of Javascript.

A screenshot of the Firefox inspector showing the speedrun data in a Javascript script tag.

The first line of it is all the speedrun data that is being plotted. We save that JSON into its own file.

Ground pound: Filtering and cleaning the data

Let’s read the data into R. JSON is short for “Javascript Object Notation”, and it’s basically the equivalent of a list() in R. Hence, jsonlite provides a large, deeply nested list for us.

library(tidyverse)

# a helper function to download the data from github
# in case you want to play along
path_blog_data <- function(x) {
  file.path(
    "https://raw.githubusercontent.com",
    "tjmahr/tjmahr.github.io/master/_R/data",
    x
  )
}

json_runs <- path_blog_data("2022-05-23-nsmbw-runs.json") |> 
  jsonlite::read_json()

The plot on the statistics page has a dropdown menu for different kinds of records to display, so this JSON object has a sublist for each dropdown menu choice. What we want is the first sublist (full game runs) then its first sublist (with a label of "Any% - Physical") then its "data".

# Dropdown menu choices
str(json_runs, max.level = 1)
#> List of 10
#>  $ 0   :List of 7
#>  $ 6789:List of 18
#>  $ 6805:List of 18
#>  $ 6815:List of 18
#>  $ 6826:List of 19
#>  $ 6841:List of 18
#>  $ 6846:List of 20
#>  $ 6859:List of 19
#>  $ 6868:List of 22
#>  $ 6882:List of 18

# Full game run histories
str(json_runs[[1]], max.level = 2)
#> List of 7
#>  $ :List of 7
#>   ..$ label                    : chr "Any% - Physical"
#>   ..$ data                     :List of 30
#>   ..$ borderColor              : chr "#EE4444"
#>   ..$ pointBorderColor         : chr "#EE4444"
#>   ..$ pointHoverBackgroundColor: chr "#EE4444"
#>   ..$ hidden                   : logi FALSE
#>   ..$ steppedLine              : logi TRUE
#>  $ :List of 7
#>   ..$ label                    : chr "Cannonless - Physical"
#>   ..$ data                     :List of 25
#>   ..$ borderColor              : chr "#EF8241"
#>   ..$ pointBorderColor         : chr "#EF8241"
#>   ..$ pointHoverBackgroundColor: chr "#EF8241"
#>   ..$ hidden                   : logi FALSE
#>   ..$ steppedLine              : logi TRUE
#>  $ :List of 7
#>   ..$ label                    : chr "100% - Physical"
#>   ..$ data                     :List of 17
#>   ..$ borderColor              : chr "#F0C03E"
#>   ..$ pointBorderColor         : chr "#F0C03E"
#>   ..$ pointHoverBackgroundColor: chr "#F0C03E"
#>   ..$ hidden                   : logi FALSE
#>   ..$ steppedLine              : logi TRUE
#>  $ :List of 7
#>   ..$ label                    : chr "Any% No W5 - Physical"
#>   ..$ data                     :List of 22
#>   ..$ borderColor              : chr "#8AC951"
#>   ..$ pointBorderColor         : chr "#8AC951"
#>   ..$ pointHoverBackgroundColor: chr "#8AC951"
#>   ..$ hidden                   : logi TRUE
#>   ..$ steppedLine              : logi TRUE
#>  $ :List of 7
#>   ..$ label                    : chr "Low% - Physical"
#>   ..$ data                     :List of 18
#>   ..$ borderColor              : chr "#09B876"
#>   ..$ pointBorderColor         : chr "#09B876"
#>   ..$ pointHoverBackgroundColor: chr "#09B876"
#>   ..$ hidden                   : logi TRUE
#>   ..$ steppedLine              : logi TRUE
#>  $ :List of 7
#>   ..$ label                    : chr "Any% Multiplayer - Physical"
#>   ..$ data                     :List of 11
#>   ..$ borderColor              : chr "#44BBEE"
#>   ..$ pointBorderColor         : chr "#44BBEE"
#>   ..$ pointHoverBackgroundColor: chr "#44BBEE"
#>   ..$ hidden                   : logi TRUE
#>   ..$ steppedLine              : logi TRUE
#>  $ :List of 7
#>   ..$ label                    : chr "All Regular Exits - Physical"
#>   ..$ data                     :List of 7
#>   ..$ borderColor              : chr "#6666EE"
#>   ..$ pointBorderColor         : chr "#6666EE"
#>   ..$ pointHoverBackgroundColor: chr "#6666EE"
#>   ..$ hidden                   : logi TRUE
#>   ..$ steppedLine              : logi TRUE

# Just want the data field from the first one
json_any_percent <- json_runs[[1]][[1]][["data"]]

Here are the first two points’ worth of date. We have a not-so-obviously encoded date (x), the run length in seconds (y) and the player. We are going to convert each of these lists into a dataframe and bind them together.

json_any_percent |> head(2) |> str()
#> List of 2
#>  $ :List of 4
#>   ..$ x      : int 1306670400
#>   ..$ y      : int 1616
#>   ..$ players:List of 1
#>   .. ..$ : chr "RaikerZ"
#>   ..$ link   : chr "/nsmbw/run/2216987"
#>  $ :List of 4
#>   ..$ x      : int 1325246400
#>   ..$ y      : int 1549
#>   ..$ players:List of 1
#>   .. ..$ : chr "RaikerZ"
#>   ..$ link   : chr "/nsmbw/run/2216995"

data <- json_any_percent |> 
  lapply(
    # turn one list into a dataframe
    function(x) { 
      tibble(
        date = x$x, 
        run_time_s = x$y, 
        player = x$players[[1]]
      )
    }
  ) |> 
  bind_rows()

data
#> # A tibble: 30 × 3
#>          date run_time_s player       
#>         <int>      <dbl> <chr>        
#>  1 1306670400       1616 RaikerZ      
#>  2 1325246400       1549 RaikerZ      
#>  3 1332763200       1531 RaikerZ      
#>  4 1349870400       1527 RaikerZ      
#>  5 1457179200       1526 GreenUprooter
#>  6 1461585600       1523 Auchgard     
#>  7 1461672000       1522 Auchgard     
#>  8 1461758400       1519 Auchgard     
#>  9 1470744000       1514 Auchgard     
#> 10 1471521600       1512 Auchgard     
#> # … with 20 more rows

Lastly, we need to do something about those dates. When you see a date-time represented by a single large number, it’s probably a POSIX date representing the date-time as the number of seconds since some origin date-time (see also Unix Time). Using the default Unix origin time seems to give the correct date conversion:

data <- data |> 
  mutate(
    date_posix = as.POSIXct(date, tz = "UTC", origin = "1970-01-01")
  ) 

data
#> # A tibble: 30 × 4
#>          date run_time_s player        date_posix         
#>         <int>      <dbl> <chr>         <dttm>             
#>  1 1306670400       1616 RaikerZ       2011-05-29 12:00:00
#>  2 1325246400       1549 RaikerZ       2011-12-30 12:00:00
#>  3 1332763200       1531 RaikerZ       2012-03-26 12:00:00
#>  4 1349870400       1527 RaikerZ       2012-10-10 12:00:00
#>  5 1457179200       1526 GreenUprooter 2016-03-05 12:00:00
#>  6 1461585600       1523 Auchgard      2016-04-25 12:00:00
#>  7 1461672000       1522 Auchgard      2016-04-26 12:00:00
#>  8 1461758400       1519 Auchgard      2016-04-27 12:00:00
#>  9 1470744000       1514 Auchgard      2016-08-09 12:00:00
#> 10 1471521600       1512 Auchgard      2016-08-18 12:00:00
#> # … with 20 more rows

Triple jump: Plotting

First, let’s get the data on the panel. I could spend an endless amount of time tweaking or customizing a plot’s theme, so I do the styling last. Otherwise, styling would fill up all of the time I’ve set aside to work on the plot.

We want to draw a point for each particular record-setting event, and we want to draw a line that connects all of the points. geom_step() draws a line plot but it can move straight up/down or straight left/right—no diagonal lines—so it’s what we want. We also want to the color of these geometries to change with the record holder (player).

ggplot(data) + 
  aes(x = date_posix, y = run_time_s, color = player) + 
  geom_step() +
  geom_point()

Oops! It assumed that we wanted to connected the dots separate for each color. We set the group aesthetic to a constant value so there is only one line drawn.

ggplot(data) + 
  aes(x = date_posix, y = run_time_s, color = player) + 
  geom_step(aes(group = 1)) +
  geom_point()

Making the Summoning Salt version is just a matter of theming at this point. We use theme_void() to completely wipe out the current theme, and we hide the color legend.

ggplot(data) + 
  aes(x = date_posix, y = run_time_s, color = player) + 
  geom_step(aes(group = 1)) +
  geom_point() + 
  theme_void() +
  guides(color = "none")

Next, we are going to use the showtext package to obtain an 8-bit :

library(showtext)
#> Loading required package: syss
#> Loading required package: showtextdb
_add_google("Press Start 2P")
showtext_auto(TRUE)

The void provides nothing, so we have to specify the main colors, the axis lines, and the plotting margin. We also crank up the chroma values for intense colors on the black background.

ggplot(data) + 
  aes(x = date_posix, y = run_time_s, color = player) + 
  geom_step(aes(group = 1)) +
  geom_point() +
  guides(color = "none") + 
  scale_color_discrete(c = 255) +
  labs(title = "World Record Timeline") +
  theme_void(base_size = 20, base_family = "Press Start 2P") +
  theme(
    plot.title = element_text(color = "white", hjust = .5), 
    plot.background = element_rect(fill = "black"),
    axis.line = element_line(
      color = "white", 
      size = 1, 
      # more 8-bit looking lines
      lineend = "square"
    ), 
    plot.margin = margin(12, 12, 12, 12, "pt")
  ) 

To keep overlapping points from looking like blobs, we can use a filled point. We see the outline of the points to black and the fill to the player color. Then we have to make sure that guide for the fill doesn’t appear and that fill and color have the same color scale.

ggplot(data) + 
  aes(x = date_posix, y = run_time_s, color = player) + 
  geom_step(aes(group = 1)) +
  geom_point() +
  geom_point(
    aes(fill = player),
    shape = 21,
    color = "black", 
    size = 2
  ) +
  # no legend for fill
  guides(color = "none", fill = "none") + 
  # fill and color get same scale
  scale_color_discrete(c = 255, aesthetics = c("color", "fill")) +
  labs(title = "World Record Timeline") +
  theme_void(base_size = 20, base_family = "Press Start 2P") +
  theme(
    plot.title = element_text(color = "white", hjust = .5), 
    plot.background = element_rect(fill = "black"),
    axis.line = element_line(
      color = "white", 
      size = 1, 
      lineend = "square"
    ), 
    plot.margin = margin(12, 12, 12, 12, "pt")
  ) 

Finally, let’s make another version of this figure. How might we make a more accessible presentation of this information (of who held a record and when), assuming that we only have a static image? A legend with players/colors is a nonstarter. One idea would be to label the point with an annotation whenever there is a new record holder.

showtext_auto(FALSE)

data <- data |>
  # record whenever the title holder changes an "era"
  mutate(
    player2 = ifelse(
      player == "[gb/eng]FadeVanity", 
      "FadeVanity", 
      player
    ),
    change = player != lag(player) | is.na(lag(player)),
    era = cumsum(change)
  ) 

# I am going to hardcode some position adjustments for the labels
offsets <- c(1, 2, 1, 5, -1, 5, 4, 3, 2, 1, -4, -3, -2, -1)

data_lab <- data |> 
  group_by(era) |> 
  filter(run_time_s == max(run_time_s)) |> 
  ungroup() |> 
  mutate(offset = offsets)

ggplot(data) + 
  aes(x = date_posix, y = run_time_s, color = player) + 
  geom_step(aes(group = 1), size = 1) +
  geom_text(
    aes(
      label = player2,
      y = run_time_s + 30 * offset 
    ),
    hjust = 0,
    size = 4,
    data = data_lab
  ) +
  geom_segment(
    aes(
      # i.e., run the line up to .95 of the label's height
      yend = run_time_s + 30 * offset * .95, 
      xend = date_posix
    ),     
    data = data_lab, 
    linetype = "dashed"
  ) + 
  geom_point(size = 3) +
  # yes, I'm adding forty million seconds to the last datetime
  expand_limits(x = max(data$date_posix) + 4e7) +
  guides(color = "none") +
  scale_x_datetime(
    name = NULL,
    date_breaks = "2 years", 
    date_labels = "%Y"
  ) +
  scale_y_continuous(
    name = "World record",
    breaks = 21:27 * 60,
    labels = function(x) sprintf("%d:%02.f", x %/% 60, x %% 60)
  ) + 
  theme_minimal(base_size = 14) +
  theme(plot.margin = margin(12, 12, 12, 12, "pt"))


Last knitted on 2022-05-24. Source code on GitHub.1

  1. sessioninfo::session_info()
    #> ─ Session info ───────────────────────────────────────────────────────────────
    #>  setting  value
    #>  version  R version 4.2.0 RC (2022-04-21 r82226 ucrt)
    #>  os       Windows 10 x64 (build 22000)
    #>  system   x86_64, mingw32
    #>  ui       RTerm
    #>  language (EN)
    #>  collate  English_United States.utf8
    #>  ctype    English_United States.utf8
    #>  tz       America/Chicago
    #>  date     2022-05-24
    #>  pandoc   NA
    #> 
    #> ─ Packages ───────────────────────────────────────────────────────────────────
    #>  package     * version date (UTC) lib source
    #>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
    #>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
    #>  broom         0.8.0   2022-04-13 [1] CRAN (R 4.2.0)
    #>  cachem        1.0.6   2021-08-19 [1] CRAN (R 4.2.0)
    #>  cellranger    1.1.0   2016-07-27 [1] CRAN (R 4.2.0)
    #>  cli           3.2.0   2022-02-14 [1] CRAN (R 4.2.0)
    #>  colorspace    2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
    #>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
    #>  curl          4.3.2   2021-06-23 [1] CRAN (R 4.2.0)
    #>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.2.0)
    #>  dbplyr        2.1.1   2021-04-06 [1] CRAN (R 4.2.0)
    #>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
    #>  downlit       0.4.0   2021-10-29 [1] CRAN (R 4.2.0)
    #>  dplyr       * 1.0.9   2022-04-28 [1] CRAN (R 4.2.0)
    #>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
    #>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)
    #>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
    #>  farver        2.1.0   2021-02-28 [1] CRAN (R 4.2.0)
    #>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
    #>  forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.2.0)
    #>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
    #>  generics      0.1.2   2022-01-31 [1] CRAN (R 4.2.0)
    #>  ggplot2     * 3.3.6   2022-05-03 [1] CRAN (R 4.2.0)
    #>  git2r         0.30.1  2022-03-16 [1] CRAN (R 4.2.0)
    #>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
    #>  gtable        0.3.0   2019-03-25 [1] CRAN (R 4.2.0)
    #>  haven         2.5.0   2022-04-15 [1] CRAN (R 4.2.0)
    #>  here          1.0.1   2020-12-13 [1] CRAN (R 4.2.0)
    #>  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)
    #>  hms           1.1.1   2021-09-26 [1] CRAN (R 4.2.0)
    #>  httr          1.4.3   2022-05-04 [1] CRAN (R 4.2.0)
    #>  jsonlite      1.8.0   2022-02-22 [1] CRAN (R 4.2.0)
    #>  knitr       * 1.39    2022-04-26 [1] CRAN (R 4.2.0)
    #>  labeling      0.4.2   2020-10-20 [1] CRAN (R 4.2.0)
    #>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
    #>  lubridate     1.8.0   2021-10-07 [1] CRAN (R 4.2.0)
    #>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
    #>  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.0)
    #>  modelr        0.1.8   2020-05-19 [1] CRAN (R 4.2.0)
    #>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
    #>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
    #>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
    #>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
    #>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
    #>  ragg          1.2.2   2022-02-21 [1] CRAN (R 4.2.0)
    #>  readr       * 2.1.2   2022-01-30 [1] CRAN (R 4.2.0)
    #>  readxl        1.4.0   2022-03-28 [1] CRAN (R 4.2.0)
    #>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
    #>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.2.0)
    #>  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.2.0)
    #>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)
    #>  rvest         1.0.2   2021-10-16 [1] CRAN (R 4.2.0)
    #>  scales        1.2.0   2022-04-13 [1] CRAN (R 4.2.0)
    #>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
    #>  showtext    * 0.9-5   2022-02-09 [1] CRAN (R 4.2.0)
    #>  showtextdb  * 3.0     2020-06-04 [1] CRAN (R 4.2.0)
    #>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
    #>  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
    #>  syss    * 0.8.8   2022-03-13 [1] CRAN (R 4.2.0)
    #>  systems   1.0.4   2022-02-11 [1] CRAN (R 4.2.0)
    #>  textshaping   0.3.6   2021-10-13 [1] CRAN (R 4.2.0)
    #>  tibble      * 3.1.7   2022-05-03 [1] CRAN (R 4.2.0)
    #>  tidyr       * 1.2.0   2022-02-01 [1] CRAN (R 4.2.0)
    #>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.2.0)
    #>  tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.2.0)
    #>  tzdb          0.3.0   2022-03-28 [1] CRAN (R 4.2.0)
    #>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
    #>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
    #>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
    #>  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)
    #>  xml2          1.3.3   2021-11-30 [1] CRAN (R 4.2.0)
    #>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
    #> 
    #>  [1] C:/Users/Tristan/AppData/Local/R/win-library/4.2
    #>  [2] C:/Program Files/R/R-4.2.0rc/library
    #> 
    #> ──────────────────────────────────────────────────────────────────────────────
    

To leave a comment for the author, please follow the link and comment on their blog: Higher Order Functions.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.