
Cheesecake Diagrams: Pie Charts with a Different Flavour

[This article was first published on Having Fun and Creating Value With the R Language on Lucid Manager, and kindly contributed to R-bloggers.]

Part of my job at a regional water utility involves visualising operational data. We manage water and sewerage services for a large number of small and medium-sized towns in regional Victoria (Australia). Traditionally, performance reports consist of extensive tables filled with numbers with a line for each city. To make this data easier to consume, I developed the cheesecake diagram to spatially visualise performance data. A cheesecake diagram is just like a pie chart, but different.

Reporting performance spatially

Displaying geographic performance requires geometric objects at appropriate locations. The geographic bubble chart visualises a single quantitative variable through the size of the circles, plus a second qualitative variable through the colour.

The performance data is randomised for the purpose of this example. Performance is measured using four aggregated parameters. My paper on visualising water quality performance describes the algorithm in detail.

The size of the bubbles in the diagram below communicates the consumed volume of water. Because town sizes differ by orders of magnitude, the bubble area is scaled to the square root of the consumption rather than to the raw volume. The colour of the bubble communicates a random level of performance.
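To see why the square-root transform helps, the short sketch below compares the spread of the raw and the transformed values. It uses three consumption figures taken from the code further down (the largest, a mid-sized and the smallest town); the variable names `raw_ratio` and `sqrt_ratio` are only illustrative.

```r
## Three consumption values from the article's data set
consumption <- c(Bendigo = 11682, Kyneton = 862, Serpentine = 17)

## Ratio between the largest and the smallest value,
## before and after the square-root transform
raw_ratio <- max(consumption) / min(consumption)
sqrt_ratio <- max(sqrt(consumption)) / min(sqrt(consumption))

round(c(raw = raw_ratio, sqrt = sqrt_ratio), 1)
```

The transform shrinks the ratio between the largest and smallest town from roughly 687 to about 26. Because `scale_size_area()` makes the area of a point proportional to the mapped value, the smallest bubbles stay visible next to the largest one.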

In most traffic-light reporting systems, the colour for negative performance is red and the colour for excellent performance is green. This combination is, however, not useful for the 8% of men who struggle to see the difference between red and green. The RColorBrewer package provides several diverging colourblind-safe colour palettes.
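As a quick way to explore these palettes, the sketch below lists the diverging palettes that RColorBrewer itself flags as colourblind-friendly, using the `brewer.pal.info` table that ships with the package. The variable name `safe_diverging` is only illustrative.

```r
library(RColorBrewer)

## Metadata for all palettes: maximum colours, category and colourblind flag
info <- brewer.pal.info

## Names of the diverging palettes flagged as colourblind-friendly
safe_diverging <- rownames(info[info$category == "div" & info$colorblind, ])
safe_diverging

## Hex codes of the seven-class palette used in this article
brewer.pal(7, "RdYlBu")
```

Calling `display.brewer.all(type = "div", colorblindFriendly = TRUE)` shows the same selection as colour swatches.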

Visualising a single qualitative value with a bubble chart.

The first part of the code creates mock performance data. The geocode function of the ggmap package provides the latitude and longitude for each location, and the consumption data is taken from the 2019 Coliban Water annual report. The resulting data frame has a row for each location with its coordinates, the total consumption, four performance variables and the average over these four.

The second part of the code projects the point geometry on the map and sets the scales and labels. The background map comes from Stamen Design, under CC BY 3.0, with data by OpenStreetMap, under ODbL. The toner version of the Stamen maps is ideal for this visualisation because of its sparse background.

The next section shows how to split these bubbles into quadrants to separately visualise each variable.

  ## Cheesecake diagram
  library(tidyverse)
  library(ggmap)
  library(RColorBrewer)

  ## Register Google Maps API
  api <- readLines("google.api")
  register_google(key = api)

  ## Create mock performance data
  ## Towns with water treatment plants
  towns <- c("Bendigo", "Boort", "Bridgewater", "Castlemaine", "Cohuna", "Echuca", "Elmore", "Goornong", "Gunbower", "Heathcote", "Korong Vale", "Kyneton", "Laanecoorie", "Leitchville", "Lockington", "Pyramid Hill", "Rochester", "Serpentine", "Trentham")
  t <- length(towns)

  ## Volume produced
  ## https://www.coliban.com.au/files/2019-10/FINAL_CW_AnnualReport2019_200919pm.pdf p. 24
  consumption <- c(11682, 138, 141, 2064, 610, 3017, 106, 44, 48, 243, 117, 862, 106, 161, 55, 84, 857, 17, 94)

  set.seed(1969)
  performance <- tibble(Town = towns) %>%
      bind_cols(geocode(paste(towns, "Victoria, Australia"))) %>%
      mutate(Consumption = consumption,
             Treatment = sample(0:100, t),
             Network = sample(0:100, t),
             Regulation = sample(0:100, t),
             Perception = sample(0:100, t),
             Performance = round((Treatment + Network + Regulation + Perception) / 4))

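  ## Get map
  ## Note: Stamen tiles are now hosted by Stadia Maps. Recent ggmap releases
  ## provide get_stadiamap() for this, which requires a Stadia Maps API key.
  bbox <- make_bbox(range(performance$lon), range(performance$lat))
  map <- get_stamenmap(bbox, maptype = "toner-hybrid")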

  ## Single variable
  ggmap(map, extent = "device") + 
      geom_point(data = performance,
                 aes(lon, lat, size = sqrt(Consumption), col = Performance),
                 alpha = 0.9) +
      scale_size_area(max_size = 24, guide = "none") +
      scale_color_gradientn(colors = brewer.pal(7, "RdYlBu")) +
      labs(title = "System Performance",
           subtitle = "Simulated data") +
      theme_void(base_size = 8)
  ggsave("../../static/images/hydroinformatics/bubble-chart-performance.png", width = 6, height = 6)

Introducing the Cheesecake Diagram

If we want to report more than one variable per location, the circle needs to be divided into two or more sectors. While this might sound like a pie chart, it is not. A pie chart visualises information through the size of the sectors in the diagram. The colours of the sectors communicate the data categories. Visualisation experts generally discourage using pie charts because they are not easy to interpret, but cheesecake diagrams are different.

A cheesecake diagram visualises information through the size of the circle and the colour of the sectors. The size of the sector is the same for each. A cheesecake diagram is a type of pie chart, but with a different flavour. The cheesecake diagram should not be used for more than four slices to ensure it remains readable.

The scatterpie package provides functionality to plot pie charts on a map, but it does not allow you to link the colours to the aesthetics. The code below uses the powerful ggforce package to construct the cheesecakes from four circle sectors. To create the sectors, the performance data frame is pivoted, and each sector is defined by its starting and ending angle, with each sector spanning 90 degrees ($\pi/2$ radians). Because the circle now contains information, it is not sized to the level of consumption, which keeps the sectors clearly visible.
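The sector angles can be worked out independently of the map. The sketch below uses a hypothetical helper, `sector_angles()`, which is not part of ggforce or of the original code, to compute the start and end angle of each of n equal sectors in radians.

```r
## Hypothetical helper: start and end angles (in radians) for n equal sectors
sector_angles <- function(n) {
  ## n + 1 evenly spaced break points between 0 and a full circle
  breaks <- seq(0, 2 * pi, length.out = n + 1)
  data.frame(sector = seq_len(n),
             start = breaks[-(n + 1)],  # drop the last break point
             end = breaks[-1])          # drop the first break point
}

## Four sectors, as used in the cheesecake diagram
quadrants <- sector_angles(4)
quadrants

## Width of each sector, which equals pi / 2 for four sectors
quadrants$end - quadrants$start
```

The same helper would also produce halves or thirds, for example `sector_angles(3)`, should a diagram need fewer than four slices.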

Cheesecake diagram with randomised variables.

  ## Cheesecake diagram
  library(ggforce)

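  ## Convert data to long format: one row per town and performance aspect
  ## Columns 1 to 4 (Town, lon, lat, Consumption) are kept as identifiers
  ## Each town gets four equal sectors, defined by start and end angles in radians
  cheesecake <- pivot_longer(performance, -1:-4, names_to = "Aspect", values_to = "Performance") %>%
      filter(Aspect != "Performance") %>%
      mutate(start = rep(seq(0, 2 * pi, length.out = 5)[-5], t),
             end = rep(seq(0, 2 * pi, length.out = 5)[-1], t))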

  ## Visualise
  ggmap(map, extent = "device",
        base_layer = ggplot(data = cheesecake,
                            aes(x0 = lon,
                                y0 = lat,
                                r0 = 0,
                                r = .05,
                                start = start,
                                end = end,
                                fill = Performance))) +
      geom_arc_bar(col = "darkgrey", size = .1) +
      scale_size_area(max_size = 12, guide = "none") +
      scale_fill_gradientn(colors = brewer.pal(7, "RdYlBu")) +
      labs(title = "System Performance",
           subtitle = "Simulated data")  +
      theme_void(base_size = 6)
  ggsave("../../static/images/hydroinformatics/cheesecake-performance.png", width = 4, height = 4)
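The map-based code above needs a Google API key and a background map. To experiment with the diagram itself, the following minimal sketch draws three cheesecakes on a plain ggplot2 canvas. The site names, coordinates and scores are invented purely for illustration and are not part of the original data.

```r
library(ggplot2)
library(ggforce)
library(RColorBrewer)

## Hypothetical sites, positions and random scores, for illustration only
set.seed(42)
sites <- data.frame(
  site = rep(c("A", "B", "C"), each = 4),
  x = rep(c(1, 2, 3), each = 4),
  y = rep(c(1, 2, 1), each = 4),
  aspect = rep(c("Treatment", "Network", "Regulation", "Perception"), times = 3),
  score = sample(0:100, 12)
)

## Four equal sectors per site, repeated for each of the three sites
breaks <- seq(0, 2 * pi, length.out = 5)
sites$start <- rep(breaks[-5], times = 3)
sites$end <- rep(breaks[-1], times = 3)

## Equal-sized circles; the fill colour of each sector carries the information
p <- ggplot(sites) +
  geom_arc_bar(aes(x0 = x, y0 = y, r0 = 0, r = 0.3,
                   start = start, end = end, fill = score),
               colour = "darkgrey") +
  scale_fill_gradientn(colours = brewer.pal(7, "RdYlBu")) +
  coord_fixed() +
  theme_void()
```

Printing `p` draws the diagram. The call to `coord_fixed()` keeps the circles round, which the map layer takes care of in the full example.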

Data Science for Water Professionals

If you would like to know more about using R to analyse water data, then consider following the course Data Science for Water Utility Professionals on LeanPub.

Data Science for Water Utility Professionals, LeanPub.

Managing reliable water services requires not only a sufficient volume of water, but also large amounts of data. This course teaches the basics of data science using the R language and the Tidyverse libraries to analyse water management problems.
