
Dot plots as an alternative to bar charts

[This article was first published on Albert Rapp, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

    I recently saw a cool LinkedIn post highlighting that a dot plot is a neat alternative to bar charts. While bar charts make it easy to compare categories, dot plots put more emphasis on individual data points. I liked the idea, so let me show you how to build one in {ggplot2}. And if you want to watch the video version of this blog post, you can do that here:


    Bar charts for comparison

    First, let's create a good old bar chart. For that, we need a data set, so let's make up a fake one about items returned to a retailer.

    set.seed(345345)
    library(tidyverse)
    fake_dat <- tibble(
      reason_for_return = c(
        "Wrong Address",
        "Wrong Item",
        "Damaged",
        "Unhappy with the product", 
        "Other"
      ),
      returned_items = rpois(5, 100)
    )
    fake_dat
    ## # A tibble: 5 × 2
    ##   reason_for_return        returned_items
    ##   <chr>                             <int>
    ## 1 Wrong Address                        86
    ## 2 Wrong Item                          103
    ## 3 Damaged                              92
    ## 4 Unhappy with the product            113
    ## 5 Other                               104
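If you'd rather not depend on the random number generator at all, a minimal sketch is to hard-code the values printed above. The name fake_dat_fixed is just an illustrative choice, not something used elsewhere in this post.

```r
library(tibble)

# Hard-coded copy of the values printed above, so the data
# does not depend on set.seed() / rpois()
fake_dat_fixed <- tibble(
  reason_for_return = c(
    "Wrong Address",
    "Wrong Item",
    "Damaged",
    "Unhappy with the product",
    "Other"
  ),
  returned_items = c(86L, 103L, 92L, 113L, 104L)
)

# Reason with the most returned items
fake_dat_fixed$reason_for_return[which.max(fake_dat_fixed$returned_items)]
# "Unhappy with the product"
```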

    With this fake data, we can create a bar chart like we normally would. In case you haven't done that before: bar charts are among the most fundamental charts, and I've put together a video on how to create this and other fundamental charts:

    Anyway, the bar chart is created with geom_col().

    fake_dat |> 
      ggplot(
        aes(
          y = reason_for_return, 
          x = returned_items
        )
      ) +
      geom_col(fill = 'dodgerblue4')
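geom_col() uses the values in the data as bar lengths (unlike geom_bar(), which counts rows), and every bar starts at zero. A quick way to see this, sketched here with the printed values hard-coded, is to inspect the layer data that ggplot_build() computes: for these horizontal bars, xmin should be 0 and xmax the number of returned items.

```r
library(ggplot2)

dat <- data.frame(
  reason_for_return = c("Wrong Address", "Wrong Item", "Damaged",
                        "Unhappy with the product", "Other"),
  returned_items = c(86L, 103L, 92L, 113L, 104L)
)

# Same mapping as above: categories on y, counts on x
p <- ggplot(dat, aes(y = reason_for_return, x = returned_items)) +
  geom_col(fill = 'dodgerblue4')

# Layer data after ggplot2 has computed the bar geometry;
# each bar runs from xmin (zero) to xmax (the count)
bars <- ggplot_build(p)$data[[1]]
bars[, c("y", "xmin", "xmax")]
```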

    To make this look nicer, let's

    • reorder the categories by the number of returned items
    • apply a theme_minimal() layer with a larger base size
    • add a title
    • remove the axis titles

    sorted_dat <- fake_dat |> 
      mutate(
        reason_for_return = fct_reorder(
          reason_for_return, 
          returned_items
        )
      )
    
    sorted_dat |> 
      ggplot(
        aes(
          y = reason_for_return, 
          x = returned_items
        )
      ) +
      geom_col(fill = 'dodgerblue4') +
      theme_minimal(
        base_size = 16,
        base_family = 'Source Sans Pro'
      ) +
      labs(
        title = "Number of returned items by return reason",
        x = element_blank(),
        y = element_blank()
      )
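fct_reorder() sorts the factor levels by returned_items in increasing order (it uses the median per category by default, which here is just the single value). Since ggplot2 draws the first level of a y-axis factor at the bottom, the largest category ends up on top. A small check with the printed values hard-coded:

```r
library(forcats)

reason <- c("Wrong Address", "Wrong Item", "Damaged",
            "Unhappy with the product", "Other")
returned_items <- c(86L, 103L, 92L, 113L, 104L)

# Reorder levels by returned_items (ascending by default)
sorted_reason <- fct_reorder(reason, returned_items)
levels(sorted_reason)
# "Wrong Address" "Damaged" "Wrong Item" "Other" "Unhappy with the product"

# .desc = TRUE reverses the order instead
levels(fct_reorder(reason, returned_items, .desc = TRUE))
```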

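    Finally, we can remove the x-axis expansion on the left so that the labels sit right next to the bars (we keep 5% of padding on the right). We can also remove the horizontal grid lines (panel.grid.major.y), which aren't of much use here, and align the title with the left edge of the whole plot via plot.title.position = 'plot'.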

    sorted_dat |> 
      ggplot(
        aes(
          y = reason_for_return, 
          x = returned_items
        )
      ) +
      geom_col(fill = 'dodgerblue4') +
      theme_minimal(
        base_size = 16,
        base_family = 'Source Sans Pro'
      ) +
      labs(
        title = "Number of returned items by return reason",
        x = element_blank(),
        y = element_blank()
      ) +
      scale_x_continuous(
        expand = expansion(mult = c(0, 0.05))
      ) +
      theme(
        panel.grid.major.y = element_blank(),
        plot.title.position = 'plot'
      )
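expansion(mult = c(0, 0.05)) adds no padding below the data range and 5% of the range's width above it. A minimal sketch of that arithmetic, assuming the printed values (the bars span 0 to 113 because geom_col() always starts at zero). The helper apply_mult_expansion() is hypothetical and only mirrors ggplot2's multiplicative rule; it is not part of ggplot2.

```r
# Hypothetical helper: apply multiplicative expansion the way
# expansion(mult = c(lower, upper)) does for a continuous scale
apply_mult_expansion <- function(limits, mult) {
  width <- diff(limits)
  c(limits[1] - mult[1] * width, limits[2] + mult[2] * width)
}

# Bar chart: data range is 0 (bar start) to 113 (longest bar)
bar_limits <- apply_mult_expansion(c(0, 113), mult = c(0, 0.05))
bar_limits
# 0.00 118.65

# The default for continuous scales is 5% on both sides,
# which would leave a gap between the axis labels and the bars
default_limits <- apply_mult_expansion(c(0, 113), mult = c(0.05, 0.05))
default_limits
```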

    Sweet. A good old bar chart. It's easy to see that the most common reason people return items is that they're unhappy with the product. Now off to the dot plot.


    Dot plot for individual points

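    Basically, creating the dot plot is only a matter of replacing geom_col() with geom_point(). Since points are colored via color rather than fill, we also swap fill for color and increase the point size.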

    sorted_dat |> 
      ggplot(
        aes(
          y = reason_for_return, 
          x = returned_items
        )
      ) +
      geom_point(
        color = 'dodgerblue4',
        size = 8
      ) +
      theme_minimal(
        base_size = 16,
        base_family = 'Source Sans Pro'
      ) +
      labs(
        title = "Number of returned items by return reason",
        x = element_blank(),
        y = element_blank()
      ) +
      scale_x_continuous(
        expand = expansion(mult = c(0, 0.05))
      ) +
      theme(
        panel.grid.major.y = element_blank(),
        plot.title.position = 'plot'
      )

    That was easy. But now we should make a couple of tweaks: remove the x-axis minor grid lines (there are simply too many grid lines otherwise) and widen the expansion on the left of the x-axis to 25% to make room for the labels.

    sorted_dat |> 
      ggplot(
        aes(
          y = reason_for_return, 
          x = returned_items
        )
      ) +
      geom_point(
        color = 'dodgerblue4',
        size = 8
      ) +
      theme_minimal(
        base_size = 16,
        base_family = 'Source Sans Pro'
      ) +
      labs(
        title = "Number of returned items by return reason",
        x = element_blank(),
        y = element_blank()
      ) +
      scale_x_continuous(
        expand = expansion(mult = c(0.25, 0.05))
      ) +
      theme(
        panel.grid.major.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        plot.title.position = 'plot'
      )
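Unlike bars, points don't force the axis to include zero, so here the x range is only 86 to 113 (with the printed values), and the 25% on the left is 25% of that 27-item span. One way to confirm the resulting limits, sketched with the printed values hard-coded and without the custom font, is to read the expanded range from ggplot_build(). The panel_params field used here is a ggplot2 internal, so treat it as an assumption that may change between versions.

```r
library(ggplot2)

dat <- data.frame(
  reason_for_return = c("Wrong Address", "Wrong Item", "Damaged",
                        "Unhappy with the product", "Other"),
  returned_items = c(86L, 103L, 92L, 113L, 104L)
)

p <- ggplot(dat, aes(y = reason_for_return, x = returned_items)) +
  geom_point(color = 'dodgerblue4', size = 8) +
  scale_x_continuous(expand = expansion(mult = c(0.25, 0.05)))

# Expanded x range of the (single) panel:
# 86 - 0.25 * 27 = 79.25 and 113 + 0.05 * 27 = 114.35
x_range <- ggplot_build(p)$layout$panel_params[[1]]$x.range
x_range
```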

    Finally, all that's left to do is to

    • add labels next to the points with geom_text()
    • remove the y-axis labels, since the text labels now do their job

    sorted_dat |> 
      ggplot(
        aes(
          y = reason_for_return, 
          x = returned_items
        )
      ) +
      geom_point(
        color = 'dodgerblue4',
        size = 8
      ) +
      geom_text(
        aes(
          label = reason_for_return
        ),
        hjust = 1,
        nudge_x = -0.5,
        size = 5,
        family = 'Source Sans Pro',
        fontface = 'bold'
      ) +
      theme_minimal(
        base_size = 16,
        base_family = 'Source Sans Pro'
      ) +
      labs(
        title = "Number of returned items by return reason",
        x = element_blank(),
        y = element_blank()
      ) +
      scale_x_continuous(
        expand = expansion(mult = c(0.25, 0.05))
      ) +
      theme(
        panel.grid.major.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        plot.title.position = 'plot',
        axis.text.y = element_blank()
      )
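The label placement works in two steps: nudge_x = -0.5 moves each label's anchor half a unit (in data units, i.e. returned items) to the left of its point, and hjust = 1 right-aligns the text at that anchor so the label ends to the left of the point. A sketch that checks the nudged anchors via ggplot_build(), again with the printed values hard-coded and the default font:

```r
library(ggplot2)

dat <- data.frame(
  reason_for_return = c("Wrong Address", "Wrong Item", "Damaged",
                        "Unhappy with the product", "Other"),
  returned_items = c(86L, 103L, 92L, 113L, 104L)
)

p <- ggplot(dat, aes(y = reason_for_return, x = returned_items)) +
  geom_point(color = 'dodgerblue4', size = 8) +
  geom_text(
    aes(label = reason_for_return),
    hjust = 1,        # right-align: the text ends at its anchor
    nudge_x = -0.5,   # anchor sits 0.5 returned items left of the point
    size = 5,
    fontface = 'bold'
  )

built <- ggplot_build(p)
points_layer <- built$data[[1]]
text_layer <- built$data[[2]]

# Every label anchor is shifted 0.5 units left of its point
all.equal(
  sort(as.numeric(text_layer$x)),
  sort(as.numeric(points_layer$x)) - 0.5
)
# TRUE
```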

    To make the labels even more legible, we could also give them a white background. The easiest way to do that is via geom_richtext() from {ggtext}.


    Sidenote: {ggtext} is one of my favorite ggplot2 extensions. If you're looking for more extensions, you can check out one of my YouTube videos:


    sorted_dat |> 
      ggplot(
        aes(
          y = reason_for_return, 
          x = returned_items
        )
      ) +
      geom_point(
        color = 'dodgerblue4',
        size = 8
      ) +
      ggtext::geom_richtext(
        aes(
          label = reason_for_return
        ),
        hjust = 1,
        nudge_x = -0.5,
        size = 5,
        family = 'Source Sans Pro',
        fontface = 'bold'
      ) +
      theme_minimal(
        base_size = 16,
        base_family = 'Source Sans Pro'
      ) +
      labs(
        title = "Number of returned items by return reason",
        x = element_blank(),
        y = element_blank()
      ) +
      scale_x_continuous(
        expand = expansion(mult = c(0.25, 0.05))
      ) +
      theme(
        panel.grid.major.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        plot.title.position = 'plot',
        axis.text.y = element_blank()
      )

    But as you can see, this draws a border around each label. To get rid of it, set label.colour to NA.

    sorted_dat |> 
      ggplot(
        aes(
          y = reason_for_return, 
          x = returned_items
        )
      ) +
      geom_point(
        color = 'dodgerblue4',
        size = 8
      ) +
      ggtext::geom_richtext(
        aes(
          label = reason_for_return
        ),
        hjust = 1,
        nudge_x = -0.5,
        size = 5,
        family = 'Source Sans Pro',
        fontface = 'bold',
        label.colour = NA
      ) +
      theme_minimal(
        base_size = 16,
        base_family = 'Source Sans Pro'
      ) +
      labs(
        title = "Number of returned items by return reason",
        x = element_blank(),
        y = element_blank()
      ) +
      scale_x_continuous(
        expand = expansion(mult = c(0.25, 0.05))
      ) +
      theme(
        panel.grid.major.y = element_blank(),
        panel.grid.minor.x = element_blank(),
        plot.title.position = 'plot',
        axis.text.y = element_blank()
      )

    Nice. Notice how the grid lines no longer interfere with the legibility of the labels. And with that, we have finished this week's blog post. If you found this helpful, here are some other ways I can help you:

    To leave a comment for the author, please follow the link and comment on their blog: Albert Rapp.
