Advanced Plots with str_glue()

[This article was first published on Exploring Data, and kindly contributed to R-bloggers.]
  • Source: https://bit.ly/2CGyS6I

    Quick Overview

    Exploring-Data is a place where I share easily digestible content aimed at making the wrangling and exploration of data more efficient (+fun).

    Let’s Dive Into an Example

    I’d like to share a technique using str_glue() that I learned from Matt Dancho, a Data-Science instructor at Business Science University. Check out my favorite course here: Business Analysis With R.

    str_glue() from the stringr package is one of my favorite functions in R. It is a wrapper around glue::glue() that interpolates R expressions into strings, which makes it super helpful for wrangling and manipulating text in preparation for building advanced plots.

    Let’s Get Some Data

    The Tidy Tuesday Project is an awesome repository of useful data for practicing your data wrangling skills.

    We will work with the San Francisco Trees data as a case study for using str_glue() in advanced plotting techniques.

    library(tidyverse)
    library(stringr)
    library(tidyquant)
    library(scales)
    library(DataExplorer)
    
    tuesdata <- tidytuesdayR::tt_load('2020-01-28') 
    sf_trees_data_raw_tbl <- tuesdata$sf_trees

    Data Exploration

    Let’s take a quick peek and inspect the SF Trees Data.

    plot_missing(
        sf_trees_data_raw_tbl,
        ggtheme = theme_tq(),
        # nrow() returns the row count as a single number for the title
        title = str_glue(
        'Exploring Missing Data (N = {nrow(sf_trees_data_raw_tbl)})'))

    This is a fairly small data-set with only 12 columns. For the purpose of demonstration, let’s see what we can do with just the species column.

    The Coastal Redwoods in the SF area are incredible and one of my favorite species. I’m curious whether other species of Redwoods are in SF and, if so, in what proportions relative to the Coastal Redwoods.

    Data Wrangling

    # Data Wrangling
    redwood_tbl <- sf_trees_data_raw_tbl %>% 
        
        # select species and filter to redwood only
        select(species) %>% 
        filter(str_detect(species, pattern = 'Redwood')) %>% 
        
        # break up species and common-name into separate columns
        separate(
            col = species,
            into = c("species", "common_name"),
            sep  = ' :: ') %>%
        
        # calculate absolute and relative distributions
        count(species, common_name) %>% 
        mutate(pct = n / sum(n),
               pct_text = percent(pct)) %>% 
        arrange(desc(pct))
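
    To see what each step of this pipeline does, here is a minimal sketch on a made-up four-row table. The rows in toy_tbl are hypothetical and only mimic the "scientific name :: common name" layout of the species column; they are not taken from the SF Trees data.

    ```r
    library(dplyr)
    library(tidyr)
    library(scales)

    # Hypothetical rows that mimic the layout of the species column
    toy_tbl <- tibble(species = c(
        "Sequoia sempervirens :: Coast Redwood",
        "Sequoia sempervirens :: Coast Redwood",
        "Sequoia sempervirens :: Coast Redwood",
        "Metasequoia glyptostroboides :: Dawn Redwood"))

    toy_counts <- toy_tbl %>%
        # split the combined column into scientific and common name
        separate(species, into = c("species", "common_name"), sep = " :: ") %>%
        # one row per species with its count n
        count(species, common_name) %>%
        # share of the total, as a number and as display text
        mutate(pct = n / sum(n),
               pct_text = percent(pct)) %>%
        arrange(desc(pct))

    print(toy_counts)
    ```

    The result should have one row per species, sorted so the largest share comes first, which is the same shape as redwood_tbl.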

    Summary Table

    Let’s take a look at our findings.

    rmarkdown::paged_table(redwood_tbl %>% select(-pct))

    As I expected, the Coastal Redwoods are the dominant Redwood species in San Francisco.

    And while the table is good, let’s craft an awesome plot to display these results.

    The Power of str_glue()

    With a couple of lines of code we can build our label for plotting. As you can see, we can reference columns from our table, or any R expression such as sum(n), directly inside curly brackets {}. Honestly, the options are endless for how creative you can get with stringing together text and adding labels to your plots.
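
    As a quick illustration of how the curly brackets behave, here is a small sketch outside of any data frame. The vectors tree and n_trees are made-up values used only for this example.

    ```r
    library(stringr)

    # Hypothetical toy values, not taken from the SF Trees data
    tree    <- c("Coast Redwood", "Dawn Redwood")
    n_trees <- c(8, 2)

    # Anything inside {} is evaluated as R code; the result is vectorised,
    # so we get one label per element of tree
    labels <- str_glue("{tree}: {n_trees} of {sum(n_trees)} trees")

    print(labels)
    ```

    Inside mutate() the same idea applies, except that the names inside {} refer to columns of the table.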

    # Data Wrangling
    redwood_labeled_text_tbl <- redwood_tbl %>% 
        
        # label text
        mutate(label_text = str_glue(
            'Scientific Name:
            {species}
            Count: {n} of {sum(n)}
            Pct (%): {pct_text}'),
        
        # insert a line break (\n) at the first space so long names wrap on the plot
        common_name = str_replace(common_name, pattern = ' ', 
                                  replacement = ' \n ')) %>% 
        
        # reorder factors based on percent rank
        mutate(common_name = common_name %>% fct_reorder(pct))
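
    The fct_reorder() step is easy to overlook, but it controls the order of the rows on the plot: ggplot2 draws a discrete axis in factor-level order, which is alphabetical unless we say otherwise. Here is a minimal sketch of that reordering, using hypothetical names and shares rather than the real counts.

    ```r
    library(forcats)

    # Hypothetical names and shares, for illustration only
    common_name <- c("Dawn Redwood", "Coast Redwood", "Sierra Redwood")
    pct         <- c(0.05, 0.85, 0.10)

    # Levels are sorted by ascending pct instead of alphabetically
    name_fct <- fct_reorder(common_name, pct)

    print(levels(name_fct))
    ```

    Since a discrete y axis is drawn from the first level upward, the name with the largest pct ends up at the top of the plot.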

    Manipulated Text (ready to plot)

    Here is the manipulated text that will be useful once we plot these data. This setup is critical for the labels on our final plot: each line break in label_text, and the \n added to common_name, wraps the text at that location.

    redwood_labeled_text_tbl %>% 
        select(label_text) %>% 
        rmarkdown::paged_table()
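
    If it is not obvious where those line breaks come from, the sketch below may help. It assumes the trimming behavior of glue (which str_glue() wraps): line breaks typed inside the template are kept, while the indentation used to line up the code is stripped. The values of species and pct_text are hypothetical.

    ```r
    library(stringr)

    # Hypothetical values for illustration
    species  <- "Sequoia sempervirens"
    pct_text <- "75%"

    # A template written across several indented lines
    label <- str_glue(
        "Scientific Name:
        {species}
        Pct (%): {pct_text}")

    # Each line break in the template becomes a newline in the result
    n_lines <- str_count(label, "\n") + 1

    # str_replace() swaps only the first space for a space-newline-space
    wrapped <- str_replace("Coast Redwood", pattern = " ", replacement = " \n ")

    cat(label, "\n")
    cat(wrapped, "\n")
    ```

    Printing with cat() shows the text the way geom_label() and the axis will render it, with each newline starting a new line.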

    Data Visualization

    Now that we’ve done our data wrangling, let’s get into a bit of data visualization.

    # Save Plot
    g <- redwood_labeled_text_tbl %>% 
        
        # Canvas
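        ggplot(aes(pct, common_name)) +
        
        # Geometries (constant color and size are set outside aes();
        # a color passed to ggplot() itself is ignored)
        geom_segment(aes(xend = 0, yend = common_name),
                     color = '#2c3e50', size = 2) +
        geom_point(color = '#2c3e50', size = 5) +
        geom_label(aes(label = label_text), hjust = "inward", size = 3) +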
    
        # Formatting
        scale_x_continuous(labels = scales::percent_format()) +
        theme_tq() + 
        labs(
          title = str_glue("Redwood Trees of San Francisco"),
          subtitle = str_glue(
            "As expected, the Coastal Redwoods make up the largest proportion.
            Dawn Redwoods were once thought to be extinct (low #s not surprising).
            Sierra Redwoods grow at high elev. and so low numbers are expected."),
          caption = str_glue("The Coastal Redwood is the dominant species in SF."),
          x = "", y = "") +
        theme(legend.position = "none",
              plot.title = element_text(face = "bold"))

    Display Awesome Plot

    And here is our plot with the engineered labels from the last few steps. And that’s just one example of why I love str_glue() – Simply Awesome!

    Wrap Up

    That’s it for today!

    We used str_glue() to manipulate our text and add awesome labels to our ggplot(). The plot is now Business-Ready.

    Stay tuned for more posts on Wrangling + Exploring-Data with R.

    Get the code here: GitHub Repo.

    PS: Be Kind and Tidy your Data

