#TidyTuesday 2023 – Week 31
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
The #TidyTuesday weekly challenge is organised by the R4DS (R for Data Science) Online Learning Community.
Every tuesday throughout the year, participants work on a common dataset and share the plots they create.
The dataset for this challenge comes from Wikipedia articles.
Getting the data
First of all, let’s load the packages we’ll be using :
{tidyverse} to clean the data and create the plots
{showtext} to change the fonts used
{ggtext} to add colours in the plot title
If you don’t have these packages installed, simply use the install.packages()
function.
# Load the packages library(tidyverse) library(showtext) library(rgdal) library(rgeos) library(ggtext)
We also load the fonts we will use in the plots: Roboto Condensed for the text and Bangers for the title.
# Import the fonts font_add_google("Roboto Condensed", "Roboto Condensed") font_add_google("Bangers", "Bangers") showtext_auto()
We can now download the dataset :
# Download the dataset states <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-08-01/states.csv')
We also download the data to plot U.S. states as hexagons here. We save the GEO JSON file in a raw/ directory.
We then load the data:
# Download the dataset us_hex <- readOGR("raw/us_states_hexgrid.geojson")
For a quick overview of the data, we use the glimpse()
function from the {dplyr} package:
# Explore the dataset glimpse(states)
Rows: 50 Columns: 14 $ state <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "Calif… $ postal_abbreviation <chr> "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "F… $ capital_city <chr> "Montgomery", "Juneau", "Phoenix", "Little Rock", … $ largest_city <chr> "Huntsville", "Anchorage", "Phoenix", "Little Rock… $ admission <date> 1819-12-14, 1959-01-03, 1912-02-14, 1836-06-15, 1… $ population_2020 <dbl> 5024279, 733391, 7151502, 3011524, 39538223, 57737… $ total_area_mi2 <dbl> 52420, 665384, 113990, 53179, 163695, 104094, 5543… $ total_area_km2 <dbl> 135767, 1723337, 295234, 137732, 423967, 269601, 1… $ land_area_mi2 <dbl> 50645, 570641, 113594, 52035, 155779, 103642, 4842… $ land_area_km2 <dbl> 131171, 1477953, 294207, 134771, 403466, 268431, 1… $ water_area_mi2 <dbl> 1775, 94743, 396, 1143, 7916, 452, 701, 540, 12133… $ water_area_km2 <dbl> 4597, 245384, 1026, 2961, 20501, 1170, 1816, 1399,… $ n_representatives <dbl> 7, 1, 9, 4, 52, 8, 5, 1, 28, 14, 2, 2, 17, 9, 4, 4… $ demonym <chr> "Alabamian", "Alaskan", "Arizonan", "Arkansan", "C…
The dataset has 50 observations (rows) and 14 variables (columns).
Each row represents one U.S. states.
We’re going to represent the ratios between land and water surfaces for each state.
Cleaning the data
We use the following code to calculate the land/water ratios:
# Land to water area ratios land_to_water_ratios <- states |> # calculate land/total and water/total area ratios mutate(land_area_ratio = round(land_area_km2 / total_area_km2, 3), water_area_ratio = round(1 - land_area_ratio, 3)) |> # select columns select(id = postal_abbreviation, state, land_area_ratio, water_area_ratio) # View first lines of cleaned data head(land_to_water_ratios)
# A tibble: 6 × 4 id state land_area_ratio water_area_ratio <chr> <chr> <dbl> <dbl> 1 AL Alabama 0.966 0.034 2 AK Alaska 0.858 0.142 3 AZ Arizona 0.997 0.003 4 AR Arkansas 0.979 0.021 5 CA California 0.952 0.048 6 CO Colorado 0.996 0.004
We calculate the coordinates for boundaries between land and water surfaces for each hexagon:
# Prepare data to create hex map of land to water ratio per US state us_clean <- us_hex |> # transform hex data into table fortify(region = "iso3166_2") |> # transform into tibble format as_tibble() |> # select columns select(id, long, lat) |> # join land to water ratios values left_join(land_to_water_ratios) |> # remove District of Columbia filter(id != "DC") |> # add column with point order mutate(pt_nb = rep(1:7, times = 50), .before = long) |> # group by state id group_by(id) |> # calculate parameters for each state mutate( # total area for reactangle around hex area_rect = (long[pt_nb == 3] - long[pt_nb == 5]) * (lat[pt_nb == 1] - lat[pt_nb == 4]), # slope of upper hex triangle slope = (lat[pt_nb == 6] - lat[pt_nb == 1]) / (long[pt_nb == 6] - long[pt_nb == 1]), # y coordinate for horizontal border btwn land/water split_y = ((land_area_ratio * area_rect) / (long[pt_nb == 2] - long[pt_nb == 6]) + lat[pt_nb == 4]), # determine type of split : 1 if split line below upper triangle / 2 if not split_type = case_when(split_y <= lat[pt_nb == 2] ~ 1, TRUE ~ 2), # calculate x coordinates for horizontal border btwn land/water split_x1 = case_when(split_type == 1 ~ min(long), split_type == 2 ~ long[pt_nb == 6] + ((split_y - lat[pt_nb == 6]) / slope)), split_x2 = case_when(split_type == 1 ~ max(long), split_type == 2 ~ long[pt_nb == 1] + (long[pt_nb == 1] - split_x1)) ) # Subset us_clean data for type 1 split us_clean_type1 <- us_clean |> # filter data filter(split_type == 1) |> # create new columns to keep points needed for plot mutate(pt1_x = long[pt_nb == 1], pt1_y = lat[pt_nb == 1], pt2_x = long[pt_nb == 2], pt2_y = lat[pt_nb == 2], pt3_x = unique(split_x2), pt3_y = unique(split_y), pt4_x = unique(split_x1), pt4_y = unique(split_y), pt5_x = long[pt_nb == 6], pt5_y = lat[pt_nb == 6], pt6_x = pt1_x, pt6_y = pt1_y) |> # select columns select(id, pt1_x:pt6_y) |> # ungroup data ungroup() |> # pivot to long format pivot_longer(cols = -id, names_to = "pt", values_to = "value") |> # separate "pt" column separate(col = pt, into = c("pt_nb", "xy"), sep = "_") |> # remove "pt" string from pt_nb column mutate(pt_nb = str_remove_all(pt_nb, "pt")) |> # keep distinct rows distinct() |> # pivot to wide format pivot_wider(id_cols = id:pt_nb, names_from = "xy", values_from = "value") # Subset us_clean data for type 2 split us_clean_type2 <- us_clean |> # filter data filter(split_type == 2) |> # create new columns to keep points needed for plot mutate(pt1_x = long[pt_nb == 1], pt1_y = lat[pt_nb == 1], pt2_x = unique(split_x2), pt2_y = unique(split_y), pt3_x = unique(split_x1), pt3_y = unique(split_y), pt4_x = pt1_x, pt4_y = pt1_y) |> # select columns select(id, pt1_x:pt4_y) |> # ungroup data ungroup() |> # pivot to long format pivot_longer(cols = -id, names_to = "pt", values_to = "value") |> # separate "pt" column separate(col = pt, into = c("pt_nb", "xy"), sep = "_") |> # remove "pt" string from pt_nb column mutate(pt_nb = str_remove_all(pt_nb, "pt")) |> # keep distinct rows distinct() |> # pivot to wide format pivot_wider(id_cols = id:pt_nb, names_from = "xy", values_from = "value")
We calculate the coordinates of each hexagon’s centre to add text labels:
# Calculate centres of polygons to plot state labels centres <- gCentroid(us_hex, byid = TRUE) |> as_tibble() labels <- us_hex@data$iso3166_2 hex_labels <- tibble(id = labels, centres) |> filter(id != "DC") # Clean global environment rm(centres, labels, land_to_water_ratios, states, us_hex)
Creating the plot
p <- ggplot() + geom_polygon(data = us_clean, aes(x = long, y = lat, group = id), colour = NA, fill = "#48bf91") + geom_polygon(data = us_clean_type1, aes(x = x, y = y, group = id), colour = NA, fill = "#0076be") + geom_polygon(data = us_clean_type2, aes(x = x, y = y, group = id), colour = NA, fill = "#0076be") + geom_polygon(data = us_clean, aes(x = long, y = lat, group = id), colour = "white", fill = NA, linewidth = 0.5) + geom_text(data = hex_labels, aes(x = x, y = y, label = id), family = "Roboto Condensed", colour = "black", size = 16) + coord_map() + labs(title = "<span style='color:#48bf91;'>Land</span> to <span style='color:#0076be;'>water</span> surface ratios in the U.S.", caption = "#TidyTuesday 2023 week 31 | Data from Wikipedia | Jonathan Kitt") + theme_void() + theme(panel.background = element_rect(fill = "black", colour = NA), plot.background = element_rect(fill = "black", colour = NA), plot.title = element_markdown(family = "Bangers", size = 90, hjust = 0.5, colour = "white", margin = margin(t = 20)), plot.caption = element_text(family = "Roboto Condensed", colour = "white", size = 30, hjust = 0.5, margin = margin(b = 5))) ggsave("figs/tt_2023_w31_us.png", p, dpi = 320, width = 12, height = 6)
And here’s the result!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.