
Street names

[This article was first published on r.iresmi.net, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Lyon – CC-BY-NC-ND by Emmanuel Fromm

Day 2 of 30DayMapChallenge: « Lines » (previously).

We’ll map the gender of street names in Lyon. To do so we need a database of French first names that gives their gender, and the streets of Lyon, which we will extract from OpenStreetMap.

library(arrow)
library(dplyr)
library(tidyr)
library(readr)
library(purrr)
library(ggplot2)
library(stringr)
library(sf)
library(osmdata)
library(ggspatial)
library(glue)
library(knitr)

set.seed(42)

First names

# Download and prepare the INSEE first names file once, then reuse the cached copy
if (!file.exists("freq_prenoms.rds")) {
  freq_prenoms <- read_parquet("https://www.insee.fr/fr/statistiques/fichier/8205621/prenoms-2023-nat.parquet") |> 
    filter(preusuel != "_PRENOMS_RARES") |> 
    # strip accents so first names can be compared with ASCII street names
    mutate(preusuel = iconv(preusuel, to = "ASCII//TRANSLIT")) |> 
    # births per first name and sex, all years pooled
    group_by(preusuel, sexe) |> 
    summarise(n = sum(nombre, na.rm = TRUE),
              .groups = "drop_last") |>
    # still grouped by first name here: total births for that name
    mutate(total = sum(n)) |> 
    ungroup() |> 
    mutate(sexe = case_when(sexe == 1 ~ "M",
                            sexe == 2 ~ "F",
                            .default = NA_character_)) |> 
    # one row per first name, then the share of each sex in M and F
    pivot_wider(names_from = sexe, 
                values_from = n,
                values_fill = 0) |> 
    mutate(across(c(M, F), \(x) x / total)) |> 
    write_rds("freq_prenoms.rds")
} else {
  freq_prenoms <- read_rds("freq_prenoms.rds")
}

We have 34234 first names, each with the share of male and female births since 1900.

Sample of first names

preusuel  total  M  F
ZENABOU      48  0  1
EMILIENE     25  0  1
KINGSLEY    878  1  0
DOLOVAN      73  1  0
ERCOLE       67  1  0
YVA         178  0  1
ISSEY        79  1  0
SAWSSEN     121  0  1
MISBAH       24  0  1
GOHANN       20  1  0
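
To see how the M and F shares are derived, here is a minimal sketch on made-up counts, written in base R so that it runs without downloading the INSEE file. The `prenoms` data frame and its values are invented for illustration; only the column names follow the real file.

```r
# Toy data in the same shape as the INSEE file (values are made up)
prenoms <- data.frame(
  preusuel = c("CAMILLE", "CAMILLE", "JEAN", "JEAN", "NELLIE"),
  sexe     = c(1, 2, 1, 2, 2),
  nombre   = c(20, 80, 990, 10, 50)
)

# Births per first name and sex, as a name x sex table
counts <- xtabs(nombre ~ preusuel + sexe, data = prenoms)

# Share of each sex within a first name (each row sums to 1)
shares <- prop.table(counts, margin = 1)
colnames(shares) <- c("M", "F")

round(shares, 2)
```

With these toy numbers, CAMILLE would fall under neither the `M > .8` nor the `F > .8` filter used later, which is how mostly epicene first names get dropped from the matching lists.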

Map data

lyon_bbox <- getbb("Lyon, France", featuretype = "city")

if (!file.exists("osm.rds")) {
  lyon <- opq(lyon_bbox) |>
    add_osm_features(features = c(
      '"highway"="motorway"',
      '"highway"="trunk"',
      '"highway"="primary"',
      '"highway"="secondary"',
      '"highway"="tertiary"',
      '"highway"="motorway_link"',
      '"highway"="trunk_link"',
      '"highway"="primary_link"',
      '"highway"="secondary_link"',
      '"highway"="tertiary_link"',
      '"highway"="motorway_junction"',
      '"highway"="unclassified"',
      '"highway"="service"',
      '"highway"="pedestrian"',
      '"highway"="living_street"',
      '"highway"="residential"')) |> 
    osmdata_sf() |> 
    pluck("osm_lines") |> 
    select(osm_id, name) |> 
    drop_na(name) |> 
    # merge all segments sharing a name into a single feature
    group_by(name) |> 
    summarise() |> 
    write_rds("osm.rds")
} else {
  lyon <- read_rds("osm.rds")
}

Finding first names in street names

We use a brute-force method: for each street, we check whether any word of its label appears in our lists of female or male first names. We only keep first names used overwhelmingly (more than 80% of the time) by one gender.

female <- freq_prenoms |> 
  filter(F > .8,
         str_length(preusuel) > 1,
         preusuel != "LA") |> 
  pull(preusuel)

male <- freq_prenoms |> 
  filter(M > .8, 
         str_length(preusuel) > 1) |> 
  pull(preusuel)

# One regex per gender, each first name delimited by word boundaries:
# \b(?:NAME1|NAME2|...)\b
pattern_m <- str_c("\\b(?:", str_c(male, collapse = "|"), ")\\b")
pattern_f <- str_c("\\b(?:", str_c(female, collapse = "|"), ")\\b")

street_gender <- lyon |> 
  mutate(name = str_to_upper(iconv(name, to = "ASCII//TRANSLIT")),
         m = str_extract_all(name, pattern_m),
         f = str_extract_all(name, pattern_f),
         # compare how many male (.y) and female (.x) first names were found
         gender = unlist(map2(f, m, ~ case_when(length(.y) > length(.x) ~ "male",
                                             length(.x) > length(.y) ~ "female",
                                             identical(.x, character(0)) & 
                                               identical(.y, character(0)) ~ "not concerned",
                                             length(.x) == length(.y) ~ "undecidable",
                                             .default = NA_character_))))
Sample of classification

name                      geometry                        m     f       gender
COURS DE VERDUN RECAMIER  LINESTRING (4.830426 45.748…                  not concerned
IMPASSE DES ANGLAIS       LINESTRING (4.795807 45.753…                  not concerned
RUE DES PROVENCES         LINESTRING (4.79335 45.7369…                  not concerned
CHEMIN DES PEUPLIERS      LINESTRING (4.866587 45.801…                  not concerned
ALLEE DU LEVANT           LINESTRING (4.878859 45.759…                  not concerned
RUE ROPOSTE               LINESTRING (4.866353 45.760…                  not concerned
ALLEE NELLIE BLY          LINESTRING (4.84882 45.7429…          NELLIE  female
QUAI JEAN MOULIN          MULTILINESTRING ((4.837853 …    JEAN          male
LA VIEILLE ROUTE          LINESTRING (4.769782 45.720…                  not concerned
AVENUE DE CHAMPAGNE       MULTILINESTRING ((4.796801 …                  not concerned
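
The word-boundary matching and the counting rule can be tried on a handful of names. The following is a minimal sketch in base R, not the pipeline used for the map: the short `male` and `female` vectors stand in for the full lists, and `as_pattern()`, `count_matches()` and `classify()` are hypothetical helpers introduced only for this illustration.

```r
# Hypothetical mini-lists standing in for the full `male` and `female` vectors
male   <- c("JEAN", "PAUL")
female <- c("NELLIE", "MARIE")

# One alternation per gender, delimited by word boundaries
as_pattern <- function(first_names) {
  paste0("\\b(?:", paste(first_names, collapse = "|"), ")\\b")
}

# Number of matches of a pattern in each street name
count_matches <- function(street, pattern) {
  lengths(regmatches(street, gregexpr(pattern, street, perl = TRUE)))
}

# Same decision rule as in the article: compare the two counts
classify <- function(street) {
  n_m <- count_matches(street, as_pattern(male))
  n_f <- count_matches(street, as_pattern(female))
  ifelse(n_m + n_f == 0, "not concerned",
         ifelse(n_m > n_f, "male",
                ifelse(n_f > n_m, "female", "undecidable")))
}

streets <- c("QUAI JEAN MOULIN", "ALLEE NELLIE BLY",
             "RUE JEAN ET MARIE", "RUE JEANNE", "RUE PAULINE")
classify(streets)
```

Thanks to the word boundaries, JEAN does not match inside JEANNE, nor PAUL inside PAULINE. Those two streets come out as "not concerned" here only because JEANNE and PAULINE are absent from the toy lists.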

Map

street_gender |> 
  mutate(gender = factor(gender, levels = c("female", "male", "undecidable", "not concerned"))) |> 
  st_set_crs("EPSG:4326") |> 
  ggplot() +
  geom_sf(aes(color = gender), 
          linewidth = .5,
          key_glyph = "timeseries") +
  scale_color_manual(values = c("female" = "lightpink1",
                                "male" = "lightskyblue",
                                "undecidable" = "lightyellow4",
                                "not concerned" = "seashell2")) +
  annotation_scale(bar_cols =  c("darkgrey", "white"),
                   line_col = "darkgrey",
                   text_col = "darkgrey",
                   height = unit(0.1, "cm")) +
  coord_sf(xlim = lyon_bbox[c(1, 3)],
           ylim = lyon_bbox[c(2, 4)]) +
  labs(title = "Gender in Lyon street names",
       color = "",
       caption = glue("Map data © OpenStreetMap contributors
                      using INSEE Fichier des prénoms 2023
                      r.iresmi.net - {Sys.Date()}")) +
  theme_void() +
  theme(plot.background = element_rect(color = NA, 
                                       fill = "white"),
        plot.caption = element_text(size = 5,
                                    color = "darkgrey"))

Lyon

Possible misclassifications

Many biases make this map unreliable; it would need manual editing…
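
One possible shape for that manual editing is a small override table applied after the automatic classification. This is only a sketch in base R: the street names are fictional, and the `overrides` table and the `apply_overrides()` helper are hypothetical, not part of the original workflow.

```r
# Automatic classification for a few fictional streets
street_gender <- data.frame(
  name   = c("RUE EXEMPLE UN", "RUE CAMILLE EXEMPLE", "PLACE MARIE EXEMPLE"),
  gender = c("not concerned", "not concerned", "female"),
  stringsAsFactors = FALSE
)

# Hand-made corrections, keyed by street name (hypothetical content)
overrides <- data.frame(
  name   = "RUE CAMILLE EXEMPLE",
  gender = "male",
  stringsAsFactors = FALSE
)

# Replace the automatic gender wherever a manual correction exists
apply_overrides <- function(classified, overrides) {
  idx   <- match(classified$name, overrides$name)
  fixed <- !is.na(idx)
  classified$gender[fixed] <- overrides$gender[idx[fixed]]
  classified
}

corrected <- apply_overrides(street_gender, overrides)
corrected
```

Keeping the corrections in a separate table means the automatic step can be rerun on fresh OpenStreetMap data without losing the manual work.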


epicene first names


not concerned


has a gender but shouldn’t


accidentally well classified
