
Popularity of Baby Names Since 1880

[This article was first published on Michael Toth's Blog, and kindly contributed to R-bloggers.]

A while back I spent some time figuring out how to serve interactive shiny apps through my website, but I hadn’t had a chance to build anything until recently. I set out to create a few simple shiny apps in R as a sort of test run, and I’m writing those up here.

In this post I’m going to analyze open data from the Social Security Administration on the popularity of baby names since 1880.

In the application below, you can view the 10 most popular baby names for any year since 1880, either for males or females. You can also click the play icon directly below the year slider to view an animated history of the most common names.

In the next application, you can enter any name, and the graph will display how the popularity of that name has changed over time. Be sure to also select whether the name is for males or females, or you’ll likely see some unexpected results!

After building the shiny applications above, I got interested in whether I could identify any meaningful trends over time in the data. I wanted to see whether the concentration (the proportion of babies born with a given name) of the most popular names was relatively static over time, or whether this fluctuated. I was also interested in finding trends in the number of babies born with each of the most popular names. To investigate these, I used a subset of the original data, grabbing the 10 most common male and female names for each year since 1880. I went through several iterations of how best to display the data, and ultimately arrived at the graph below, which I quite like.
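As a rough sketch of that subsetting step, the snippet below computes each name's proportion within its year and sex, then keeps the most common names per group. The toy data frame, its values, and the helper name top_n_each_year are hypothetical. The column names (year, sex, name, count, proportion) are assumptions chosen to match the plotting code later in this post, and this is not the actual pre-processing script. The proportion here is relative only to the names in the toy data, and the sketch keeps the top 2 names rather than 10 so the output stays small.

```r
library(dplyr)

# Toy stand-in for the SSA data (hypothetical values, not real counts)
toy_names <- data.frame(
  year  = c(1880, 1880, 1880, 1880, 1880, 1880),
  sex   = c("F", "F", "F", "M", "M", "M"),
  name  = c("Mary", "Anna", "Emma", "John", "William", "James"),
  count = c(70, 20, 10, 50, 30, 20)
)

top_n_each_year <- toy_names %>%
  group_by(year, sex) %>%
  # Share of births with this name within the same year and sex
  mutate(proportion = count / sum(count)) %>%
  # Keep the most common names per group (use n = 10 for a top-10 subset)
  slice_max(count, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top_n_each_year)
```

Grouping by both year and sex before dividing is what makes the proportion "by sex", which is how the x-axis of the final graph is labelled.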

I was excited that this project gave me an opportunity to make use of David Robinson’s gganimate package, which I must regrettably admit I hadn’t had a chance to experiment with previously. For those unfamiliar, this package makes it incredibly easy to create animated ggplot graphs, and it’s awesome!

I wanted to create some kind of trailing visualization to make it clear how patterns and trends were changing over time. The implementation here was adapted from Thomas Pedersen’s example which he used to produce this image.

In the graph below we can use the trailing effect to easily identify trends that occur over a series of years. The grey background data also helps us to visualize how any given year compares with the overall history. I see 5 key periods present themselves in the data:

The code for this image is available below:

library(dplyr)
library(ggplot2)
library(gganimate) #devtools::install_github("dgrtwo/gganimate")
library(readr)
library(scales)

# Load pre-processed data. For additional details, see the GitHub link below
top_10_each_year <- read_csv('input/top_10_each_year.csv')

# Create a fading trail: replicate the data frame 10 times, shifting each copy
# forward by i years and dimming it with fade = 1 / (i + 2), so that earlier
# years linger in later frames at decreasing opacity
anim <- lapply(1:10, function(i) {
    shifted <- top_10_each_year
    shifted$year <- shifted$year + i
    shifted$fade <- 1 / (i + 2)
    shifted
})
top_10_each_year$fade <- 1 # The current year is drawn at full opacity
top_10_with_fade <- rbind(top_10_each_year, do.call(rbind, anim))
top_10_with_fade <- filter(top_10_with_fade, year <= 2015) # Drop trail rows shifted past 2015

p <- ggplot(top_10_with_fade, aes(x = proportion, y = count)) +
    geom_point(color = '#e6e6e6', size = 4) +
    geom_point(aes(color = sex, frame = year, alpha = fade), size = 4) +
    ggtitle('Top 10 Male & Female Baby Names\nYear:') +
    xlab('\nProportion (by sex) Born with Name') +
    ylab('Number Born with Name') +
    scale_color_manual(name = '', values = c('#ff7f00', '#377eb8'), labels = c('Female', 'Male')) +
    scale_x_continuous(labels = percent) + 
    scale_y_continuous(labels = comma) +
    scale_alpha(guide = 'none') + # Remove alpha legend from plot output
    theme_bw() +
    theme(panel.border = element_blank(),
          panel.grid = element_blank(),
          axis.ticks = element_blank(),
          legend.key = element_blank(),
          legend.position = 'bottom',
          axis.text = element_text(size = 14),
          axis.title = element_text(size = 16),
          legend.text = element_text(size = 12))

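# gg_animate() and the frame aesthetic belong to the original gganimate API;
# later gganimate releases replaced them with animate() and transition_*() functions
gg_animate(p, filename = 'yearly-birth-names-with-trails.gif', interval = 0.2, ani.width = 800, ani.height = 600)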
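
To see what the fading step does, here is a small sketch on a toy data frame (hypothetical values). Each copy of the data is shifted forward by i years and given a fade of 1 / (i + 2), so a point from year y reappears in frames y + 1, y + 2, and so on, with decreasing opacity. The trail length of 3 and the names toy, trails, and frame_2015 are assumptions to keep the example short; the code above uses 10 copies.

```r
# Toy data: one name observed in three consecutive years (hypothetical values)
toy <- data.frame(year = 2013:2015, count = c(100, 120, 90))

trail_length <- 3  # the code above uses 10

# Shift each copy forward by i years and dim it with fade = 1 / (i + 2)
trails <- lapply(seq_len(trail_length), function(i) {
  shifted <- toy
  shifted$year <- shifted$year + i
  shifted$fade <- 1 / (i + 2)
  shifted
})

toy$fade <- 1  # the current year is drawn at full opacity
toy_with_fade <- rbind(toy, do.call(rbind, trails))

# Drop trail rows that were shifted past the last observed year
toy_with_fade <- toy_with_fade[toy_with_fade$year <= 2015, ]

# Rows that make up the 2015 frame: the 2015 point plus the fading 2014 and 2013 points
frame_2015 <- toy_with_fade[toy_with_fade$year == 2015, ]
print(frame_2015[order(-frame_2015$fade), ])
```

Because the fade falls off as 1 / (i + 2), the first trailing copy is already at one third opacity and each later copy dims more slowly, which gives a long, faint tail behind the current year.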

For the full code behind the shiny applications and the animation produced above, check out my GitHub.
