Putting It All Together
The kind folks over at @RStudio gave a nod to my recently CRAN-released epidata package in their January data package roundup, and I thought it might be useful to give it one more showcase using the recent CRAN update to ggalt and the new hrbrthemes (GitHub-only for now) packages.
Labor force participation rate
The U.S. labor force participation rate (LFPR) is an oft-overlooked and under- or mis-reported economic indicator. I’ll borrow the definition from Investopedia:
Population age distributions and other factors are necessary to honestly interpret this statistic. Parties in power usually dismiss/ignore this statistic outright and their opponents tend to wholly embrace it for criticism (it’s an easy target if you’re naive). “Yay” partisan democracy.
Since the LFPR has nuances when looked at categorically, let’s take a look at it by attained education level to see how that particular view has changed over time (at least since the gov-quants have been tracking it).
We can easily grab this data with epidata::get_labor_force_participation_rate() (and we’ll set up some library() calls while we’re at it):
library(epidata)
library(hrbrthemes) # devtools::install_github("hrbrmstr/hrbrthemes")
library(ggalt)
library(tidyverse)
library(stringi)

part_rate <- get_labor_force_participation_rate("e")

glimpse(part_rate)
## Observations: 457
## Variables: 7
## $ date              <date> 1978-12-01, 1979-01-01, 1979-02-01, 1979-03-01, 1979-04-01, 1979-05-01...
## $ all               <dbl> 0.634, 0.634, 0.635, 0.636, 0.636, 0.637, 0.637, 0.637, 0.638, 0.638, 0...
## $ less_than_hs      <dbl> 0.474, 0.475, 0.475, 0.475, 0.475, 0.474, 0.474, 0.473, 0.473, 0.473, 0...
## $ high_school       <dbl> 0.690, 0.691, 0.692, 0.692, 0.693, 0.693, 0.694, 0.694, 0.695, 0.696, 0...
## $ some_college      <dbl> 0.709, 0.710, 0.711, 0.712, 0.712, 0.713, 0.712, 0.712, 0.712, 0.712, 0...
## $ bachelor's_degree <dbl> 0.771, 0.772, 0.772, 0.773, 0.772, 0.772, 0.772, 0.772, 0.772, 0.773, 0...
## $ advanced_degree   <dbl> 0.847, 0.847, 0.848, 0.848, 0.848, 0.848, 0.847, 0.847, 0.848, 0.848, 0...
One of the easiest things to do is to use ggplot2 to make a faceted line chart by attained education level. But, let’s change the labels so they are a bit easier on the eyes in the facets and switch the facet order from alphabetical to something more useful:
gather(part_rate, category, rate, -date) %>%
  mutate(category=stri_replace_all_fixed(category, "_", " "),
         category=stri_trans_totitle(category),
         category=stri_replace_last_regex(category, "Hs$", "High School"),
         category=factor(category, levels=c("Advanced Degree", "Bachelor's Degree", "Some College",
                                            "High School", "Less Than High School", "All"))) -> part_rate
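To see what the relabeling steps do on their own, here is a minimal sketch that runs them on just the six column names from the glimpse() output. It is a base-R stand-in rather than the stringi calls used in the post: the gsub() pattern is my own approximation of stri_trans_totitle(), and the variable names (raw, lab, lvl) are hypothetical.

```r
# Column names as they appear in the glimpse() output
raw <- c("all", "less_than_hs", "high_school", "some_college",
         "bachelor's_degree", "advanced_degree")

# 1. Underscores become spaces
lab <- gsub("_", " ", raw, fixed = TRUE)

# 2. Upper-case the first letter of each space-separated word
#    (base-R approximation of stri_trans_totitle(); the letter after
#    an apostrophe is left alone because only ^ or whitespace triggers it)
lab <- gsub("(^|\\s)([a-z])", "\\1\\U\\2", lab, perl = TRUE)

# 3. Expand the trailing "Hs" abbreviation
lab <- sub("Hs$", "High School", lab)

# 4. Explicit levels set the facet order (otherwise it is alphabetical)
lvl <- c("Advanced Degree", "Bachelor's Degree", "Some College",
         "High School", "Less Than High School", "All")
category <- factor(lab, levels = lvl)

print(data.frame(raw = raw, label = lab))
print(levels(category))
```

Any label that does not exactly match one of the levels would silently become NA in the factor, so a quick anyNA(category) check is a cheap safeguard after this kind of recoding.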
Now, we’ll make a simple line chart, tweaking the aesthetics just a bit:
ggplot(part_rate) +
  geom_line(aes(date, rate, group=category)) +
  scale_y_percent(limits=c(0.3, 0.9)) +
  facet_wrap(~category, scales="free") +
  labs(x=paste(format(range(part_rate$date), "%Y-%b"), collapse=" to "),
       y="Participation rate (%)",
       title="U.S. Labor Force Participation Rate",
       caption="Source: EPI analysis of basic monthly Current Population Survey microdata.") +
  theme_ipsum_rc(grid="XY", axis="XY")
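The x-axis label in that chart is built from the data rather than typed in by hand. A small sketch of just that piece, using a stand-in monthly date vector (an assumption that mirrors the 457 monthly observations shown earlier, not the actual part_rate$date column):

```r
# Stand-in for part_rate$date: monthly dates, December 1978 to December 2016
dates <- seq(as.Date("1978-12-01"), as.Date("2016-12-01"), by = "month")
print(length(dates))

# range() returns the earliest and latest date; format() renders each as
# year plus abbreviated month name; paste() joins the two with " to "
x_lab <- paste(format(range(dates), "%Y-%b"), collapse = " to ")
print(x_lab)
```

The abbreviated month name from "%b" depends on the session locale, so the exact label text can differ between machines.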
The “All” view is interesting in that the LFPR has held fairly “steady” between 60% & 70%. Those individual and fractional percentage points actually translate to real humans, so the “minor” fluctuations do matter.
It’s also interesting to see the direct contrast between the starting historical rate and current rate (you could also do the same with min/max rates, etc.). We can use a “dumbbell” chart to compare the 1978 value to today’s value, but we’ll need to reshape the data a bit first:
group_by(part_rate, category) %>%
  arrange(date) %>%
  slice(c(1, n())) %>%
  spread(date, rate) %>%
  ungroup() %>%
  filter(category != "All") %>%
  mutate(category=factor(category, levels=rev(levels(category)))) -> rate_range

filter(part_rate, category=="Advanced Degree") %>%
  arrange(date) %>%
  slice(c(1, n())) %>%
  mutate(lab=lubridate::year(date)) -> lab_df
(We’ll be using the extra data frame to add labels to the chart.)
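The "first and last row per category" idea can be tried on a tiny data frame without any packages. The sketch below is a base-R variant, not the post's dplyr/tidyr pipeline: the rates are made up, first_last() is a hypothetical helper, and it stores the endpoints in fixed start/end columns instead of columns named after the dates.

```r
# Made-up rates for two categories (illustrative only, not EPI data)
toy <- data.frame(
  category = rep(c("Some College", "High School"), each = 3),
  date     = rep(as.Date(c("1978-12-01", "1997-12-01", "2016-12-01")), times = 2),
  rate     = c(0.70, 0.74, 0.66, 0.69, 0.68, 0.58),
  stringsAsFactors = FALSE
)

# Hypothetical helper: earliest and latest observation of one category
first_last <- function(d) {
  d <- d[order(d$date), ]   # same role as arrange(date)
  n <- nrow(d)              # same role as n() inside slice()
  data.frame(category   = d$category[1],
             start_date = d$date[1], start = d$rate[1],
             end_date   = d$date[n], end   = d$rate[n],
             stringsAsFactors = FALSE)
}

# Apply per category, then stack the one-row results
ends <- do.call(rbind, lapply(split(toy, toy$category), first_last))
rownames(ends) <- NULL
print(ends)
```

With fixed column names, a dumbbell layer could map x=start and xend=end, so the plotting code would not need to change when a newer month of data arrives.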
Now, we can compare the various ranges, once again tweaking aesthetics a bit:
# The backticked column names are the first and last dates in the data,
# created by spread(date, rate) when rate_range was built
ggplot(rate_range) +
  geom_dumbbell(aes(y=category, x=`1978-12-01`, xend=`2016-12-01`),
                size=3, color="#e3e2e1",
                colour_x = "#5b8124", colour_xend = "#bad744",
                dot_guide=TRUE, dot_guide_size=0.25) +
  geom_text(data=lab_df, aes(x=rate, y=5.25, label=lab), vjust=0) +
  scale_x_percent(limits=c(0.375, 0.9)) +
  labs(x=NULL, y=NULL,
       title=sprintf("U.S. Labor Force Participation Rate %s-Present", lab_df$lab[1]),
       caption="Source: EPI analysis of basic monthly Current Population Survey microdata.") +
  theme_ipsum_rc(grid="X")
Fin
One takeaway from both these charts is that it’s probably important to take education level into account when talking about the labor force participation rate. The get_labor_force_participation_rate() function, along with most other functions in the epidata package, also has options to factor the data by sex, race and age, so you can pull in all those views to get a more nuanced & informed understanding of this economic health indicator.