Calculate Wages and Benefits with blscrapeR
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The most difficult thing about working with BLS data is gaining a clear understanding on what data are available and what they represent. Some of the more popular data sets can be found on the BLS Databases, Tables & Calculations website. The selected examples below do not include all series or databases.
Install blscrapeR
The first step in analyzing any of these data in R is to install the blscrapeR package from CRAN.
install.packages('blscrapeR')
Current Population Survey (CPS)
The CPS includes median weekly earnings by occupation, among other things.
For example, we can use blscrapeR to pull data from the API for the median weekly earnings for Database Administrators and Software Developers.
library(blscrapeR)
library(tidyr)
# Median Usual Weekly Earnings by Occupation, Unadjusted Second Quartile.
# In current dollars
df <- bls_api(c("LEU0254530800", "LEU0254530600"), startyear = 2000, endyear = 2009) %>%
spread(seriesID, value) %>%
dateCast()
# Plot
library(ggplot2)
ggplot(data = df, aes(x = date)) +
geom_line(aes(y = LEU0254530800, color = "Database Admins.")) +
geom_line(aes(y = LEU0254530600, color = "Software Devs.")) +
labs(title = "Median Weekly Earnings by Occupation") + ylab("value") +
theme(legend.position="top", plot.title = element_text(hjust = 0.5))
Occupational Employment Statistics (OES)
The OES contains similar wage data found in the CPS, but often has more resolution in certain geographic areas. Unlike the CPS, the OES is an annual survey and does not keep time series data.
For example, we may want to compare the average hourly wage of Computer and Information Systems Managers in Orlando, FL to those in San Jose, CA. Notice, below the survey only returns values for 2015.
# Computer and Information Systems Managers in Orlando, FL and San Jose, CA.
# Orlando: "OEUM003674000000011302103"
# San Jose: "OEUM004194000000011302108"
library(blscrapeR)
df <- bls_api(c("OEUM003674000000011302103", "OEUM004194000000011302108"))
head(df)
## year period periodName value footnotes seriesID
## 1 2016 A01 Annual 67.84 OEUM003674000000011302103
## 2 2016 A01 Annual 87.53 OEUM004194000000011302108
Another OES example would be to grab the most recent Annual mean wage for All Occupations in All Industries in the United States.
library(blscrapeR)
df <- bls_api("OEUN000000000000000000004")
head(df)
## year period periodName value footnotes seriesID
## 1 2016 A01 Annual 49630 OEUN000000000000000000004
Employer Cost for Employee Compensation
This data set includes time series data on how much employers pay for employee benefits as a total cost and as a percent of employee wages and salaries.
For example, if we want to see the total cost of benefits per hour work and also see what percentage that is of the total compensation, we could run the following script.
library(blscrapeR)
library(dplyr)
library(tidyr)
df <- bls_api(c("CMU1030000000000D", "CMU1030000000000P"))
# Spread series ids and rename columns to human readable format.
df.sp <- spread(df, seriesID, value) %>%
rename("hourly_cost"=CMU1030000000000D, "pct_of_wages"=CMU1030000000000P)
# Percentages are represented as floating integers. Fix this to avoid confusion.
df.sp$pct_of_wages <- df.sp$pct_of_wages*0.01
head(df.sp)
## year period periodName footnotes hourly_cost pct_of_wages
## 1 2014 Q04 4th Quarter 10.49 0.316
## 2 2014 Q03 3rd Quarter 10.07 0.313
## 3 2014 Q02 2nd Quarter 10.00 0.313
## 4 2014 Q01 1st Quarter 9.97 0.312
## 5 2015 Q04 4th Quarter 10.52 0.313
## 6 2015 Q03 3rd Quarter 10.48 0.314
National Compensation Survey-Benefits
This survey includes data on how many Americans have access to certain benefits. For example, we can see the percentage of those who have access to paid vacation days and those who have access to Health insurance through their employers.
library(blscrapeR)
library(dplyr)
library(tidyr)
df <- bls_api(c("NBU10500000000000033030", "NBU11500000000000028178"))
# Spread series ids and rename columns to human readable format.
df.sp <- spread(df, seriesID, value) %>%
rename("pct_paid_vacation"=NBU10500000000000033030, "pct_health_ins"=NBU11500000000000028178)
# Value data are in whole numbers but represent percentages. Fix this to avoid confusion.
df.sp$pct_paid_vacation <- df.sp$pct_paid_vacation*0.01
df.sp$pct_health_ins <- df.sp$pct_health_ins*0.01
head(df.sp)
## year period periodName footnotes pct_health_ins pct_paid_vacation
## 1 2014 A01 Annual 0.72 0.74
## 2 2015 A01 Annual 0.72 0.74
## 3 2016 A01 Annual 0.70 0.73
If you want more mapping options, there is more information in the blscrapeR
package vignettes.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.