Examining Racial Income Disparities
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exploring Explanations Of the Black-White wealth gap
The black-white wealth gap is nearly 15:1; the mean of black household wealth is $138,200; for white houesholds it is $933,700.
A frequent explanation has been single-parent families in the Black community (1, 2 3) or culture (1, 2, 3, 4, 5)
Fortunately, we can test these claims. I am going to use the Add Health to investigate them.
The National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States during the 1994-95 school year.
get wave files
The first thing I’ll do is get the files from each wave.
dirs = dir_ls(here('content','data'), type = 'directory') wave1_dirs = dirs[1:4] wave2_dirs = dirs[5:7] wave3_dirs = dirs[c(8, 16:18, 20:21)] wave4_dirs = dirs[c(22, 21)] wave1_files = dir_ls(wave1_dirs, glob = '*.rda') wave2_files = dir_ls(wave2_dirs, glob = '*.rda') wave3_files = dir_ls(wave3_dirs, glob = '*.rda') wave4_files = dir_ls(wave4_dirs, glob = '*.rda')
function to read rda to list
rda2list <- function(file) { e <- new.env() load(file, envir = e) as.list(e) }
read in the data
wave1_dfs <- map(wave1_files, rda2list) wave1_dfs <- map(wave1_dfs, ~ as_tibble(.[[1]])) wave1_df = wave1_dfs %>% reduce(full_join) ## Joining, by = "AID" ## Joining, by = "AID" ## Joining, by = "AID" wave2_dfs <- map(wave2_files, rda2list) wave2_dfs <- map(wave2_dfs, ~ as_tibble(.[[1]])) wave2_df = wave2_dfs %>% reduce(full_join) ## Joining, by = "AID" ## Joining, by = "AID" wave3_dfs <- map(wave3_files, rda2list) wave3_dfs <- map(wave3_dfs, function(x){ d = as_tibble(x[[1]]) if ('RRELNO' %in% names(d)){ d$RRELNO = as.factor(d$RRELNO) } if ('value' %in% names(d)){ d$value = as.character(d$value) } d }) wave3_df = wave3_dfs %>% reduce(full_join) ## Joining, by = "AID" ## Warning: Column `AID` joining factors with different levels, coercing to ## character vector ## Joining, by = "AID" ## Warning: Column `AID` joining character vector and factor, coercing into ## character vector ## Joining, by = "AID" ## Warning: Column `AID` joining character vector and factor, coercing into ## character vector ## Joining, by = "AID" ## Warning: Column `AID` joining character vector and factor, coercing into ## character vector ## Joining, by = "AID" ## Warning: Column `AID` joining character vector and factor, coercing into ## character vector wave4_dfs <- map(wave4_files, rda2list) wave4_dfs <- map(wave4_dfs, ~ as_tibble(.[[1]])) wave4_df = wave4_dfs %>% reduce(full_join) ## Joining, by = "AID" ## Warning: Column `AID` joining factors with different levels, coercing to ## character vector
bind them all together
wave1_df$wave = 1 wave2_df$wave = 2 wave3_df$wave = 3 wave4_df$wave = 4 df = bind_rows(wave1_df, wave2_df, wave3_df, wave4_df) ## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character ## Warning in bind_rows_(x, .id): binding character and factor vector, coercing ## into character vector ## Warning in bind_rows_(x, .id): binding character and factor vector, coercing ## into character vector ## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character ## Warning in bind_rows_(x, .id): binding character and factor vector, coercing ## into character vector ## Warning in bind_rows_(x, .id): binding character and factor vector, coercing ## into character vector
Variables to Look at
Race/Ethnicity
Which ONE category best describes your racial background?
H1GI6A - H1GI6E
Income
Now think about your personal earnings. In {2006/2007/2008}, how much income did you receive from personal earnings before taxes, that is, wages or salaries, including tips, bonuses, and overtime pay, and income from self-employment?
H4EC2 (W4)
I’m going to look at variables that best approximate the reasons for racial disparities mentioned above. I’ll focus on Impulsivity, Conscientiousness, Intelligence, and Parental Marital Status. I chose Parental Marital Status because it’s so frequently offered as the reason for racial disparities. I choses impulsivity, conscientiousness, and intelligence because I think they approximate the cultural explanations – that if one works hard and is smart, then there won’t be any racial disparities.
Impulsivity:
When making decisions, you usually go with your “gut feeling” without thinking too much about the consequences of each alternative.
- H1PF16 (W1)
- H2PF15 (W2)
- H3SP18 (W3)
- H4PE34 (W4)
Conscientiousnes:
These variables are taken (from the following article)[https://www.frontiersin.org/articles/10.3389/fpsyg.2011.00158/full]
- H1PF18 (Pay attention to details)
- H1PF19 (Come up with good solutions)
- H1PF20 (Do things according to a plan)
- H1PF21 (Do more than what’s expected of me)
Intelligence
Peaboy Picture Vocab Standardized
AH_PVT (W1)
How many total words on the Word List did the respondent remember during the 90-second recall period?
C4WD90_1 (W4)
Parental Background
What is your current marital status?
PA10
Data Manipulation
Now, I’ll move onto cleaning up the dataset.
Construct Race
df = df %>% mutate( race = case_when( H1GI6B == '(1) (1) Marked' ~ 'black', H1GI6C == '(1) (1) Marked' ~ 'native american', H1GI6D == '(1) (1) Marked' ~ 'asian or pacific islander', H1GI6E == '(1) (1) Marked' ~ 'other', H1GI6A == '(1) (1) Marked' ~ 'white', ) ) # fill in race for missing waves df = df %>% group_by(AID) %>% mutate( race = if_else(any(! is.na(race)), paste0(unique(race), collapse = '|'), NA_character_) ) # for some reason the na detection wasnt working correctly above, so im just str_replacing the NAs df$race = str_replace(df$race, '\\|.*', '')
construct conscientiousness
df = df %>% mutate_at(vars(H1PF18:H1PF21), as.numeric) df$conscientiousness = 5 - rowMeans(df[,c('H1PF18', 'H1PF19', 'H1PF20', 'H1PF21')], na.rm = T)
Below, I make a function to plot the data.
For each plot, I am going to plot the distribution of White and Black people’s earnings, and I am going to break those distribution out by categories (e.g., parents married/not married, or the 0, 10%, 30%, 50%, 70%, 90%, and 100% quantiles of a continuous variable )
plot_it <- function(value){ probs <- c(0, .1, .3, .5, .7, .9, 1) df.plot$quartile <- with(df.plot, cut(value, breaks = unique(quantile(value, probs, na.rm=TRUE)), include.lowest=TRUE)) ### getting rid of outliers df.plot = df.plot %>% filter(H4EC2 < 100000) df.plot$H4EC2 = df.plot$H4EC2/10000 ggplot(df.plot, aes(x = H4EC2, y = race, fill = race)) + stat_density_ridges(quantile_lines = TRUE, quantiles = 2) + scale_x_continuous(breaks = seq(0, 10, by = .5)) + facet_wrap(~ quartile, scales = 'free', ncol = 1) }
get wave 1 variables joined in with wave 4 variables
df = df %>% mutate( married = if_else(PA10 == '(2) (2) Married (skip to A12)', 'married', 'not married') ) df.wave1 = df %>% filter(wave == 1 &race %in% c('black', 'white') & ! is.na(married)) %>% select(AID, race, married) df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) df.plot = left_join(df.wave1, df.outcome) ## Joining, by = "AID"
Data Analysis
look at earnings in Wave 4 (H4EC2)
For each graph, income is measured in 10,000’s. so 2.5 = $25,000
by marital status
df.plot = df.plot %>% filter(H4EC2 < 100000) df.plot$H4EC2 = df.plot$H4EC2/10000 ggplot(df.plot, aes(x = H4EC2, y = race, fill = race)) + stat_density_ridges(quantile_lines = TRUE, quantiles = 2) + scale_x_continuous(breaks = seq(0, 10, by = .5)) + facet_wrap(~ married, scales = 'free', ncol = 1) ## Picking joint bandwidth of 0.46 ## Picking joint bandwidth of 0.509
by conscientiousness
df.wave1 = df %>% filter(wave == 1 &race %in% c('black', 'white') & ! is.na(conscientiousness)) %>% select(AID, race, conscientiousness) df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) df.plot = left_join(df.wave1, df.outcome) ## Joining, by = "AID" plot_it(df.plot$conscientiousness) ## Picking joint bandwidth of 0.606 ## Picking joint bandwidth of 0.566 ## Picking joint bandwidth of 0.475 ## Picking joint bandwidth of 0.557 ## Picking joint bandwidth of 0.752
by peabody picture vocabulary
df.wave1 = df %>% filter(wave == 1 &race %in% c('black', 'white') & ! is.na(AH_PVT)) %>% select(AID, race, AH_PVT) df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) df.plot = left_join(df.wave1, df.outcome) ## Joining, by = "AID" plot_it(df.plot$AH_PVT) ## Picking joint bandwidth of 0.594 ## Picking joint bandwidth of 0.498 ## Picking joint bandwidth of 0.518 ## Picking joint bandwidth of 0.538 ## Picking joint bandwidth of 0.544 ## Picking joint bandwidth of 0.684
word recall
df.wave4 <- df %>% filter(wave == 4 &race %in% c('black', 'white') & ! is.na(C4WD90_1)) %>% select(AID, race, C4WD90_1) df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) df.plot = left_join(df.wave4, df.outcome) ## Joining, by = "AID" plot_it(df.plot$C4WD90_1) ## Picking joint bandwidth of 0.545 ## Picking joint bandwidth of 0.481 ## Picking joint bandwidth of 0.534 ## Picking joint bandwidth of 0.614 ## Picking joint bandwidth of 0.686 ## Picking joint bandwidth of 0.708
by impulsivity
df.wave1 = df %>% filter(wave == 1 &race %in% c('black', 'white') & ! is.na(H1PF16)) %>% select(AID, race, H1PF16) df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) df.plot = left_join(df.wave1, df.outcome) ## Joining, by = "AID" df.plot$H1PF16 = as.numeric(df.plot$H1PF16) plot_it(df.plot$H1PF16) ## Picking joint bandwidth of 0.475 ## Picking joint bandwidth of 0.531 ## Picking joint bandwidth of 0.509 ## Picking joint bandwidth of 0.745
In summary, it’s clear that none of the variables come close to explaining the racial disparities in income. While it is true that some variables (particularly intelligence), reduce disparities at the higher ends of the distributions, White participants in ADD Health at the middle and lower ends of the disribution outpaced their Black counterparts by approximately $5,000 - $10,000. Another important point to add is that I’m only looking at income; to eliminate racial wealth gaps, Black participants in the study would have to earn more than their White counterparts to have any hope of eliminating those disparities.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.