Examining Racial Income Disparities

[This article was first published on R on Sam Portnow's Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Exploring Explanations Of the Black-White wealth gap

The black-white wealth gap is nearly 15:1; the mean of black household wealth is $138,200; for white houesholds it is $933,700.

A frequent explanation has been single-parent families in the Black community (1, 2 3) or culture (1, 2, 3, 4, 5)

Fortunately, we can test these claims. I am going to use the Add Health to investigate them.

The National Longitudinal Study of Adolescent to Adult Health (Add Health) is a longitudinal study of a nationally representative sample of adolescents in grades 7-12 in the United States during the 1994-95 school year.

get wave files

The first thing I’ll do is get the files from each wave.

dirs = dir_ls(here('content','data'), type = 'directory')

wave1_dirs = dirs[1:4]
wave2_dirs = dirs[5:7]
wave3_dirs = dirs[c(8, 16:18, 20:21)]
wave4_dirs = dirs[c(22, 21)]

wave1_files = dir_ls(wave1_dirs, glob = '*.rda')
wave2_files = dir_ls(wave2_dirs, glob = '*.rda')
wave3_files = dir_ls(wave3_dirs, glob = '*.rda')
wave4_files = dir_ls(wave4_dirs, glob = '*.rda')

function to read rda to list

rda2list <- function(file) {
  e <- new.env()
  load(file, envir = e)
  as.list(e)
}

read in the data

wave1_dfs <- map(wave1_files, rda2list)
wave1_dfs <- map(wave1_dfs, ~ as_tibble(.[[1]]))
wave1_df = wave1_dfs %>% reduce(full_join)
## Joining, by = "AID"
## Joining, by = "AID"
## Joining, by = "AID"
wave2_dfs <- map(wave2_files, rda2list)
wave2_dfs <- map(wave2_dfs, ~ as_tibble(.[[1]]))
wave2_df = wave2_dfs %>% reduce(full_join)
## Joining, by = "AID"
## Joining, by = "AID"
wave3_dfs <- map(wave3_files, rda2list)
wave3_dfs <- map(wave3_dfs, function(x){
  d = as_tibble(x[[1]])
  if ('RRELNO' %in% names(d)){
    d$RRELNO = as.factor(d$RRELNO)
  }
  if ('value' %in% names(d)){
    d$value = as.character(d$value)
  }
  d
})

wave3_df = wave3_dfs %>% reduce(full_join)
## Joining, by = "AID"
## Warning: Column `AID` joining factors with different levels, coercing to
## character vector
## Joining, by = "AID"
## Warning: Column `AID` joining character vector and factor, coercing into
## character vector
## Joining, by = "AID"
## Warning: Column `AID` joining character vector and factor, coercing into
## character vector
## Joining, by = "AID"
## Warning: Column `AID` joining character vector and factor, coercing into
## character vector
## Joining, by = "AID"
## Warning: Column `AID` joining character vector and factor, coercing into
## character vector
wave4_dfs <- map(wave4_files, rda2list)
wave4_dfs <- map(wave4_dfs, ~ as_tibble(.[[1]]))
wave4_df = wave4_dfs %>% reduce(full_join)
## Joining, by = "AID"
## Warning: Column `AID` joining factors with different levels, coercing to
## character vector

bind them all together

wave1_df$wave = 1
wave2_df$wave = 2
wave3_df$wave = 3
wave4_df$wave = 4

df = bind_rows(wave1_df, wave2_df, wave3_df, wave4_df)
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector, coercing
## into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector, coercing
## into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector, coercing
## into character vector

## Warning in bind_rows_(x, .id): binding character and factor vector, coercing
## into character vector

Variables to Look at

Race/Ethnicity

Which ONE category best describes your racial background?

H1GI6A - H1GI6E

Income

Now think about your personal earnings. In {2006/2007/2008}, how much income did you receive from personal earnings before taxes, that is, wages or salaries, including tips, bonuses, and overtime pay, and income from self-employment?

H4EC2 (W4)

I’m going to look at variables that best approximate the reasons for racial disparities mentioned above. I’ll focus on Impulsivity, Conscientiousness, Intelligence, and Parental Marital Status. I chose Parental Marital Status because it’s so frequently offered as the reason for racial disparities. I choses impulsivity, conscientiousness, and intelligence because I think they approximate the cultural explanations – that if one works hard and is smart, then there won’t be any racial disparities.

Impulsivity:

When making decisions, you usually go with your “gut feeling” without thinking too much about the consequences of each alternative.

  • H1PF16 (W1)
  • H2PF15 (W2)
  • H3SP18 (W3)
  • H4PE34 (W4)

Conscientiousnes:

These variables are taken (from the following article)[https://www.frontiersin.org/articles/10.3389/fpsyg.2011.00158/full]

  • H1PF18 (Pay attention to details)
  • H1PF19 (Come up with good solutions)
  • H1PF20 (Do things according to a plan)
  • H1PF21 (Do more than what’s expected of me)

Intelligence

Peaboy Picture Vocab Standardized

AH_PVT (W1)

How many total words on the Word List did the respondent remember during the 90-second recall period?

C4WD90_1 (W4)

Parental Background

What is your current marital status?

PA10

Data Manipulation

Now, I’ll move onto cleaning up the dataset.

Construct Race

df = df %>%
  mutate(
    race = case_when(
      H1GI6B == '(1) (1) Marked' ~ 'black', 
      H1GI6C == '(1) (1) Marked' ~ 'native american', 
      H1GI6D == '(1) (1) Marked' ~ 'asian or pacific islander', 
      H1GI6E == '(1) (1) Marked' ~ 'other',
      H1GI6A == '(1) (1) Marked' ~ 'white', 
    )
  )

# fill in race for missing waves
df = df %>% group_by(AID) %>%
  mutate(
    race = if_else(any(! is.na(race)), paste0(unique(race), collapse = '|'), NA_character_)
  )

# for some reason the na detection wasnt working correctly above, so im just str_replacing the NAs
df$race = str_replace(df$race, '\\|.*', '')

construct conscientiousness

df = df %>%
 mutate_at(vars(H1PF18:H1PF21), as.numeric)

df$conscientiousness = 5 - rowMeans(df[,c('H1PF18', 'H1PF19', 'H1PF20', 'H1PF21')], na.rm = T)

Below, I make a function to plot the data.

For each plot, I am going to plot the distribution of White and Black people’s earnings, and I am going to break those distribution out by categories (e.g., parents married/not married, or the 0, 10%, 30%, 50%, 70%, 90%, and 100% quantiles of a continuous variable )

plot_it <- function(value){
    
    probs <- c(0, .1, .3, .5, .7, .9,  1)
    df.plot$quartile <- with(df.plot, cut(value, breaks = unique(quantile(value, probs, na.rm=TRUE)), 
                                include.lowest=TRUE))
    ### getting rid of outliers
    df.plot = df.plot %>% filter(H4EC2 < 100000)
    df.plot$H4EC2 = df.plot$H4EC2/10000
    ggplot(df.plot, aes(x = H4EC2, y = race, fill = race)) + 
      stat_density_ridges(quantile_lines = TRUE, quantiles = 2) +
      scale_x_continuous(breaks = seq(0, 10, by = .5)) + 
      facet_wrap(~ quartile, scales = 'free', ncol = 1)
  
}

get wave 1 variables joined in with wave 4 variables

df = df %>% 
  mutate(
    married = if_else(PA10 == '(2) (2) Married (skip to A12)', 'married', 'not married')
  )

df.wave1 = df %>% 
  filter(wave == 1 &race %in% c('black', 'white') & ! is.na(married)) %>% 
  select(AID, race, married)
df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) 
df.plot = left_join(df.wave1, df.outcome)
## Joining, by = "AID"

Data Analysis

look at earnings in Wave 4 (H4EC2)

For each graph, income is measured in 10,000’s. so 2.5 = $25,000

by marital status

df.plot = df.plot %>% filter(H4EC2 < 100000)
df.plot$H4EC2 = df.plot$H4EC2/10000

ggplot(df.plot, aes(x = H4EC2, y = race, fill = race)) + 
  stat_density_ridges(quantile_lines = TRUE, quantiles = 2) +
  scale_x_continuous(breaks = seq(0, 10, by = .5)) + 
  facet_wrap(~ married, scales = 'free', ncol = 1)
## Picking joint bandwidth of 0.46
## Picking joint bandwidth of 0.509

by conscientiousness

df.wave1 = df %>% 
  filter(wave == 1 &race %in% c('black', 'white') & ! is.na(conscientiousness)) %>% 
  select(AID, race, conscientiousness)
df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) 
df.plot = left_join(df.wave1, df.outcome)
## Joining, by = "AID"
plot_it(df.plot$conscientiousness)
## Picking joint bandwidth of 0.606
## Picking joint bandwidth of 0.566
## Picking joint bandwidth of 0.475
## Picking joint bandwidth of 0.557
## Picking joint bandwidth of 0.752

by peabody picture vocabulary

df.wave1 = df %>% 
  filter(wave == 1 &race %in% c('black', 'white') & ! is.na(AH_PVT)) %>% 
  select(AID, race, AH_PVT)
df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) 
df.plot = left_join(df.wave1, df.outcome)
## Joining, by = "AID"
plot_it(df.plot$AH_PVT)
## Picking joint bandwidth of 0.594
## Picking joint bandwidth of 0.498
## Picking joint bandwidth of 0.518
## Picking joint bandwidth of 0.538
## Picking joint bandwidth of 0.544
## Picking joint bandwidth of 0.684

word recall

df.wave4 <- df %>% 
  filter(wave == 4 &race %in% c('black', 'white') & ! is.na(C4WD90_1)) %>% 
  select(AID, race, C4WD90_1)
df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) 
df.plot = left_join(df.wave4, df.outcome)
## Joining, by = "AID"
plot_it(df.plot$C4WD90_1)
## Picking joint bandwidth of 0.545
## Picking joint bandwidth of 0.481
## Picking joint bandwidth of 0.534
## Picking joint bandwidth of 0.614
## Picking joint bandwidth of 0.686
## Picking joint bandwidth of 0.708

by impulsivity

df.wave1 = df %>% 
  filter(wave == 1 &race %in% c('black', 'white') & ! is.na(H1PF16)) %>% 
  select(AID, race, H1PF16)
df.outcome = df %>% filter(wave == 4) %>% select(AID, H4EC2) 
df.plot = left_join(df.wave1, df.outcome)
## Joining, by = "AID"
df.plot$H1PF16 = as.numeric(df.plot$H1PF16)
plot_it(df.plot$H1PF16)
## Picking joint bandwidth of 0.475
## Picking joint bandwidth of 0.531
## Picking joint bandwidth of 0.509
## Picking joint bandwidth of 0.745

In summary, it’s clear that none of the variables come close to explaining the racial disparities in income. While it is true that some variables (particularly intelligence), reduce disparities at the higher ends of the distributions, White participants in ADD Health at the middle and lower ends of the disribution outpaced their Black counterparts by approximately $5,000 - $10,000. Another important point to add is that I’m only looking at income; to eliminate racial wealth gaps, Black participants in the study would have to earn more than their White counterparts to have any hope of eliminating those disparities.

To leave a comment for the author, please follow the link and comment on their blog: R on Sam Portnow's Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)