Working with Ordinal Ranks in {marginaleffects}
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Given an ordinal regression model, it is relatively easy to get class-wise predictions – the conditional predicted probability of each level of the outcome. However, often, one might be interested in summarizing effects not on the class-wise probability scale (nor on the latent scale), but instead on the rank scale – the expected ordinal level, expressed as a single number.
The {emmeans} package has the mode = "mean.class" that does just this.
Let’s see how we can do this in {marginaleffects} by utilizing the powerful hypothesis= argument.
Fit a model
library(tidyverse)
library(marginaleffects)
bfi <- psych::bfi |>
mutate(
gender = factor(gender),
A1 = ordered(A1)
) |>
drop_na(A1, age, gender)
nrow(bfi)
#> [1] 2784
fit <- MASS::polr(A1 ~ age + gender,
data = bfi,
Hess = TRUE)
Class-Wise Predictions
pr1 <- predictions(fit, variables = "gender") pr1 #> #> Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % #> 1 0.1800 0.01136 15.85 <0.001 185.5 0.15777 0.2023 #> 1 0.1886 0.01128 16.71 <0.001 205.9 0.16649 0.2107 #> 1 0.1843 0.01132 16.28 <0.001 195.6 0.16210 0.2065 #> 1 0.1843 0.01132 16.28 <0.001 195.6 0.16210 0.2065 #> 1 0.1843 0.01132 16.28 <0.001 195.6 0.16210 0.2065 #> --- 33398 rows omitted. See ?print.marginaleffects --- #> 6 0.0289 0.00335 8.64 <0.001 57.3 0.02234 0.0355 #> 6 0.0231 0.00264 8.78 <0.001 59.1 0.01798 0.0283 #> 6 0.0219 0.00250 8.76 <0.001 58.8 0.01699 0.0268 #> 6 0.0207 0.00238 8.71 <0.001 58.1 0.01604 0.0254 #> 6 0.0121 0.00165 7.35 <0.001 42.2 0.00891 0.0154 #> Type: probs nrow(pr1) # original data * 2 (counterfactual gender) * 6 (levels of A1) #> [1] 33408
For each observation in the original data frame we now have 12 predictions:
- 2 counterfactual predictions for the two levels of gender, times
- 6 (arguably also counterfactual) predictions for each of the possible outcome levels.
Sum-scores (mean rank)
We can average the class ranks by their predicted probability to obtain “sum scores”:
And we can do this for each row in the counterfactual data frame (marked by the rowidcf variable):
sum_score <- function(x) {
x |>
arrange(rowidcf, group, gender) |>
# for each counterfactual row and level of gender
summarise(
term = "sum score",
estimate = sum(estimate * (1:6)),
.by = c(rowidcf, gender)
)
}
pr2 <- predictions(fit, variables = "gender",
hypothesis = sum_score)
pr2
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 3.02 0.0583 51.8 <0.001 Inf 2.90 3.13
#> 2.51 0.0451 55.6 <0.001 Inf 2.42 2.60
#> 2.97 0.0553 53.7 <0.001 Inf 2.86 3.08
#> 2.46 0.0415 59.4 <0.001 Inf 2.38 2.55
#> 3.00 0.0568 52.8 <0.001 Inf 2.88 3.11
#> --- 5558 rows omitted. See ?print.marginaleffects ---
#> 2.24 0.0305 73.4 <0.001 Inf 2.18 2.30
#> 2.67 0.0473 56.4 <0.001 Inf 2.57 2.76
#> 2.20 0.0304 72.2 <0.001 Inf 2.14 2.26
#> 2.26 0.0645 35.0 <0.001 890.8 2.13 2.38
#> 1.85 0.0457 40.6 <0.001 Inf 1.77 1.94
#> Term: sum score
#> Type: probs
nrow(pr2) # original data * 2 (counterfactual gender)
#> [1] 5568
As expected, we now have 2 predictions for each observation in the original data frame: - 2 counterfactual predictions for the two levels of gender
Note that we could have also computed the mean rank for the two levels of gender, or literally anything else. But here I am trying to mimic the basic behavior of the avg_/comparisons() functions by keeping with the workflow:
- Compute observation wise counterfactual predictions
- Contrast them for each observation
- (Possibly) average them
So let’s get those contrasts!
Contrast the ranks
sum_score.contr <- function(x) {
# same as before, and...
sum_score(x) |>
# compute a single contrast for each counterfactual row
summarise(
term = "gender (2-1)",
estimate = estimate[gender == "2"] - estimate[gender == "1"],
.by = rowidcf
)
}
This function will compute a difference between the rank for each gender for each observation.
pr3 <- predictions(fit, variables = "gender",
hypothesis = sum_score.contr)
pr3
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> -0.511 0.0588 -8.68 <0.001 57.8 -0.626 -0.395
#> -0.506 0.0584 -8.67 <0.001 57.7 -0.621 -0.392
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> --- 2774 rows omitted. See ?print.marginaleffects ---
#> -0.504 0.0582 -8.67 <0.001 57.6 -0.618 -0.390
#> -0.483 0.0560 -8.62 <0.001 57.1 -0.593 -0.373
#> -0.477 0.0554 -8.61 <0.001 56.9 -0.586 -0.368
#> -0.471 0.0548 -8.59 <0.001 56.7 -0.578 -0.363
#> -0.403 0.0488 -8.26 <0.001 52.6 -0.499 -0.307
#> Term: gender (2-1)
#> Type: probs
nrow(pr3) # original data
#> [1] 2784
Average Contrast
Finally, we can get the average difference:
# average contrast
avg_sum_score.contr <- function(x) {
sum_score.contr(x) |>
summarise(
term = "avg. gender (2-1)",
estimate = mean(estimate)
)
}
predictions(fit, variables = "gender",
hypothesis = avg_sum_score.contr)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> -0.474 0.0551 -8.61 <0.001 56.9 -0.582 -0.366
#>
#> Term: avg. gender (2-1)
#> Type: probs
POMP
POMP (percent of maximum possible) are a for of standardized units for Likert-type items (see Solomon Kurz’s excellent blog post for more – better – info).
These are linear transformations of the ranks described above:
We can obtain POMPs for each row in the counterfactual data frame by further processing the sum-scores:
POMP <- function(x) {
sum_score(x) |>
mutate(
estimate = 100 * (estimate - 1) / (6 - 1)
)
}
predictions(fit, variables = "gender",
hypothesis = POMP)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 40.4 1.166 34.6 <0.001 870.6 38.1 42.7
#> 30.2 0.902 33.5 <0.001 812.8 28.4 31.9
#> 39.4 1.107 35.6 <0.001 920.9 37.3 41.6
#> 29.3 0.829 35.3 <0.001 905.5 27.7 30.9
#> 39.9 1.135 35.1 <0.001 896.3 37.7 42.1
#> --- 5558 rows omitted. See ?print.marginaleffects ---
#> 24.7 0.609 40.6 <0.001 Inf 23.5 25.9
#> 33.3 0.945 35.3 <0.001 902.9 31.5 35.2
#> 23.9 0.608 39.3 <0.001 Inf 22.7 25.1
#> 25.2 1.289 19.5 <0.001 279.4 22.6 27.7
#> 17.1 0.914 18.7 <0.001 256.9 15.3 18.9
#> Term: sum score
#> Type: probs
And likewise get contrasts -
POMP.contr <- function(x) {
POMP(x) |>
summarise(
term = "gender (2-1)",
estimate = estimate[gender == "2"] - estimate[gender == "1"],
.by = rowidcf
)
}
predictions(fit, variables = "gender",
hypothesis = sum_score.contr)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> -0.511 0.0588 -8.68 <0.001 57.8 -0.626 -0.395
#> -0.506 0.0584 -8.67 <0.001 57.7 -0.621 -0.392
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> --- 2774 rows omitted. See ?print.marginaleffects ---
#> -0.504 0.0582 -8.67 <0.001 57.6 -0.618 -0.390
#> -0.483 0.0560 -8.62 <0.001 57.1 -0.593 -0.373
#> -0.477 0.0554 -8.61 <0.001 56.9 -0.586 -0.368
#> -0.471 0.0548 -8.59 <0.001 56.7 -0.578 -0.363
#> -0.403 0.0488 -8.26 <0.001 52.6 -0.499 -0.307
#> Term: gender (2-1)
#> Type: probs
And average contrasts -
# average contrast
avg_POMP.contr <- function(x) {
POMP.contr(x) |>
summarise(
term = "avg. gender (2-1)",
estimate = mean(estimate)
)
}
predictions(fit, variables = "gender",
hypothesis = avg_POMP.contr)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> -9.49 1.1 -8.61 <0.001 56.9 -11.6 -7.33
#>
#> Term: avg. gender (2-1)
#> Type: probs
A note for Bayesians
Hey, Bayesians, how are you doing?
I have some bad news: {marginaleffects} does not support hypothesis=<function>.
But I also have some good news! You can basically do all of this by getting the posterior draws in an rvar format, and then directly manipulating the posterior(s) - so the examples above should basically work out of the box.
For example:
library(brms)
library(posterior)
fit_b <- brm(A1 ~ age + gender,
data = bfi,
family = cumulative(),
prior = NULL) # obviously this is a bad prior
pr1_b <- predictions(fit_b, variables = "gender")
We just need to adapt the function(s) written above to work properly with rvars. I’ve marked
avg_POMP.contr_b <- function(x) {
x |>
arrange(rowidcf, group, gender) |>
# for each counterfactual row and level of gender
summarise(
term = "sum score",
# estimate = sum(estimate * (1:6)),
1 estimate = rvar_sum(rvar * (1:6)),
.by = c(rowidcf, gender)
) |>
# POMP
mutate(
estimate = 100 * (estimate - 1) / (6 - 1)
) |>
# contrasts
summarise(
term = "gender (2-1)",
estimate = estimate[gender == "2"] - estimate[gender == "1"],
.by = rowidcf
) |>
# average
summarise(
term = "avg. gender (2-1)",
# estimate = mean(estimate)
2 estimate = rvar_mean(estimate)
)
}
get_draws(pr1_b, shape = "rvar") |>
avg_POMP.contr_b()
#> term estimate
#> 1 avg. gender (2-1) -9.4 ± 1.1
- 1
-
Use the
rvarcolumn and thervar_sum()function (instead ofsum()) - 2
-
Use the
rvar_mean()function (instead ofmean())
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.