Working with Ordinal Ranks in {marginaleffects}

Posted on April 30, 2025 by Stat's What It's All About in R bloggers | 0 Comments

[This article was first published on Stat's What It's All About, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Given an ordinal regression model, it is relatively easy to get class-wise predictions – the conditional predicted probability of each level of the outcome. However, often, one might be interested in summarizing effects not on the class-wise probability scale (nor on the latent scale), but instead on the rank scale – the expected ordinal level, expressed as a single number.

The {emmeans} package has the mode = "mean.class" that does just this.

Let’s see how we can do this in {marginaleffects} by utilizing the powerful hypothesis= argument.

Fit a model

library(tidyverse)

library(marginaleffects)

bfi <- psych::bfi |> 
  mutate(
    gender = factor(gender),
    A1 = ordered(A1)
  ) |> 
  drop_na(A1, age, gender)

nrow(bfi)
#> [1] 2784

fit <- MASS::polr(A1 ~ age + gender, 
                  data = bfi,
                  Hess = TRUE)

Class-Wise Predictions

pr1 <- predictions(fit, variables = "gender")
pr1
#> 
#>  Group Estimate Std. Error     z Pr(>|z|)     S   2.5 % 97.5 %
#>      1   0.1800    0.01136 15.85   <0.001 185.5 0.15777 0.2023
#>      1   0.1886    0.01128 16.71   <0.001 205.9 0.16649 0.2107
#>      1   0.1843    0.01132 16.28   <0.001 195.6 0.16210 0.2065
#>      1   0.1843    0.01132 16.28   <0.001 195.6 0.16210 0.2065
#>      1   0.1843    0.01132 16.28   <0.001 195.6 0.16210 0.2065
#> --- 33398 rows omitted. See ?print.marginaleffects --- 
#>      6   0.0289    0.00335  8.64   <0.001 57.3 0.02234 0.0355
#>      6   0.0231    0.00264  8.78   <0.001 59.1 0.01798 0.0283
#>      6   0.0219    0.00250  8.76   <0.001 58.8 0.01699 0.0268
#>      6   0.0207    0.00238  8.71   <0.001 58.1 0.01604 0.0254
#>      6   0.0121    0.00165  7.35   <0.001 42.2 0.00891 0.0154
#> Type:  probs

nrow(pr1) # original data * 2 (counterfactual gender) * 6 (levels of A1)
#> [1] 33408

For each observation in the original data frame we now have 12 predictions:
- 2 counterfactual predictions for the two levels of gender, times
- 6 (arguably also counterfactual) predictions for each of the possible outcome levels.

Sum-scores (mean rank)

We can average the class ranks by their predicted probability to obtain “sum scores”:

And we can do this for each row in the counterfactual data frame (marked by the rowidcf variable):

sum_score <- function(x) {
  x |> 
    arrange(rowidcf, group, gender) |> 
    # for each counterfactual row and level of gender
    summarise(
      term = "sum score",
      estimate = sum(estimate * (1:6)), 
      
      .by = c(rowidcf, gender)
    )
}

pr2 <- predictions(fit, variables = "gender", 
                   hypothesis = sum_score)
pr2
#> 
#>  Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
#>      3.02     0.0583 51.8   <0.001 Inf  2.90   3.13
#>      2.51     0.0451 55.6   <0.001 Inf  2.42   2.60
#>      2.97     0.0553 53.7   <0.001 Inf  2.86   3.08
#>      2.46     0.0415 59.4   <0.001 Inf  2.38   2.55
#>      3.00     0.0568 52.8   <0.001 Inf  2.88   3.11
#> --- 5558 rows omitted. See ?print.marginaleffects --- 
#>      2.24     0.0305 73.4   <0.001   Inf  2.18   2.30
#>      2.67     0.0473 56.4   <0.001   Inf  2.57   2.76
#>      2.20     0.0304 72.2   <0.001   Inf  2.14   2.26
#>      2.26     0.0645 35.0   <0.001 890.8  2.13   2.38
#>      1.85     0.0457 40.6   <0.001   Inf  1.77   1.94
#> Term: sum score
#> Type:  probs

nrow(pr2) # original data * 2 (counterfactual gender)
#> [1] 5568

As expected, we now have 2 predictions for each observation in the original data frame: - 2 counterfactual predictions for the two levels of gender

Note that we could have also computed the mean rank for the two levels of gender, or literally anything else. But here I am trying to mimic the basic behavior of the avg_/comparisons() functions by keeping with the workflow:

Compute observation wise counterfactual predictions
Contrast them for each observation
(Possibly) average them

So let’s get those contrasts!

Contrast the ranks

sum_score.contr <- function(x) {
  # same as before, and...
  sum_score(x) |> 
    # compute a single contrast for each counterfactual row
    summarise(
      term = "gender (2-1)",
      estimate = estimate[gender == "2"] - estimate[gender == "1"], 
      
      .by = rowidcf
    )
}

This function will compute a difference between the rank for each gender for each observation.

pr3 <- predictions(fit, variables = "gender", 
                   hypothesis = sum_score.contr)
pr3
#> 
#>  Estimate Std. Error     z Pr(>|z|)    S  2.5 % 97.5 %
#>    -0.511     0.0588 -8.68   <0.001 57.8 -0.626 -0.395
#>    -0.506     0.0584 -8.67   <0.001 57.7 -0.621 -0.392
#>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
#>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
#>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
#> --- 2774 rows omitted. See ?print.marginaleffects --- 
#>    -0.504     0.0582 -8.67   <0.001 57.6 -0.618 -0.390
#>    -0.483     0.0560 -8.62   <0.001 57.1 -0.593 -0.373
#>    -0.477     0.0554 -8.61   <0.001 56.9 -0.586 -0.368
#>    -0.471     0.0548 -8.59   <0.001 56.7 -0.578 -0.363
#>    -0.403     0.0488 -8.26   <0.001 52.6 -0.499 -0.307
#> Term: gender (2-1)
#> Type:  probs

nrow(pr3) # original data
#> [1] 2784

Average Contrast

Finally, we can get the average difference:

# average contrast 
avg_sum_score.contr <- function(x) {
  sum_score.contr(x) |> 
    summarise(
      term = "avg. gender (2-1)",
      estimate = mean(estimate)
    )
}

predictions(fit, variables = "gender", 
            hypothesis = avg_sum_score.contr)
#> 
#>  Estimate Std. Error     z Pr(>|z|)    S  2.5 % 97.5 %
#>    -0.474     0.0551 -8.61   <0.001 56.9 -0.582 -0.366
#> 
#> Term: avg. gender (2-1)
#> Type:  probs

POMP

POMP (percent of maximum possible) are a for of standardized units for Likert-type items (see Solomon Kurz’s excellent blog post for more – better – info).

These are linear transformations of the ranks described above:

We can obtain POMPs for each row in the counterfactual data frame by further processing the sum-scores:

POMP <- function(x) {
  sum_score(x) |> 
    mutate(
      estimate = 100 * (estimate - 1) / (6 - 1)
    )
}

predictions(fit, variables = "gender", 
            hypothesis = POMP)
#> 
#>  Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
#>      40.4      1.166 34.6   <0.001 870.6  38.1   42.7
#>      30.2      0.902 33.5   <0.001 812.8  28.4   31.9
#>      39.4      1.107 35.6   <0.001 920.9  37.3   41.6
#>      29.3      0.829 35.3   <0.001 905.5  27.7   30.9
#>      39.9      1.135 35.1   <0.001 896.3  37.7   42.1
#> --- 5558 rows omitted. See ?print.marginaleffects --- 
#>      24.7      0.609 40.6   <0.001   Inf  23.5   25.9
#>      33.3      0.945 35.3   <0.001 902.9  31.5   35.2
#>      23.9      0.608 39.3   <0.001   Inf  22.7   25.1
#>      25.2      1.289 19.5   <0.001 279.4  22.6   27.7
#>      17.1      0.914 18.7   <0.001 256.9  15.3   18.9
#> Term: sum score
#> Type:  probs

And likewise get contrasts -

POMP.contr <- function(x) {
  POMP(x) |> 
    summarise(
      term = "gender (2-1)",
      estimate = estimate[gender == "2"] - estimate[gender == "1"], 
      
      .by = rowidcf
    )
}

predictions(fit, variables = "gender", 
            hypothesis = sum_score.contr)
#> 
#>  Estimate Std. Error     z Pr(>|z|)    S  2.5 % 97.5 %
#>    -0.511     0.0588 -8.68   <0.001 57.8 -0.626 -0.395
#>    -0.506     0.0584 -8.67   <0.001 57.7 -0.621 -0.392
#>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
#>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
#>    -0.509     0.0586 -8.68   <0.001 57.8 -0.623 -0.394
#> --- 2774 rows omitted. See ?print.marginaleffects --- 
#>    -0.504     0.0582 -8.67   <0.001 57.6 -0.618 -0.390
#>    -0.483     0.0560 -8.62   <0.001 57.1 -0.593 -0.373
#>    -0.477     0.0554 -8.61   <0.001 56.9 -0.586 -0.368
#>    -0.471     0.0548 -8.59   <0.001 56.7 -0.578 -0.363
#>    -0.403     0.0488 -8.26   <0.001 52.6 -0.499 -0.307
#> Term: gender (2-1)
#> Type:  probs

And average contrasts -

# average contrast 
avg_POMP.contr <- function(x) {
  POMP.contr(x) |> 
    summarise(
      term = "avg. gender (2-1)",
      estimate = mean(estimate)
    )
}

predictions(fit, variables = "gender", 
            hypothesis = avg_POMP.contr)
#> 
#>  Estimate Std. Error     z Pr(>|z|)    S 2.5 % 97.5 %
#>     -9.49        1.1 -8.61   <0.001 56.9 -11.6  -7.33
#> 
#> Term: avg. gender (2-1)
#> Type:  probs

A note for Bayesians

Hey, Bayesians, how are you doing?

I have some bad news: {marginaleffects} does not support hypothesis=<function>.

But I also have some good news! You can basically do all of this by getting the posterior draws in an rvar format, and then directly manipulating the posterior(s) - so the examples above should basically work out of the box.

For example:

library(brms)
library(posterior)

fit_b <- brm(A1 ~ age + gender, 
             data = bfi,
             family = cumulative(),
             prior = NULL) # obviously this is a bad prior

pr1_b <- predictions(fit_b, variables = "gender")

We just need to adapt the function(s) written above to work properly with rvars. I’ve marked

avg_POMP.contr_b <- function(x) {
  x |> 
    arrange(rowidcf, group, gender) |> 
    # for each counterfactual row and level of gender
    summarise(
      term = "sum score",
      # estimate = sum(estimate * (1:6)), 
1      estimate = rvar_sum(rvar * (1:6)),
      
      .by = c(rowidcf, gender)
    ) |> 
    # POMP
    mutate(
      estimate = 100 * (estimate - 1) / (6 - 1)
    ) |> 
    # contrasts
    summarise(
      term = "gender (2-1)",
      estimate = estimate[gender == "2"] - estimate[gender == "1"], 
      
      .by = rowidcf
    ) |> 
    # average
    summarise(
      term = "avg. gender (2-1)",
      # estimate = mean(estimate)
2      estimate = rvar_mean(estimate)
    )
}

get_draws(pr1_b, shape = "rvar") |>
  avg_POMP.contr_b()
#>                term   estimate
#> 1 avg. gender (2-1) -9.4 ± 1.1

1: Use the rvar column and the rvar_sum() function (instead of sum())
2: Use the rvar_mean() function (instead of mean())

To leave a comment for the author, please follow the link and comment on their blog: Stat's What It's All About.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Working with Ordinal Ranks in {marginaleffects}

Fit a model

Class-Wise Predictions

Sum-scores (mean rank)

Contrast the ranks

Average Contrast

POMP

A note for Bayesians

Related

Fit a model

Class-Wise Predictions

Sum-scores (mean rank)

Contrast the ranks

Average Contrast

POMP

A note for Bayesians

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)