Site icon R-bloggers

Exploring Frequentist and Bayesian Tolerance Intervals in R

[This article was first published on [R]eliability, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
  • Tolerance intervals are used to specify coverage of a population of data. In the frequentist framework, the width of the interval is dependent on the desired coverage proportion and the specified confidence level. They are widely used in the medical device industry because they can be compared directly vs. product specifications, allowing the engineer to make a judgment about what percentage of the parts would meet the spec taking into account sampling uncertainty.

    In this post I wanted to dive a bit deeper into the frequentist version of tolerance intervals to verify that they provide the correct coverage in a straight-forward use case. Then I will explore a Bayesian version which uses the posterior draws from a fitted model to calculate the Bayesian analogue of a tolerance interval.

    Libraries

    library(tidyverse)
    library(here)
    library(brms)
    library(broom)
    library(tidybayes)
    library(tolerance)
    library(wesanderson)
    library(patchwork)
    library(gt)

    As with any frequentist simulation looking at coverage of a p-value, we can start with the population parameters known. From there we simulate many draws from that population and calculate the desired statistic (in this case the tolerance interval limits). By comparing each calculated tolerance interval to the true population coverage, we can determine the fraction of simulations to do not meet the specified coverage. If that number, as a fraction of the total number of sims, is less than or equal to the p-value, then the tolerance interval procedure is working as intended and provided the specified coverage, on average, over many simulated experiments.

    This block sets up the parameters and identifies the specified quantiles of the true population.

    Simulated Experiments

    True Population Parameters

    set.seed(9989)
    
    n_sims <- 10000
    true_population_mean <- 40
    true_population_sd <- 4
    alpha = 0.1
    p = .95
    
    upper_97.5 <- qnorm(p = .975, mean = true_population_mean, sd = true_population_sd)
    lower_02.5 <- qnorm(p = .025, mean = true_population_mean, sd = true_population_sd)

    Now run the simulation, replicating many “experiments” of n=15, calculating a 90% confidence, 2-sided tolerance interval for 95% of the population each time.

    Run the Simulation

    sim_summary_tbl <- tibble(sim_id = seq(from = 1, to = n_sims, by = 1)) %>%
      rowwise() %>%
      mutate(
        test_sample_size = 15,
        sim_test_data = list(rnorm(n = test_sample_size, mean = true_population_mean, sd = true_population_sd))
      ) %>%
      summarize(
        sim_id = sim_id,
    #   sim_test_data = sim_test_data,
        norm_tol = list(normtol.int(sim_test_data, alpha = alpha, P = p, side = 2)),
        sample_sd = sd(sim_test_data)
      ) %>%
      unnest(norm_tol) %>%
      rename(
        lower_bound_ltl = "2-sided.lower",
        upper_bound_utl = "2-sided.upper",
        sample_mean = x.bar,
        p = P
      ) %>%
      ungroup() %>%
      select(sim_id, alpha, p, sample_mean, sample_sd, everything()) %>%
      mutate(
        true_lower_2.5 = lower_02.5,
        true_upper_97.5 = upper_97.5
      )
    
    sim_summary_tbl %>%
      gt_preview()

    sim_id alpha p sample_mean sample_sd lower_bound_ltl upper_bound_utl true_lower_2.5 true_upper_97.5
    1 1 0.1 0.95 36.19320 3.243531 27.35493 45.03147 32.16014 47.83986
    2 2 0.1 0.95 40.56155 3.766514 30.29821 50.82489 32.16014 47.83986
    3 3 0.1 0.95 40.23154 4.268292 28.60091 51.86217 32.16014 47.83986
    4 4 0.1 0.95 40.24057 4.175918 28.86165 51.61949 32.16014 47.83986
    5 5 0.1 0.95 40.41020 3.880094 29.83737 50.98303 32.16014 47.83986
    6..9999
    10000 10000 0.1 0.95 38.69646 4.780010 25.67145 51.72147 32.16014 47.83986
    ## Visualize

    Plot the resulting upper and lower bounds of the 95/90 tolerance interval. The red lines are the true population quantiles.

    sim_summary_tbl %>% 
      pivot_longer(cols = everything(), names_to = "param", values_to = "value") %>%
      filter(param == "lower_bound_ltl" | param == "upper_bound_utl") %>%
      mutate(param = as_factor(param)) %>%
      ggplot(aes(x = value, y = param)) +
      geom_jitter(aes(color = param), width = .1, height = .1, size = .3, alpha = .05) +
      stat_halfeye(aes(fill = param), alpha = .7, position = position_nudge(y = .2)) +
      geom_vline(xintercept = mean(sim_summary_tbl$true_lower_2.5), color = "firebrick", linetype = 2) +
      geom_vline(xintercept = mean(sim_summary_tbl$true_upper_97.5), color = "firebrick", linetype = 2) +
      scale_color_manual(values = wes_palette("Moonrise2")) +
      scale_fill_manual(values = wes_palette("Moonrise2")) +
    #  scale_color_manual(values = c("purple", "limegreen")) +
    #  scale_fill_manual(values = c("purple", "limegreen")) +
      labs(
          x = "",
          y = "",
          title = "Distribution of Calculated Upper and Lower Tolerance Limits",
          subtitle = "10,000 Simulations of n=15 parts with 2-sided normal Tolerance Intervals",
          caption = "pink vertical lines represent true population quantiles, .025 and .975"
        ) +
      theme(legend.title = element_blank(),
            legend.position = "bottom") 

    To see if the tolerance interval procedure is providing the desired coverage, we have to look at each set of lower and upper limits from each simulation. The code below converts the calculated tolerance bounds from each sim into quantiles of the true population, then calculates the difference to determine the true population coverage of the simulated experiment. If the coverage is less than intended, a flag of 1 is assigned in the “unacceptable_coverage” col, else 0.

    Evaluate True Coverage of Each Simulated Experiment

    coverage_summary_tbl <- sim_summary_tbl %>%
      mutate(ltl_coverage_true = map_dbl(lower_bound_ltl, pnorm, true_population_mean, true_population_sd),
             utl_coverage_true = map_dbl(upper_bound_utl, pnorm, true_population_mean, true_population_sd)) %>%
      mutate(total_tl_coverage_true = utl_coverage_true - ltl_coverage_true) %>%
      mutate(unacceptable_coverage = case_when(total_tl_coverage_true >= p ~ 0,
                                             TRUE ~ 1)) %>%
      select(-c(alpha, p, true_lower_2.5, true_upper_97.5)) 
    
    coverage_summary_tbl %>%
      gt_preview()
    sim_id sample_mean sample_sd lower_bound_ltl upper_bound_utl ltl_coverage_true utl_coverage_true total_tl_coverage_true unacceptable_coverage
    1 1 36.19320 3.243531 27.35493 45.03147 0.0007854236 0.8957802 0.8949948 1
    2 2 40.56155 3.766514 30.29821 50.82489 0.0076447723 0.9965973 0.9889526 0
    3 3 40.23154 4.268292 28.60091 51.86217 0.0021875221 0.9984893 0.9963017 0
    4 4 40.24057 4.175918 28.86165 51.61949 0.0026797862 0.9981630 0.9954832 0
    5 5 40.41020 3.880094 29.83737 50.98303 0.0055321969 0.9969814 0.9914492 0
    6..9999
    10000 10000 38.69646 4.780010 25.67145 51.72147 0.0001703976 0.9983072 0.9981368 0

    Count False Positives

    For the coverage to be correct, the percentage of false positives (simulated experiments that covered less than p% of the true population) should be less than or equal to alpha (10%). That would mean less than .1(10000) = 1000 cases where the tolerance interval covered less than 95% of the true population.

    coverage_summary_tbl %>%
      summarize(
        false_positives = sum(unacceptable_coverage),
        false_positive_limit = alpha * n_sims
      ) %>%
      mutate(
        alpha_level = alpha,
        total_trials = n_sims,
        false_positive_rate = false_positives / n_sims,
        "tol_limit_procedure_working?" = case_when(
          false_positives < false_positive_limit ~ "heck_yes",
          TRUE ~ "boo"
        )) %>%
      select(total_trials, alpha_level, false_positive_limit, everything()) %>%
      gt_preview()
    total_trials alpha_level false_positive_limit false_positives false_positive_rate tol_limit_procedure_working?
    1 10000 0.1 1000 958 0.0958 heck_yes

    Because .0958 is less than .1 (10%) we can say that the tolerance interval procedure is capturing the desired population proportion (95%) with an acceptable false positive rate (<= 10%). Neat!

    Now let’s look at the Bayesian analogue.

    Bayesian 2-sided Tolerance Interval

    Load the Data

    The dataset and example used for this section is adapted from this excellent paper on coverage intervals.1 The values represent measurements of mass fraction of iron made by the National Institute of Standards and Technology (NIST). Units of measurement are percentages, or cg/g.

    w_tbl <- tibble(w = c(67.43, 66.97, 67.65, 66.84, 67.05, 66.57, 67.16, 68.3,
    67.01, 67.07, 67.23, 66.51, 66.46, 67.54, 67.09, 66.77)) %>%
      mutate(w_pct = w/100)

    In preparation for the Bayesian tolerance interval calculations we should run some prior predictive checks. I know Beta(1,1) is a flat prior on mu, but I am not experienced enough with the cauchy to mentally visualize what its parameters mean with respect to the outcome variable. I adjusted the scale parameter for sigma manually and did this visual check each time until I saw a nice cloud of curves of varying but reasonable widths at scale = .025. Bonus – it is so satisfying to make the code for these curves using map2 and dplyr in the prep and then ggplot for the vis – incredibly smooth, readable, and efficient. I remain grateful for these core packages, especially as I start to mess around with other languages.

    Prior Predictive Simulations

    set.seed(2005)
    
    prior_pred_tbl <- tibble(mu = rbeta(300, 1, 1),
           sig = rcauchy(300, 0, 0.025) %>% abs()) %>%
      mutate(row_id = row_number()) %>%
      select(row_id, everything()) %>%
      mutate(plotted_y_data = map2(mu, sig, ~tibble(
        x = seq(0, 1, length.out = 100),
        y = dnorm(x, .x, .y)))) %>%
      unnest() %>%
      mutate(model = "mu ~ beta(1,1), sigma ~ cauchy(0,0.025)")
    
    prior_pred_tbl %>%
      gt_preview()
    row_id mu sig x y model
    1 1 0.1675134 2.18178615 0.00000000 1.823131e-01 mu ~ beta(1,1), sigma ~ cauchy(0,0.025)
    2 1 0.1675134 2.18178615 0.01010101 1.823759e-01 mu ~ beta(1,1), sigma ~ cauchy(0,0.025)
    3 1 0.1675134 2.18178615 0.02020202 1.824349e-01 mu ~ beta(1,1), sigma ~ cauchy(0,0.025)
    4 1 0.1675134 2.18178615 0.03030303 1.824900e-01 mu ~ beta(1,1), sigma ~ cauchy(0,0.025)
    5 1 0.1675134 2.18178615 0.04040404 1.825412e-01 mu ~ beta(1,1), sigma ~ cauchy(0,0.025)
    6..29999
    30000 300 0.5856645 0.01596839 1.00000000 1.589627e-145 mu ~ beta(1,1), sigma ~ cauchy(0,0.025)
    j <- prior_pred_tbl %>%
      ggplot(aes(x = x, y = y, group = row_id)) +
      geom_line(aes(x, y), alpha = .5, color = "#2c3e50") +
      labs(
        x = "x",
        y = "dnorm(x)"
      ) +
      ylim(c(0, 10)) +
      theme_minimal() 
    
    k <- j +
      ylim(c(0, 50)) 
    
    (k + j) + plot_annotation(
      title = 'Prior Predictions: n=300 Possible Distibutions According to Priors:',
      subtitle = 'mu ~ Beta(1,1), sigma ~ cauchy(0, .025)',
      caption = "same data, 2 different y-axis scales")

    Fit the model with brms

    Use the priors that were just identified to fit a model in brms and visualize the output. The fuzzy caterpillar is not plotted because its very slow with to render with so many iterations. Don’t worry – it’s fuzzy.

    Non-Informative Priors

    w_mod <-
      brm(
        data = w_tbl, family = gaussian(),
        w_pct ~ 1,
       prior = c(
         prior(beta(1, 1), class = Intercept),
         prior(cauchy(0, .025), class = sigma)),
       iter = 400000, warmup = 50000, chains = 4, cores = 2,
       seed = 10
     )
    
    
    w_mod %>% summary()
    ##  Family: gaussian 
    ##   Links: mu = identity; sigma = identity 
    ## Formula: w_pct ~ 1 
    ##    Data: w_tbl (Number of observations: 16) 
    ## Samples: 4 chains, each with iter = 4e+05; warmup = 50000; thin = 1;
    ##          total post-warmup samples = 1400000
    ## 
    ## Population-Level Effects: 
    ##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
    ## Intercept     0.67      0.00     0.67     0.67 1.00   858299   757467
    ## 
    ## Family Specific Parameters: 
    ##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
    ## sigma     0.01      0.00     0.00     0.01 1.00   792388   740061
    ## 
    ## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
    ## and Tail_ESS are effective sample size measures, and Rhat is the potential
    ## scale reduction factor on split chains (at convergence, Rhat = 1).

    Peek at the posterior tbl for mu and sigma.

    w_post_tbl <-
      posterior_samples(w_mod) %>%
      select(-lp__) %>%
      rename("mu" = b_Intercept) 
    
    w_post_tbl %>%
      gt_preview()
    mu sigma
    1 0.6704184 0.004895014
    2 0.6685570 0.005720891
    3 0.6708761 0.003932762
    4 0.6713238 0.004217306
    5 0.6689520 0.005191162
    6..1399999
    1400000 0.6720105 0.005901202

    Calculate 2-Sided Bayesian Tolerance Interval from the Posterior – Non-Informative Priors

    We’ll calculate the 95/90 tolerance limit just like before. The Bayesian analogue involves doing a row-wise calculation for each posterior draw where the equivalent of a tolerance interval is calculated using the standard formula: x +/- ks, where x is mu and s is sigma. The k factor is taken as the .95 quantile of a unit normal distribution (thus excluding the top and bottom 5% for 90% coverage). Doing this for every row in the posterior produces a distribution of upper and lower tolerance bounds – 1 observation for each credible set of mu and sigma. The 5% “confidence” analogue is done by excluding the 2.5% most extreme values from the low end of the lower bound distribution and the 2.5% most extreme values from the high end of the upper bound distribution.

    bt_lower <- quantile(w_post_tbl$mu - qnorm(0.95) * w_post_tbl$sigma, 0.025)
    bt_upper <-  quantile(w_post_tbl$mu + qnorm(0.95) * w_post_tbl$sigma, 0.975)
    
    bt_tbl <- tibble(lower_coverage_limit = bt_lower * 100 %>% round(4),
           upper_coverage_limit = bt_upper * 100 %>% round(4),
           method = as_factor("Bayesian")) 
    
    bt_tbl %>%
      gt_preview()
    lower_coverage_limit upper_coverage_limit method
    1 65.75688 68.44929 Bayesian

    So the Bayesian version of a tolerance interval for those relatively vague priors is [65.84, 68.37] after back transforming the percentages to cg/g.

    The frequentist version for the same data is calculated as follows with the tolerance package (assuming normality of the experimental data):

    Frequentist 2-sided Tolerance Interval

    fr_tbl <- tidy(normtol.int(w_tbl$w, alpha = .05, P = .9, side = 2, method = "EXACT")) %>%
      select(column, mean) %>%
      pivot_wider(names_from = "column",
                  values_from = "mean") %>%
      select('2-sided.lower', '2-sided.upper') %>%
      mutate(method = as_factor("Frequentist")) %>%
      rename(lower_coverage_limit = '2-sided.lower',
             upper_coverage_limit = '2-sided.upper') %>%
      select(method, everything())

    Combining and comparing to the Bayesian version:

    fr_tbl %>% 
      bind_rows(bt_tbl) %>%
      mutate_if(is.numeric, round, digits = 2) %>%
      mutate(priors = c("non_informative (implied)", "non-informative")) %>%
      gt_preview()

    method lower_coverage_limit upper_coverage_limit priors
    1 Frequentist 65.95 68.25 non_informative (implied)
    2 Bayesian 65.76 68.45 non-informative
    These are quite close! The Bayesian version does remain bit wider. Perhaps with more iterations of MCMC or with slightly less vague priors, the Bayesian version would creep tighter until they converge.

    Plotting the upper and lower tolerance limits from the Bayesian version we see that the distribution (upper and lower bounds) are not normal.

    w_post_tbl %>%
      mutate(upper_95 = qnorm(p = .95, mean = mu, sd = sigma), 
             lower_95 = qnorm(p = .05, mean = mu, sd = sigma),
             id = as_factor("posterior")) %>%
      sample_n(size = 10000) %>%
      pivot_longer(cols = c(upper_95, lower_95), names_to = "quantile", values_to = "value") %>%
      mutate(quantile = as_factor(quantile)) %>%
      ggplot(aes(x = value, y = id)) +
      geom_jitter(aes(color = quantile), width = .001, height = .1, size = .3, alpha = .05) +
      stat_halfeye(aes(fill = quantile),  alpha = .7, position = position_nudge(y = .2)) +
      scale_color_manual(values = wes_palette("Moonrise2")) +
      scale_fill_manual(values = wes_palette("Moonrise2")) +
    #  scale_color_manual(values = c("purple", "limegreen")) +
    #  scale_fill_manual(values = c("purple", "limegreen")) +
      labs(
          x = "",
          y = "",
          title = "Distribution of Calculated Upper and Lower Tolerance Limits",
          subtitle = "Calculated using Bayesian Method",
          caption = "10,000 Predictions Sampled from Posterior (400,000 total)"
        ) +
      theme(legend.title = element_blank(),
            legend.position = "bottom") 

    But the upper and lower bounds come in pairs. Plotting the pairs

    set.seed(2345)
    
    w_post_tbl %>%
      mutate(upper_95 = qnorm(p = .95, mean = mu, sd = sigma), 
             lower_95 = qnorm(p = .05, mean = mu, sd = sigma)) %>%
      sample_n(10000) %>%
      ggplot(aes(y = lower_95, x = upper_95)) +
      geom_point(alpha = .1, color = "#2c3e50") +
      geom_hline(yintercept = bt_lower, linetype = 2) +
      geom_vline(xintercept = bt_upper, linetype = 2) +
      theme_minimal() %>%
      labs(
          x = "Predicted upper tolerance bound",
          y = "Predicted lower tolerance bound",
          title = "Credible Values for Upper and Lower 95% Bounds",
          subtitle = "Based on Credible Posterior Draws",
          caption = "10,000 Predictions Sampled from Posterior (400,000 total)"
        ) 

    Now we can calculate the number of tolerance bounds that are both:

    • Above (within) the lower 95% tolerance bound for 90% coverage
    • Below (within) the upper 95% tolerance bound for 90% coverage

    This number should be less than 500 for a 5% or less false positive rate, since this is a subset of 10,000 draws from the full posterior (which wouldn’t display well).

    set.seed(2345)
    
    w_post_tbl %>%
      mutate(upper_95 = qnorm(p = .95, mean = mu, sd = sigma), 
             lower_95 = qnorm(p = .05, mean = mu, sd = sigma)) %>%
      sample_n(10000) %>% 
      filter(lower_95 > bt_lower,
             upper_95 < bt_upper) %>%
      gt_preview()

    mu sigma upper_95 lower_95
    1 0.6734567 0.005124883 0.6818864 0.6650271
    2 0.6705619 0.004108660 0.6773200 0.6638037
    3 0.6704661 0.005681679 0.6798116 0.6611206
    4 0.6709579 0.003404294 0.6765575 0.6653584
    5 0.6706663 0.004703547 0.6784029 0.6629296
    6..9541
    9542 0.6713993 0.004295096 0.6784641 0.6643345
    9,542 rows means 458 false positives – this is acceptable coverage for the Bayesian interval.

    If you’ve made it this far, thank you for your attention.


    1. Stoudt S, Pintar A, Possolo A (2021) Coverage Intervals. J Res Natl Inst Stan 126:126004. https://doi.org/10.6028/jres.126.004. ↩︎

    To leave a comment for the author, please follow the link and comment on their blog: [R]eliability.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.