Site icon R-bloggers

Warspeed 5 — priors and models continued

[This article was first published on Posts on R Lover ! a programmer, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
  • My last 4 posts have all focused on the vaccines being produced to fight COVID-19. They have primarily focused on Bayesian methods (or at least comparing bayesian to frequentist methods). This one follows that pattern and provides expanded coverage of the concept of priors in bayesian thinking, how to operationalize them, and additional coverage of how to compare bayesian regression models using various tools in r.

    There’s no actual additional analysis of “real data”. While the news has all been good lately I haven’t found anything publicly available that begs for investigation. Instead we’ll use “realistic” but not real data, and let the process unfold for us.

    I will, however, try to show you some tools and tricks in r to make your analysis of any data easier and smoother.

    What bayesians do

    First off let me remind the reader that like most I was educated and trained in the frequentist tradition. I’m not against frequentist methods and even admit in the vast majority of cases they lead to similar conclusions. But, I do believe the bayesian methods are better and let you answer questions you really want to research. Now a quote from a great book, which also happens to be available at no cost.

    From a Bayesian perspective, statistical inference is all about belief revision. I start out with a set of candidate hypotheses about the world. I don’t know which of these hypotheses is true, but do I have some beliefs about which hypotheses are plausible and which are not. When I observe the data, I have to revise those beliefs. If the data are consistent with a hypothesis, my belief in that hypothesis is strengthened. If the data is inconsistent with the hypothesis, my belief in that hypothesis is weakened. Navarro, page 555.

    As a formula you’ll see Bayes Theorem expressed something like this:

    \[P(hypothesis | data) = \frac{P(data | hypothesis) * P(hypothesis) }{P(data)}\]

    We have data, and we have hypotheses, we have bayesian tools, most “new” bayesians seem to stumble in the process of making good on our declaration of what our priors are. Up until today’s post I’ve mainly avoided the topic or briefly touched on and used terms like “vague”, “uninformed” or “flat”. See for example the very flat and uninformed plot from my second blog post in the series.

    It’s okay in a bayesian framework to have priors that are vague if that’s what you really know and what you really mean. Just don’t use them when you know “more” or “better”. There is a real difference “between this value can be literally anything” and “this value we are studying has some prior knowledge behind it and some likely spread of values”.

    And, no, you are absolutely NOT guaranteeing success or dooming failure if you are off on your priors. Obviously, as a scientist, you should pick values that make sense and not pluck them from thin air or make them up, but rest assured given sufficient data even priors that are “off” are adjusted by the data. Isn’t that the point after all? Develop a hypothesis and test it with data? Notice that in bayesian inference we don’t need to force a “null hypothesis” that we “reject”.

    Come let us reason together

    At this point if you’re not comfortable with bayesian inference I probably haven’t helped you at all and it all probably feels a bit theoretical. Let’s take our current vaccine problem and work it through.

    Step 1 (what do we think we know or believe) about the vaccine and COVID-19
    1. A vast amount of time, money, and science have gone into developing these vaccines, and in many cases they are products of years of previous research – if not about COVID-19 then at least other viruses. The candidate vaccines have already been tried in Phase I and Phase II in perhaps animals or in limited numbers of humans. They got to phase III not by being unknown or “anything is possible” but rather on a belief they are going to be at least 50% effective. So we know something about the direction and with less certainty something about the magnitude.

    2. We know given public data that older people tend to contract COVID-19 less. All the data points to much higher infection rates in younger populations. Good thing too since the older population tends to have much more severe outcomes. Notice neither #1 or #2 says anything about how effective the vaccine is in older populations, we’ll get to that in a minute. But we certainly have evidence that currently older folks are less likely to get it. Notice reasonable people can disagree. It’s okay if you have scientific prior evidence that makes you think otherwise. Express it! Let the data inform us.

    3. There is evidence that vaccines can be less effective for older populations. It is certainly a worry for scientists going into the trial. Just as there are worries for other health risk factors and potentially race and ethnicity and even gender as well. Chuck’s hypothesis is that the vaccine will be slightly less effective for older folks. But I’m prepared to be wrong and believe this less firmly than I believe the vaccine is effective.

    Step 2 How do we make use of our prior knowledge?

    In the last post we used brms to build a bayesian equivalent of a glm model. That’s our plan again today. So how do we go from my theoretical hypotheses in 1-3 above to something we can feed into r and let it chew on and inform us with our data? This is often the hard part for newcomers to bayesian inference. But it really needn’t be if you focus on the basics. Our plan is to feed our hypotheses into a generalized linear model. As with a the simpler case of modeling with lm we’ll produce a model that gives us slope estimates (the venerable \(\beta\) or \(\hat{b}_n\) coefficients). In traditional frequentist output these are in turn tested against the hypothesis that they are zero and a pvalue is generated. Zero slope means that predictor has zero ability to help us predict outcome in \(y = \hat{b}_1 + \hat{b}_0\) if \(\hat{b}_1 = 0\) then \(y = \hat{b}_0\) the intercept and \(\hat{b}_1\) can be eliminated.

    So our priors need to be converted to something that makes sense as a value for \(\hat{b}_n\). No effect equals zero and bigger effects mean larger positive or negative values. Notice that our estimates have a standard error which means they have a standard deviation which will serve as a way of expressing our “confidence” about our priors. A small value means we’re more sure that the effect will be tightly centered, larger SD means we’re less certain.

    So going back to 1-3 above let’s proceed with our priors one by one.

    1. Vaccine effect – So we know something about the direction and with less certainty something about the magnitude. We could pick from a large number of possible distributions for our \(\hat{b}_n\) here but hopefully you’re thinking by now this sounds like a normal or t distribution with an appropriate sd and mean. With small amounts of data t is possibly a better choice, but remember our data will inform us we are NOT fixing the outcome. So let’s set a prior for \(\hat{b}_{conditionvaccinated}\) with the mean at -2.0 and the sd of .5. Why a minus sign? That’s because \(\hat{b}_{conditionvaccinated}\) compares to the placebo condition and therefore the number of people who got COVID-19 in the trial will be less. The .5 sd is because we think the vaccine will be at least moderately effective. No worries later I’ll show you how to be more specific in your hypotheses. You’ll see this later but for brms and STAN we express this as prior(normal(-2.0, .5), class = b, coef = conditionvaccinated)

    2. Older folks are less likely to get COVID-19. Remember this is a general statement about older folks regardless of whether they received the placebo or the vaccine. This difference is less pronounced and less certain so we’ll set the mean closer to zero at -.5 and increase the standard deviation to 1 to indicate we believe it’s more variable and that is expressed as prior(normal(-0.5, 1), class = b, coef = ageOlder).

    3. Chuck’s hypothesis is that the vaccine will be slightly less effective for older folks. Which means that this time the sign will be positive because the interaction ageOlder:conditionvaccinated will yield slightly higher infection rates and therefore \(\hat{b}_{ageOlder:conditionvaccinated}\) will be a small positive number. But I’m less firm in this belief so once again standard deviation = 1 and the code is prior(normal(.5, 1), class = b, coef = ageOlder:conditionvaccinated)

    Now, we have reasonable informed priors! And hopefully you have a better idea on how to set them in a regression model of interest to you. Remember you can consult the STAN documentation for more suggestions on informative priors. We can run get_prior to get back information about the defaults. Notice we don’t really care about the overall intercept \(\hat{b}_0\).

    Load libraries and on to the data

    Load the necessary libraries as before

    library(dplyr)
    library(brms)
    library(ggplot2)
    library(kableExtra)
    theme_set(theme_bw())

    Load the same data as last time, this time I’m providing it as a structure for your ease. Again you could have your data in full long format and accomplish the same modeling I’m providing it wide and summarized for computational speed and leaving you the dplyr commands to go long if you like.

    agg_moderna_data <- 
       structure(list(
          condition = structure(c(1L, 1L, 2L, 2L), .Label = c("placebo", "vaccinated"), class = "factor"), 
          age = structure(c(1L, 2L, 1L, 2L), .Label = c("Less than 65", "Older"), class = "factor"),
          didnot = c(9900L, 993L, 9990L, 999L), 
          got_covid = c(100L, 7L, 10L, 1L), 
          subjects = c(10000L, 1000L, 10000L, 1000L)), 
          class = "data.frame", row.names = c(NA, -4L))
    
    # if you need to convert to long format use this...
    
    # agg_moderna_data %>% 
    #    tidyr::pivot_longer(cols = c(didnot, got_covid), 
    #                        names_to = "Outcome") %>%
    #    select(-subjects) %>%
    #    tidyr::uncount(weights = value) %>%
    #    mutate(Outcome = factor(Outcome))
    
    # Make a nice table for viewing
    agg_moderna_data %>%
      kbl(digits = c(0,0,0,0,0),
          caption = "Notional Data") %>%
      kable_minimal(full_width = FALSE,
          position = "left") %>%
      add_header_above(c("Factors of interest" = 2, 
                         "COVID Infection" = 2, 
                         "Total subjects" = 1))
    Table 1: Notional Data
    Factors of interest
    COVID Infection
    Total subjects
    condition age didnot got_covid subjects
    placebo Less than 65 9900 100 10000
    placebo Older 993 7 1000
    vaccinated Less than 65 9990 10 10000
    vaccinated Older 999 1 1000

    Once again I’m assuming there are fewer older subjects in the trials.

    Repeating the frequentist GLM solution

    So we’ll create a matrix of outcomes called outcomes with successes got_covid and failures didnot. For binomial and quasibinomial families the response can be specified … as a two-column matrix with the columns giving the numbers of successes and failures

    outcomes <- cbind(as.matrix(agg_moderna_data$got_covid, ncol = 1), 
                 as.matrix(agg_moderna_data$didnot, ncol = 1))
    
    moderna_frequentist <-
       glm(formula = outcomes ~ age + condition + age:condition,
           data = agg_moderna_data,
           family = binomial(link = "logit"))
    
    # moderna_frequentist glm results
    broom::tidy(moderna_frequentist)
    ## # A tibble: 4 x 5
    ##   term                         estimate std.error statistic  p.value
    ##   <chr>                           <dbl>     <dbl>     <dbl>    <dbl>
    ## 1 (Intercept)                    -4.60      0.101   -45.7   0.      
    ## 2 ageOlder                       -0.360     0.392    -0.917 3.59e- 1
    ## 3 conditionvaccinated            -2.31      0.332    -6.96  3.32e-12
    ## 4 ageOlder:conditionvaccinated    0.360     1.12      0.321 7.48e- 1
    ### In case you want to plot the results
    # sjPlot::plot_model(moderna_frequentist, 
    #                    type = "int")

    Repeating the bayesian solution with no priors given

    moderna_bayes_full <- 
       brm(data = agg_moderna_data,
           family = binomial(link = logit),
           got_covid | trials(subjects) ~ age + condition + age:condition,
           iter = 12500, 
           warmup = 500, 
           chains = 4, 
           cores = 4,
           seed = 9,
           file = "moderna_bayes_full")
    
    summary(moderna_bayes_full)
    ##  Family: binomial 
    ##   Links: mu = logit 
    ## Formula: got_covid | trials(subjects) ~ age + condition + age:condition 
    ##    Data: agg_moderna_data (Number of observations: 4) 
    ## Samples: 4 chains, each with iter = 12500; warmup = 500; thin = 1;
    ##          total post-warmup samples = 48000
    ## 
    ## Population-Level Effects: 
    ##                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
    ## Intercept                       -4.60      0.10    -4.80    -4.41 1.00    46428
    ## ageOlder                        -0.41      0.40    -1.27     0.33 1.00    26886
    ## conditionvaccinated             -2.35      0.34    -3.06    -1.73 1.00    27428
    ## ageOlder:conditionvaccinated     0.05      1.29    -2.90     2.17 1.00    18097
    ##                              Tail_ESS
    ## Intercept                       37933
    ## ageOlder                        23100
    ## conditionvaccinated             24161
    ## ageOlder:conditionvaccinated    15000
    ## 
    ## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
    ## and Tail_ESS are effective sample size measures, and Rhat is the potential
    ## scale reduction factor on split chains (at convergence, Rhat = 1).

    Priors and Bayes Factors

    Earlier I showed you what priors we want. We can run get_prior to get back information about the defaults.

    get_prior(data = agg_moderna_data,
           family = binomial(link = logit),
           got_covid | trials(subjects) ~ age + condition + age:condition)
    ##                 prior     class                         coef group resp dpar
    ##                (flat)         b                                             
    ##                (flat)         b                     ageOlder                
    ##                (flat)         b ageOlder:conditionvaccinated                
    ##                (flat)         b          conditionvaccinated                
    ##  student_t(3, 0, 2.5) Intercept                                             
    ##  nlpar bound       source
    ##                   default
    ##              (vectorized)
    ##              (vectorized)
    ##              (vectorized)
    ##                   default

    Right now they are all flat. Let’s create an object called my_priors that contains what we want them to be by model parameter. Then rerun and put the results into full_model. Finally let’s plot the prior’s and posteriors by parameter.

    my_priors <- 
       c(prior(normal(-0.5, 1), class = b, coef = ageOlder),
         prior(normal(-2.0, .5), class = b, coef = conditionvaccinated),
         prior(normal(.5, 1), class = b, coef = ageOlder:conditionvaccinated))
    
    my_priors
    ##            prior class                         coef group resp dpar nlpar bound
    ##  normal(-0.5, 1)     b                     ageOlder                            
    ##  normal(-2, 0.5)     b          conditionvaccinated                            
    ##   normal(0.5, 1)     b ageOlder:conditionvaccinated                            
    ##  source
    ##    user
    ##    user
    ##    user
    full_model <-
       brm(data = agg_moderna_data,
           family = binomial(link = logit),
           got_covid | trials(subjects) ~ age + condition + age:condition,
           prior = my_priors,
           iter = 12500,
           warmup = 500,
           chains = 4,
           cores = 4,
           seed = 9,
           save_pars = save_pars(all = TRUE),
           file = "full_model")
    summary(full_model)
    ##  Family: binomial 
    ##   Links: mu = logit 
    ## Formula: got_covid | trials(subjects) ~ age + condition + age:condition 
    ##    Data: agg_moderna_data (Number of observations: 4) 
    ## Samples: 4 chains, each with iter = 12500; warmup = 500; thin = 1;
    ##          total post-warmup samples = 48000
    ## 
    ## Population-Level Effects: 
    ##                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
    ## Intercept                       -4.61      0.10    -4.80    -4.42 1.00    54540
    ## ageOlder                        -0.42      0.36    -1.17     0.25 1.00    35067
    ## conditionvaccinated             -2.24      0.26    -2.77    -1.75 1.00    36868
    ## ageOlder:conditionvaccinated     0.35      0.72    -1.12     1.69 1.00    31410
    ##                              Tail_ESS
    ## Intercept                       37706
    ## ageOlder                        27560
    ## conditionvaccinated             28006
    ## ageOlder:conditionvaccinated    28442
    ## 
    ## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
    ## and Tail_ESS are effective sample size measures, and Rhat is the potential
    ## scale reduction factor on split chains (at convergence, Rhat = 1).
    plot(bayestestR::bayesfactor_parameters(full_model, null = c(-.5, .5)))

    # bayestestR::sensitivity_to_prior(full_model, index = "Median", magnitude = 10)

    With “good” priors we can compare models

    Now that we have convinced ourselves theoretically and by running the full model that we have reasonable and informative priors we can begin the process of comparing models. First step is to build the other models by removing parameters and their associated priors. Hopefully my naming scheme makes it clear what we’re doing. The obvious and elegant thing to do would be to write a simple purrr::map statement that would build a list of models but for clarity I’ll simply repeat the same steps repeatedly (if you want the purrr statement add a comment in disqus). I’ll spare you printing all the various model summaries.

    # no priors here
    null_model <-
       brm(data = agg_moderna_data,
           family = binomial(link = logit),
           got_covid | trials(subjects) ~ 1,
           iter = 12500,
           warmup = 500,
           chains = 4,
           cores = 4,
           seed = 9,
           save_pars = save_pars(all = TRUE),
           file = "null_model")
    
    # no prior for the interaction term
    my_priors2 <- c(prior(normal(-0.5, 1), class = b, coef = ageOlder),
                prior(normal(-2.0, .5), class = b, coef = conditionvaccinated))
    
    no_interaction <-
       brm(data = agg_moderna_data,
           family = binomial(link = logit),
           got_covid | trials(subjects) ~ age + condition,
           prior = my_priors2,
           iter = 12500,
           warmup = 500,
           chains = 4,
           cores = 4,
           seed = 9,
           save_pars = save_pars(all = TRUE),
           file = "no_interaction")
    # summary(no_interaction)
    
    # No prior for interaction or vaccine
    my_priors3 <- c(prior(normal(-0.5, 1), class = b, coef = ageOlder))
    
    no_vaccine <-
       brm(data = agg_moderna_data,
           family = binomial(link = logit),
           got_covid | trials(subjects) ~ age,
           prior = my_priors3,
           iter = 12500,
           warmup = 500,
           chains = 4,
           cores = 4,
           seed = 9,
           save_pars = save_pars(all = TRUE),
           file = "no_vaccine")
    # summary(no_vaccine)
    
    # no prior for interaction or for age
    my_priors4 <- c(prior(normal(-2.0, .5), class = b, coef = conditionvaccinated))
    
    no_age <-
       brm(data = agg_moderna_data,
           family = binomial(link = logit),
           got_covid | trials(subjects) ~ condition,
           prior = my_priors4,
           iter = 12500,
           warmup = 500,
           chains = 4,
           cores = 4,
           seed = 9,
           save_pars = save_pars(all = TRUE),
           file = "no_age")
    # summary(no_age)

    Okay, we now have models named full_model, no_interaction, no_age, no_vaccine, and null_model. What we’d like to know is which is the “best” model, where best is a balance between accuracy and parsimony. Highly predictive but doesn’t include any unnecessary predictors. There are a number of ways to accomplish this including AIC and BIC and even comparing R squared but we’re going to focus on comparing Bayes Factors.

    Using bayestestR::bayesfactor_models we’ll compare all our models to the null_model. Then since as usual the null_model is a rather silly point of comparison we’ll use update to change our comparison point to the model that does NOT include the interaction term (a common question in research). That will help us understand whether we need to worry about whether the vaccine is less effective in older subjects. To make your workflow easier I actually recommend skipping this manual update process and going straight to bayestestR::bayesfactor_inclusion(comparison, match_models = TRUE).

    comparison <- 
       bayestestR::bayesfactor_models(full_model, no_interaction, no_age, no_vaccine,
                                      denominator = null_model)
    comparison
    ## # Bayes Factors for Model Comparison
    ## 
    ##   Model                                      BF
    ##   [1] age + condition + age:condition 7.352e+18
    ##   [2] age + condition                 9.708e+18
    ##   [3] condition                       1.991e+19
    ##   [4] age                                 0.486
    ## 
    ## * Against Denominator: [5] (Intercept only)
    ## *   Bayes Factor Type: marginal likelihoods (bridgesampling)
    update(comparison, reference = 2)
    ## # Bayes Factors for Model Comparison
    ## 
    ##   Model                                      BF
    ##   [1] age + condition + age:condition     0.757
    ##   [3] condition                           2.051
    ##   [4] age                             5.006e-20
    ##   [5] (Intercept only)                1.030e-19
    ## 
    ## * Against Denominator: [2] age + condition
    ## *   Bayes Factor Type: marginal likelihoods (bridgesampling)
    bayestestR::bayesfactor_inclusion(comparison, match_models = TRUE)
    ## # Inclusion Bayes Factors (Model Averaged)
    ## 
    ##               Pr(prior) Pr(posterior) Inclusion BF
    ## age                 0.4         0.263        0.488
    ## condition           0.4         0.801    1.993e+19
    ## age:condition       0.2         0.199        0.757
    ## 
    ## * Compared among: matched models only
    ## *    Priors odds: uniform-equal
    ### This is possible but NOT recommended
    # bayestestR::bayesfactor_inclusion(comparison)

    There is very little evidence that we should include age or the age:condition interaction terms in our model. If anything we have modest evidence that we should exclude them! Weak evidence to be sure but evidence nonetheless (remember bayesian inference allows for “supporting” the null, not just rejecting it).

    More elegant testing (order restriction)

    Notice that our priors are unrestricted – that is, our \(\hat{b}_n\) parameters in the model have some non-zero credibility all the way out to infinity (no matter how small; this is true for both the prior and posterior distribution). Does it make sense to let our priors cover all of these possibilities? Our priors can be formulated as restricted priors (Morey, 2015; Morey & Rouder, 2011).

    By testing these restrictions on prior and posterior samples, we can see how the probabilities of the restricted distributions change after observing the data. This can be achieved with bayesfactor_restricted(), that computes a Bayes factor for these restricted model vs the unrestricted model.

    Think or it as more precise hypothesis testing. We need to specify these restrictions as logical conditions. An easy one is that I’m very confident that the vaccine is highly effective and \(\hat{b}_{conditionvaccinated}\) should be no where near 0. As a matter of research I want to know how the data support the notion that is less than -2 or a very large effect. How does that belief square with our data? More complex but testable, is that \(\hat{b}_{ageOlder.conditionvaccinated}\) (the interaction) if it is non zero is quite modest. The vaccine may work a little better or a little worse on older people but it is not a big difference.

    chuck_hypotheses1 <- 
       c("b_conditionvaccinated < -2")
    
    bayestestR::bayesfactor_restricted(full_model, hypothesis = chuck_hypotheses1)
    ## # Bayes Factor (Order-Restriction)
    ## 
    ##                  Hypothesis P(Prior) P(Posterior)    BF
    ##  b_conditionvaccinated < -2    0.497        0.822 1.654
    ## 
    ## * Bayes factors for the restricted model vs. the un-restricted model.
    chuck_hypotheses2 <- 
       c("b_conditionvaccinated < -2",
         "(b_ageOlder.conditionvaccinated > -0.5) & (b_ageOlder.conditionvaccinated < 0.5)"
       )
    bayestestR::bayesfactor_restricted(full_model, hypothesis = chuck_hypotheses2)
    ## # Bayes Factor (Order-Restriction)
    ## 
    ##                                                                        Hypothesis
    ##                                                        b_conditionvaccinated < -2
    ##  (b_ageOlder.conditionvaccinated > -0.5) & (b_ageOlder.conditionvaccinated < 0.5)
    ##  P(Prior) P(Posterior)    BF
    ##     0.499        0.822 1.648
    ##     0.340        0.444 1.305
    ## 
    ## * Bayes factors for the restricted model vs. the un-restricted model.

    Although the support is limited (remember we had very few older subjects who got COVID-19) we at least have limited support and as our numbers continue to expand over the coming months we can continue to watch the support (hopefully grow).

    Done

    Hope you found this continuation of the series helpful. As always feel free to comment. Personally I find it helpful to work on real world problems and if you can follow these methods you’re in great shape in the next few months as more data becomes available to build even more complex models.

    Keep counting the votes! Every last one of them!

    Chuck

    This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

    To leave a comment for the author, please follow the link and comment on their blog: Posts on R Lover ! a programmer.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.