Gamma Process Prior for Semiparametric Survival Analysis
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Heads up: equations may not render on blog aggregation sites. See original post here for good formatting. If you like this post, you can follow me on twitter.
Motviation
Suppose we observe survival/event times from some distribution Ti∈1:niid∼f(t)
This already works fine, so why go Bayesian? Here are just a few (hopefully) compelling reasons:
- We may want to nonparametrically estimate the baseline hazard itself.
- Posterior inference is exact, so we don’t need to rely on asymptotic uncertainty estimates (though we may want to evaluate the frequentist properties of resulting point and interval estimates).
- Easy credible interval estimation for any function of the parameters. If we have posterior samples for the hazard, we also get automatic inference for the survival function as well.
Full Bayesian inference requires a proper probability model for both θ and λ0. This post walks through a Bayesian approach that places a nonparametric prior on λ0 – specifically the Gamma Process.
The Gamma Process Prior
Independent Hazards
Most of this comes from Kalbfleisch (1978), with an excellent technical outline by Ibrahim (2001).
Recall that the cumulative baseline hazard H0(t)=∫t0λ0(t)dt where the integral is the Riemann-Stieltjes integral. The central idea is to develop a prior for the cumulative hazard H0(t), which will then admit a prior for the hazard, λ0(t).
The Gamma Process is such a prior. Each realization of a Gamma Process is a cumulative hazard function that is centered around some prior cumulative hazard function, H∗, with a sort of dispersion/concentration parameter, β that controls how tightly the realizations are distributed around the prior H∗.
Okay, now the math. Let G(α,β) denote the Gamma distribution with shape parameter α and rate parameter β. Let H∗(t) for t≥0 be our prior cumulative hazard function. For example we could choose H∗ to be the exponential cumulative hazard, H∗(t)=η⋅t, where η is a fixed hyperparameter. By definition H∗(0)=0. The Gamma Process is defined as having the following properties:
- H0(0)=0
- λ0(t)=H0(t)–H0(s)∼G( β(H∗(t)–H∗(s)) , β ), for t>s
The increments in the cumulative hazard is the hazard function. The gamma process has the property that these increments are independent and Gamma-distributed. For a set of time increments t≥0, we can use the properties above to generate one realization of hazards {λ0(t)}t≥0. Equivaltently, one realization of the cumulative hazard function is {H0(t)}t≥0, where H0(t)=∑tk=0λ0(k). We denote the Gamma Process just described as H0(t)∼GP( βH∗(t), β), t≥0
Below in Panel A are some prior realizations of H0(t) with a Weibull H∗ prior for various concentration parameters, β. Notice for low β the realizations are widely dispersed around the mean cumulative hazard. Higher β yields to tighter dispersion around H∗.
Since there’s a correspondence between the H0(t), λ0(t), and S0(t), we could also plot prior realizations of the baseline survival function S0(t)=exp{−H0(t)} using the realization {H0(t)}t≥0. This is shown in Panel B with the Weibull survival function S∗ corresponding to H∗.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.