The Beta distribution of the third kind (or generalised Beta prime)

[This article was first published on Saturn Elephant, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We present the family of so-called Beta distributions of the third kind. In the context of Bayesian statistics, it is a conjugate family of prior distributions on the odds parameter of the binomial model. This distribution is known, but nobody provided a way to sample from it. We show how one can sample from this distribution in R.

Preliminaries: the (scaled) Beta prime distribution

The Beta distribution of the third kind generalizes the Beta distribution of the second kind, also known under the name Beta prime distribution.

The Beta prime distribution \(\mathcal{B}'(c,d,\lambda)\) is the distribution of the random variable \(\lambda\frac{U}{1-U}\) where \(U \sim \mathcal{B}(c,d)\).

Its density function at \(x \geqslant 0\) is \[ \frac{1}{\lambda^c B(c,d)} \frac{x^{c-1}}{\left(1+\frac{x}{\lambda}\right)^{c+d}}. \] Usually the definition does not include the scale parameter \(\lambda\) (that is, it is usually defined for \(\lambda=1\) only).

It is easy to implement a sampler for this distribution, the density function and the cumulative density function:

rbetaprime <- function(n, c, d, lambda = 1){
  stopifnot(lambda > 0)
  u <- rbeta(n, c, d)
  lambda * u/(1-u)
}
dbetaprime <- function(x, c, d, lambda = 1){
  stopifnot(lambda > 0)
  lambda/(lambda+x)^2 * dbeta(x/lambda/(1+x/lambda), c, d)
}
pbetaprime <- function(q, c, d, lambda){
  stopifnot(lambda > 0)
  pbeta(q/lambda/(1+q/lambda), c, d)
}

Beta distribution of the third kind

The Beta distribution of the third kind \(\mathcal{B}_3\) was firstly introduced (as far as I know) in the paper Some Poisson mixtures with a hyperscale parameter, written by myself.

For parameters \(c>0\), \(d>0\), \(\kappa \in \mathbb{R}\), \(\tau>0\), the density function of \(\mathcal{B}_3(c,d,\kappa,\tau)\) is \[ f(\phi) = \frac{1} {B(c,d){}_2\!F_1\left(c, c+d-\kappa, c+d; 1 – \frac{1}{\tau}\right)} \frac{\phi^{c-1}(1+\phi)^{-\kappa}} {\left(1+\frac{\phi}{\tau}\right)^{c+d-\kappa}}, \quad \phi \geqslant 0. \] Thus, for \(\kappa=0\), the \(\mathcal{B}_3(c,d,\kappa,\tau)\) distribution equals \(\mathcal{B}'(c,d,\tau)\), and for \(\kappa = c+d\) or \(\tau=1\), it equals \(\mathcal{B}'(c,d,1)\). Note that in general, \(\tau\) is not a scale parameter.

Let’s write a R function computing this density:

library(gsl)
Gauss2F1 <- function(a,b,c,x){ 
  if(x>=0 && x<1){ # hyperg_2F1 works fine in this situation
    hyperg_2F1(a, b, c, x)
  }else{ # transform to come down to the first situation
    hyperg_2F1(c-a, b, c, 1-1/(1-x)) / (1-x)^b
  }
}
dB3 <- function(x, c, d, kappa, tau){
  stopifnot(c > 0, d > 0, tau > 0)
  if(kappa == 0){
    dbetaprime(x, c, d, tau)
  }else if(kappa == c+d){
    dbetaprime(x, c, d, 1)
  }else{
    1/beta(c,d) * x^(c-1)*(1+x)^(-kappa)/(1+x/tau)^(c+d-kappa) / 
      Gauss2F1(c, c+d-kappa, c+d, 1-1/tau)
  }
}

This distribution is related to the four-parameter generalized Beta distribution introduced by Chen & Novick in the paper Bayesian analysis for binomial models with generalized beta prior distributions (1984); this distribution takes its value in \([0,1]\). They are related by an elementary transformation: \[ \Theta \sim GB4(c, d, \kappa, \tau) \quad \iff\quad \frac{\Theta}{1-\Theta} \sim \mathcal{B}_3\left(c, d, c+d-\kappa, \frac{1}{\tau}\right). \]

Update 2019-09-05: generalised Beta distribution

I’ve just discovered that the \(GB4\) distribution appears in the paper On Kummer’s distributions of type two and generalized Beta distributions written by Hamza & Vallois. It is named generalised Beta distribution in this paper, it is denoted by \(\beta_\delta(a,b,c)\) and its density function at \(u \in [0,1]\) is given by \[ \frac{1}{\beta(a,b){}_2\!F_1(-c,a;a+b;1-\delta)} u^{a-1}(1-u)^{b-1}\bigl(1+(\delta-1)u\bigr)^c \] for \(a,b,\delta>0\) and \(c \in \mathbb{R}\).

We have the following relation: if \(\Phi \sim \mathcal{B}_3(c, d, \kappa, \tau)\), then \[ \frac{\Phi}{1+\Phi} \sim \beta_{\frac{1}{\tau}}(c, d, \kappa-c-d). \]

So, maybe a better name for \(\mathcal{B}_3\) would be generalised Beta prime distribution.

Cumulative distribution function

The cumulative distribution function of \(\mathcal{B}_3\) involves the Appell hypergeometric function \(F_1\). A Fortran implementation of this function is available in the R package appell. This package has been removed from CRAN, but you can still install it. If \(\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)\), then, for \(q \geqslant 0\), \[ \Pr(\Phi \leqslant q) = \frac{q^c F_1\left(c; \kappa, c+d-\kappa; c+1; -q, -\frac{q}{\tau}\right)} {cB(c,d){}_2\!F_1\left(c, c+d-\kappa, c+d; 1 – \frac{1}{\tau}\right)}. \] Here is a R implementation. I found that it works well with the option userflag = 0 of the appellf1 function.

pB3 <- function(q, c, d, kappa, tau, userflag = 0){
  stopifnot(c > 0, d > 0, tau > 0)
  if(kappa == 0){
    pbetaprime(q, c, d, tau)
  }else if(kappa == c+d){
    pbetaprime(q, c, d, 1)
  }else{
    C <- beta(c,d) * Gauss2F1(c, c+d-kappa, c+d, 1-1/tau)
    Appell <- 
      appell::appellf1(c, kappa, c+d-kappa, c+1, -q, -q/tau, 
                       userflag = userflag)
    q^c/c * Re(Appell$val) / C
  }
}

Sampling the Beta distribution of the third kind

It is not very easy to sample the \(\mathcal{B}_3\) distribution. In her master thesis, Myriam Chabot proved that it can be represented as a discrete mixture of \(\mathcal{B}_2\) distributions, and we will use this result.

This result is the following one.

  • For \(\tau < 1\), let \(K\) be a random variable on \(\mathbb{N}\) whose probability mass at \(k\in\mathbb{N}\) is given by \[ \frac{1}{{}_2\!F_1(d, c+d-\kappa, c+d, 1-\tau)} \frac{(1-\tau)^k}{k!} \frac{{(c+d-\kappa)}_k{(d)}_k}{{(c+d)}_k} \] and let \(\Phi\) be a random variable such that \[ (\Phi \mid K=k) \sim \mathcal{B}'(c, d+k, 1). \] Then \(\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)\).

  • For \(\tau > 1\), let \(K\) be a random variable on \(\mathbb{N}\) whose probability mass at \(k\in\mathbb{N}\) is given by \[ \frac{1}{{}_2\!F_1\left(c, c+d-\kappa, c+d, 1-\frac{1}{\tau}\right)} \frac{\left(1-\frac{1}{\tau}\right)^k}{k!} \frac{{(c+d-\kappa)}_k{(c)}_k}{{(c+d)}_k} \] and let \(\Phi\) be a random variable such that \[ (\Phi \mid K=k) \sim \mathcal{B}'(c+k, d, 1). \] Then \(\Phi \sim \mathcal{B}_3(c,d,\kappa,\tau)\).

So we can sample \(\mathcal{B}_3(c,d,\kappa,\tau)\) if we are able to sample these discrete distributions. To do so, we use the Runuran package.

library(Runuran)
pmf_unnormalized <- function(k, c, d, kappa, tau){
  out <- numeric(length(k))
  positive <- k >= 0
  k <- k[positive]
  out[positive] <- 
    if(tau < 1){
      exp(k*log(1-tau) - lfactorial(k) + 
            lnpoch(c+d-kappa,k) + lnpoch(d,k) - lnpoch(c+d,k)) 
    }else{
      exp(k*log(1-1/tau) - lfactorial(k) + 
            lnpoch(c+d-kappa,k) + lnpoch(c,k) - lnpoch(c+d,k))
    }
  out
}
NormalizingConstant <- function(c, d, kappa, tau){
  if(tau < 1){
    hyperg_2F1(d, c+d-kappa, c+d, 1-tau) 
  }else{
    hyperg_2F1(c, c+d-kappa, c+d, 1-1/tau) 
  }
}
Ksampler <- function(n, c, d, kappa, tau){
  dist <- unuran.discr.new(
    pmf = function(k) pmf_unnormalized(k, c, d, kappa, tau),
    lb = 0, ub= Inf, sum = NormalizingConstant(c, d, kappa, tau)
  )
  unuran <- unuran.new(dist, method="dgt") 
  ur(unuran, n)
}
rB3 <- function(n, c, d, kappa, tau){
  stopifnot(c > 0, d > 0, tau > 0)
  if(tau == 1 || kappa == c+d){
    rbetaprime(n, c, d, 1)
  }else if(kappa == 0){
    rbetaprime(n, c, d, tau)
  }else{
    K <- Ksampler(n, c, d, kappa, tau)
    if(tau < 1){
      rbetaprime(n, c, d + K, 1)
    }else{
      rbetaprime(n, c + K, d, 1)
    }
  }
}

Let’s check. The density:

c <- 2; d <- 3; kappa <- 4; tau <- 5
nsims <- 1000000
sims <- rB3(nsims, c, d, kappa, tau)
plot(density(sims, from = 0, to = quantile(sims, 0.95)))
curve(dB3(x, c, d, kappa, tau), add = TRUE, col = "red", 
      lty = "dashed", lwd = 2)

The cumulative distribution function:

q <- seq(0.1, 4, length.out = 10)[-6]
cdfValues <- sapply(q, function(x) pB3(x, c, d, kappa, tau))
empirical_cdf <- ecdf(sims)
plot(empirical_cdf, xlim = c(0,4))
points(q, cdfValues, pch=19)

I’ve removed the sixth value of the vector q because there is a crash of appellf1 for this value:

q <- seq(0.1, 4, length.out = 10)
pB3(q[6], c, d, kappa, tau)
## Error in appell::appellf1(c, kappa, c + d - kappa, c + 1, -q, -q/tau, : f1conv: Computation not possible

It works with the option userflag = 1:

pB3(q[6], c, d, kappa, tau, userflag = 1)
## [1] 0.849321

Finally perhaps the option userflag = 1 is a better default value:

cdfValues <- sapply(q, function(x){
  pB3(x, c, d, kappa, tau, userflag = 1)
})
plot(empirical_cdf, xlim = c(0,4))
points(q, cdfValues, pch=19)

Application to the Bayesian binomial model

Consider the binomial statistical model parameterized by the odds ratio \(\phi\): \[ L(\phi \mid x) \propto \frac{\phi^x}{(1+\phi)^n}. \] Take a \(\mathcal{B}_3(c,d,\kappa,\tau)\) prior distribution on \(\phi\): \[ \pi(\phi) \propto \frac{\phi^{c-1}(1+\phi)^{-\kappa}} {\left(1+\frac{\phi}{\tau}\right)^{c+d-\kappa}} \] Then the posterior distribution on \(\phi\) is \(\mathcal{B}_3(c+x,d+n-x,\kappa+n,\tau)\).

Application to the Bayesian “two Poisson samples” model

Consider the statistical model given by two independent observations \[ x \sim \mathcal{P}(\lambda S), \quad y\sim\mathcal{P}(\mu T) \] where \(S\) and \(T\) are known “sample sizes”. We parametrize the model by \(\mu\) and \(\phi := \frac{\lambda}{\mu}\).

Assigning a Gamma prior distribution \(\mathcal{G}{B}(a,b)\) on \(\mu\), it is not difficult to get \[ (\mu \mid \phi,x,y) \sim\mathcal{G}(a+x+y, b + \phi S + T). \] In their paper A Bayesian framework for the ratio of two Poisson rates in the context of vaccine efficacy trials (2011), Laurent & Legrand derived the marginal likelihood on \(\phi\) in the case of the semi-conjuguate family of prior distributions. Their result holds as long as \(\mu\) and \(\phi\) are independent under the prior distribution, and this result is \[ \widetilde{L}(\phi \mid x,y) \propto \frac{\phi^x}{{(1+\rho\phi)}^{a+x+y}} \] where \(\rho = \frac{S}{T+b}\).

Now, let’s assign a \(\mathcal{B}'(c,d,\tau)\) prior distribution on \(\phi\). Then the posterior distribution on \(\phi\) is given by \[ \pi(\phi \mid x,y) \propto \pi(\phi) \widetilde{L}(\phi \mid x,y) \propto \frac{\phi^{c+x-1}(1+\rho\phi)^{-(a+x+y)}} {\left(1+\frac{\phi}{\tau}\right)^{c+d}}, \] and by noting that \(\frac{\phi}{\tau} = \frac{\rho\phi}{\rho\tau}\), we recognize the scaled \(\mathcal{B}_3\) distribution \[ (\phi \mid x,y) \sim \frac{1}{\rho} \times \mathcal{B}_3(c+x, a+d+y, a+x+y, \rho\tau). \] In particular, if \(\tau = \frac{1}{\rho} = \frac{T+b}{S}\), then we find \[ (\phi \mid x,y) \sim \mathcal{B}_2(c+x, a+d+y, \tau), \] and this is the situation of the semi-conjugate family studied by Laurent & Legrand.

To leave a comment for the author, please follow the link and comment on their blog: Saturn Elephant.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)