Parameters and percentiles (the gamma distribution)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
by Andrie de Vries
In one of John D. Cooke‘s blog posts of 2010 (Parameters and Percentiles), he poses the following problem:
The doctor says 10% of patients respond within 30 days of treatment and 80% respond within 90 days of treatment. Now go turn that into a probability distribution. That’s a common task in Bayesian statistics, capturing expert opinion in a mathematical form to create a prior distribution.
John then discusses how this level of information is highly valuable in statistical inference. The reason is that quite often this is the kind of information you might be able to elicit from a domain expert. It then is up to you as the statistician / data scientist to use this information. In bayesian statistics, for example, you can use this information to construct a bayesian prior distribution. In particular, he demonstrates how this expectation can be modeled with a gamma distribution and shows how to solve the problem analytically.
In this post I demonstrate how to solve the problem using the non-linear least squares solver in R, using the nls() function.
But first, take a look at some of the properties of the gamma distribution. The gamma is a general family of distributions. Both the exponential and the chi-squared distributions are special cases of the gamma.
The gamma distribution takes two arguments. The first defines the shape. If shape is close to zero, the gamma is very similar to the exponential. If shape is large, then the gamma is similar to the chi-squared distribution.
To create the plots, you can use the function curve() to do the actual plotting, and dgamma() to compute the gamma density distribution. In this grid of plots, the shape parameter varies horisontally (from 1 on the left to 6 on the right). At the same time, the scale parameter varies vertically (from 0.1 at the top to 1.0 at the bottom).
Next, you can use the function nls() to solve the problem as posed by John Cooke. The nls() function takes a loss function as an argument. This loss function is the function to be minimised by the solver. In the posed problem, you can compute the loss function as the difference between a hypothetical gamma distribution, calculated by qgamma() and the expected values posed by the problem.
The nls() solver is sensitive to the starting conditions, but easily finds a solution:
To replicate this example, you can use this code:
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.