Introduction
The gamma distribution is a continuous probability distribution that is often used to model waiting times or other positively skewed data. It is a two-parameter distribution, where the shape parameter controls the skewness of the distribution and the scale parameter controls the spread of the distribution.
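To make the roles of the two parameters concrete, here is a small base R sketch. The helper name gamma_summary() is made up for this illustration; it simply evaluates the standard formulas for the mean (shape × scale), standard deviation (sqrt(shape) × scale) and skewness (2 / sqrt(shape)) of a gamma distribution.

```r
# Hypothetical helper (not from any package): theoretical summaries
# of a gamma distribution with the given shape and scale
gamma_summary <- function(shape, scale) {
  c(
    mean     = shape * scale,
    sd       = sqrt(shape) * scale,
    skewness = 2 / sqrt(shape)
  )
}

# Increasing the shape reduces the skewness
print(gamma_summary(shape = 1, scale = 0.3))
print(gamma_summary(shape = 4, scale = 0.3))

# Doubling the scale doubles the mean and sd but leaves skewness unchanged
print(gamma_summary(shape = 1, scale = 0.6))
```

Note that the skewness depends only on the shape, while the scale stretches or compresses the whole distribution.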
Fitting a gamma distribution to a dataset in R
There are two main ways to fit a gamma distribution to a dataset in R:
- Maximum likelihood estimation (MLE): This method estimates the parameters of the gamma distribution that are most likely to have produced the observed data.
- Method of moments: This method estimates the parameters of the gamma distribution by equating the sample mean and variance to the theoretical mean and variance of the gamma distribution.
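The two approaches listed above can be sketched in base R without any extra packages. This is only an illustration under stated assumptions: the data are simulated with rgamma() using shape = 1 and scale = 0.3, the variable names are made up, and the likelihood is maximised with a generic call to optim() rather than with a dedicated fitting function.

```r
# Simulated data with known parameters (assumed values for illustration)
set.seed(123)
x <- rgamma(500, shape = 1, scale = 0.3)

# Method of moments: the gamma mean is shape * scale and the variance is
# shape * scale^2, so solving for the parameters gives:
m <- mean(x)
v <- var(x)
mom_shape <- m^2 / v
mom_scale <- v / m

# Maximum likelihood: minimise the negative log-likelihood numerically.
# Parameters are optimised on the log scale so they stay positive.
neg_loglik <- function(par, data) {
  -sum(dgamma(data, shape = exp(par[1]), scale = exp(par[2]), log = TRUE))
}

# Use the method-of-moments estimates as starting values
opt <- optim(par = log(c(mom_shape, mom_scale)), fn = neg_loglik, data = x)
mle_shape <- exp(opt$par[1])
mle_scale <- exp(opt$par[2])

print(c(mom_shape = mom_shape, mom_scale = mom_scale))
print(c(mle_shape = mle_shape, mle_scale = mle_scale))
```

Starting the optimiser from the method-of-moments estimates is a common convenience, since they are cheap to compute and usually land near the maximum likelihood solution.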
MLE is the more common and generally more reliable method of fitting a gamma distribution to a dataset. To fit a gamma distribution to a dataset using MLE, we can use the fitdist() function from the fitdistrplus package.
# Install the fitdistrplus package if necessary
# install.packages("fitdistrplus")

# Load the fitdistrplus package
library(fitdistrplus)
library(TidyDensity)

set.seed(123)
data <- tidy_gamma(.n = 500)$y

# Fit a gamma distribution to the data
fit <- fitdist(data, distr = "gamma", method = "mle")
The fit object contains the estimated parameters of the gamma distribution, as well as other information about the fit. We can access the estimated parameters using the coef() function. Note that the tidy_gamma() function from the TidyDensity package defaults to .shape = 1 and .scale = 0.3. The rate is 1/.scale, so the default rate is approximately 3.333.
# Get the estimated parameters of the gamma distribution
coef(fit)

   shape     rate
1.031833 3.594773
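Because coef(fit) reports a rate while tidy_gamma() is parameterised by a scale, it helps to check that the two describe the same distribution. Here is a minimal base R sketch, using the default values quoted above and the rate printed by coef(fit); the variable names are only for illustration.

```r
# Default parameters quoted for tidy_gamma() in the text
shape <- 1
scale <- 0.3
rate <- 1 / scale  # about 3.333

# dgamma() accepts either `scale` or `rate`; both give the same density
x <- c(0.1, 0.5, 1, 2)
d_scale <- dgamma(x, shape = shape, scale = scale)
d_rate <- dgamma(x, shape = shape, rate = rate)
print(all.equal(d_scale, d_rate))

# Convert the fitted rate reported by coef(fit) back to a scale
fitted_rate <- 3.594773
fitted_scale <- 1 / fitted_rate
print(fitted_scale)  # roughly 0.278, close to the 0.3 used to simulate
```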
Now let’s see how that compares to the built-in TidyDensity function:
util_gamma_param_estimate(data)$parameter_tbl[1,c("shape","scale","shape_ratio")]
# A tibble: 1 × 3
  shape scale shape_ratio
  <dbl> <dbl>       <dbl>
1 0.983 0.292        3.36
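In the above, shape_ratio is the column to compare with the rate. The printed values are consistent with it being shape / scale (about 3.37 from the rounded values 0.983 and 0.292) rather than 1 / scale (about 3.42); the two nearly coincide here because the shape is close to 1.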
Try on your own!
I encourage you to try fitting a gamma distribution to your own data. You can use the fitdistrplus package in R to fit a gamma distribution to any dataset of positive values. Once you have fitted a gamma distribution to your data, you can use the estimated parameters to generate random samples from the gamma distribution or to calculate the probability of observing a value in a particular range.
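As a starting point, here is a hedged sketch of what using the fitted parameters can look like in base R. It plugs in the shape and rate printed by coef(fit) earlier in this post; the cutoff of 0.5 and the sample size of 1000 are arbitrary choices for illustration.

```r
# Estimates copied from the coef(fit) output shown earlier
est_shape <- 1.031833
est_rate <- 3.594773

# Generate new random samples from the fitted distribution
set.seed(123)
new_samples <- rgamma(1000, shape = est_shape, rate = est_rate)
print(summary(new_samples))

# Probability of observing a value at or below 0.5 (arbitrary cutoff)
p_below <- pgamma(0.5, shape = est_shape, rate = est_rate)

# Probability of observing a value above 0.5
p_above <- pgamma(0.5, shape = est_shape, rate = est_rate, lower.tail = FALSE)

# Median of the fitted distribution
fitted_median <- qgamma(0.5, shape = est_shape, rate = est_rate)

print(c(p_below = p_below, p_above = p_above, median = fitted_median))
```

Since the gamma distribution is continuous, probabilities are computed for ranges of values with pgamma(); dgamma() gives the density at a single point rather than a probability.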