Site icon R-bloggers

Fitting a Distribution to Data in R

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
< section id="introduction" class="level1">

Introduction

The gamma distribution is a continuous probability distribution that is often used to model waiting times or other positively skewed data. It is a two-parameter distribution, where the shape parameter controls the skewness of the distribution and the scale parameter controls the spread of the distribution.

< section id="fitting-a-gamma-distribution-to-a-dataset-in-r" class="level1">

Fitting a gamma distribution to a dataset in R

There are two main ways to fit a gamma distribution to a dataset in R:

  1. Maximum likelihood estimation (MLE): This method estimates the parameters of the gamma distribution that are most likely to have produced the observed data.
  2. Method of moments: This method estimates the parameters of the gamma distribution by equating the sample mean and variance to the theoretical mean and variance of the gamma distribution.

MLE is the more common and generally more reliable method of fitting a gamma distribution to a dataset. To fit a gamma distribution to a dataset using MLE, we can use the fitdist() function from the fitdistrplus package.

# Install the fitdistrplus package if necessary
#install.packages("fitdistrplus")

# Load the fitdistrplus package
library(fitdistrplus)
library(TidyDensity)

set.seed(123)
data <- tidy_gamma(.n = 500)$y

# Fit a gamma distribution to the data
fit <- fitdist(data, distr = "gamma", method = "mle")

The fit object contains the estimated parameters of the gamma distribution, as well as other information about the fit. We can access the estimated parameters using the coef() function. Now the tidy_gamma() function from the TidyDensity package comes with a default setting of a .scale = 0.3 and shape = 1. The rate is 1/.scale, so by default it is 3.33333

# Get the estimated parameters of the gamma distribution
coef(fit)
   shape     rate 
1.031833 3.594773 

Now let’s see how that compares to the built in TidyDensity function:

util_gamma_param_estimate(data)$parameter_tbl[1,c("shape","scale","shape_ratio")]
# A tibble: 1 × 3
  shape scale shape_ratio
  <dbl> <dbl>       <dbl>
1 0.983 0.292        3.36

In the above, the shape_ratio is the rate

< section id="try-on-your-own" class="level1">

Try on your own!

I encourage you to try fitting a gamma distribution to your own data. You can use the fitdistrplus package in R to fit a gamma distribution to any dataset. Once you have fitted a gamma distribution to your data, you can use the estimated parameters to generate random samples from the gamma distribution or to calculate the probability of observing a particular value.

To leave a comment for the author, please follow the link and comment on their blog: Steve's Data Tips and Tricks.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exit mobile version