[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].
When modeling severity measurements in operational losses with Generalized Linear Models, we have a couple of choices based on different distributional assumptions, including Gamma, Inverse Gaussian, and Lognormal. However, based on my observations from empirical work, the differences in parameter estimates among these three popular candidates are rather immaterial from a practical standpoint.

Below is a demonstration showing how to model severity with the insurance data under the aforementioned three distributions. As shown, albeit with some inferential differences, the three models produce similar coefficients.
In [1]: # LOAD PACKAGES

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: import statsmodels.api as sm

In [5]: import statsmodels.formula.api as smf

In [6]: df = pd.read_csv("AutoCollision.csv")

In [7]: df.head()
Out[7]:
     Age Vehicle_Use  Severity  Claim_Count
0  17-20    Pleasure    250.48           21
1  17-20  DriveShort    274.78           40
2  17-20   DriveLong    244.52           23
3  17-20    Business    797.80            5
4  21-24    Pleasure    213.71           63

In [8]: # FIT A GAMMA REGRESSION

In [9]: gamma = smf.glm(formula = "Severity ~ Age + Vehicle_Use", data = df, family = sm.families.Gamma(sm.families.links.log))

In [10]: gamma.fit().summary()
Out[10]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:               Severity   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:                   Gamma   Df Model:                           10
Link Function:                    log   Scale:                 0.0299607547345
Method:                          IRLS   Log-Likelihood:                -161.35
Date:                Sun, 06 Dec 2015   Deviance:                      0.58114
Time:                        12:59:17   Pearson chi2:                    0.629
No. Iterations:                     8
=============================================================================================
                                coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.2413      0.101     61.500      0.000         6.042     6.440
Age[T.21-24]                 -0.2080      0.122     -1.699      0.089        -0.448     0.032
Age[T.25-29]                 -0.2303      0.122     -1.881      0.060        -0.470     0.010
Age[T.30-34]                 -0.2630      0.122     -2.149      0.032        -0.503    -0.023
Age[T.35-39]                 -0.5311      0.122     -4.339      0.000        -0.771    -0.291
Age[T.40-49]                 -0.3820      0.122     -3.121      0.002        -0.622    -0.142
Age[T.50-59]                 -0.3741      0.122     -3.057      0.002        -0.614    -0.134
Age[T.60+]                   -0.3939      0.122     -3.218      0.001        -0.634    -0.154
Vehicle_Use[T.DriveLong]     -0.3573      0.087     -4.128      0.000        -0.527    -0.188
Vehicle_Use[T.DriveShort]    -0.5053      0.087     -5.839      0.000        -0.675    -0.336
Vehicle_Use[T.Pleasure]      -0.5886      0.087     -6.801      0.000        -0.758    -0.419
=============================================================================================
"""

In [11]: # FIT AN INVERSE GAUSSIAN REGRESSION

In [12]: igauss = smf.glm(formula = "Severity ~ Age + Vehicle_Use", data = df, family = sm.families.InverseGaussian(sm.families.links.log))

In [13]: igauss.fit().summary()
Out[13]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:               Severity   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:        InverseGaussian   Df Model:                           10
Link Function:                    log   Scale:                 8.73581523073e-05
Method:                          IRLS   Log-Likelihood:                -156.44
Date:                Sun, 06 Dec 2015   Deviance:                    0.0015945
Time:                        13:01:14   Pearson chi2:                  0.00183
No. Iterations:                     7
=============================================================================================
                                coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.1776      0.103     59.957      0.000         5.976     6.379
Age[T.21-24]                 -0.1475      0.116     -1.269      0.204        -0.375     0.080
Age[T.25-29]                 -0.1632      0.116     -1.409      0.159        -0.390     0.064
Age[T.30-34]                 -0.2079      0.115     -1.814      0.070        -0.433     0.017
Age[T.35-39]                 -0.4732      0.108     -4.361      0.000        -0.686    -0.261
Age[T.40-49]                 -0.3299      0.112     -2.954      0.003        -0.549    -0.111
Age[T.50-59]                 -0.3206      0.112     -2.866      0.004        -0.540    -0.101
Age[T.60+]                   -0.3465      0.111     -3.115      0.002        -0.565    -0.128
Vehicle_Use[T.DriveLong]     -0.3334      0.084     -3.992      0.000        -0.497    -0.170
Vehicle_Use[T.DriveShort]    -0.4902      0.081     -6.055      0.000        -0.649    -0.332
Vehicle_Use[T.Pleasure]      -0.5743      0.080     -7.206      0.000        -0.731    -0.418
=============================================================================================
"""

In [14]: # FIT A LOGNORMAL REGRESSION

In [15]: df['Log_Severity'] = np.log(df.Severity)

In [16]: lognormal = smf.glm(formula = "Log_Severity ~ Age + Vehicle_Use", data = df, family = sm.families.Gaussian())

In [17]: lognormal.fit().summary()
Out[17]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:           Log_Severity   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:                Gaussian   Df Model:                           10
Link Function:               identity   Scale:                 0.0265610360381
Method:                          IRLS   Log-Likelihood:                 19.386
Date:                Sun, 06 Dec 2015   Deviance:                      0.55778
Time:                        13:02:12   Pearson chi2:                    0.558
No. Iterations:                     4
=============================================================================================
                                coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.1829      0.096     64.706      0.000         5.996     6.370
Age[T.21-24]                 -0.1667      0.115     -1.447      0.148        -0.393     0.059
Age[T.25-29]                 -0.1872      0.115     -1.624      0.104        -0.413     0.039
Age[T.30-34]                 -0.2163      0.115     -1.877      0.061        -0.442     0.010
Age[T.35-39]                 -0.4901      0.115     -4.252      0.000        -0.716    -0.264
Age[T.40-49]                 -0.3347      0.115     -2.904      0.004        -0.561    -0.109
Age[T.50-59]                 -0.3267      0.115     -2.835      0.005        -0.553    -0.101
Age[T.60+]                   -0.3467      0.115     -3.009      0.003        -0.573    -0.121
Vehicle_Use[T.DriveLong]     -0.3481      0.081     -4.272      0.000        -0.508    -0.188
Vehicle_Use[T.DriveShort]    -0.4903      0.081     -6.016      0.000        -0.650    -0.331
Vehicle_Use[T.Pleasure]      -0.5726      0.081     -7.027      0.000        -0.732    -0.413
=============================================================================================
"""