Modeling Severity in Operational Losses with Python
[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers.]
When modeling the severity of operational losses with Generalized Linear Models (GLMs), we can choose among several distributional assumptions, including Gamma, Inverse Gaussian, and Lognormal. In my empirical work, however, the differences in parameter estimates among these three popular candidates have been immaterial from a practical standpoint.
Below is a demonstration of how to model severity with an insurance data set under each of the three distributions. As the output shows, the three models yield similar coefficients, although the inference (standard errors and p-values) differs somewhat.
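For reference, the three specifications can be written in standard GLM notation. This is a sketch rather than the author's own formulation; the symbols mu_i (the expected severity of cell i), x_i (its dummy-coded covariates), beta, phi (dispersion) and sigma^2 are introduced only for this note.

```latex
\begin{aligned}
\text{Gamma:} \quad & \log \mu_i = x_i^\top \beta, & \operatorname{Var}(Y_i) &= \phi\,\mu_i^2 \\
\text{Inverse Gaussian:} \quad & \log \mu_i = x_i^\top \beta, & \operatorname{Var}(Y_i) &= \phi\,\mu_i^3 \\
\text{Lognormal:} \quad & \log Y_i = x_i^\top \beta + \varepsilon_i, & \varepsilon_i &\sim N(0, \sigma^2)
\end{aligned}
```

All three are linear on the log scale, which is why their coefficients are directly comparable. One caveat: under the lognormal form, exp(x_i' beta) is the median of Y_i rather than its mean; the mean is exp(x_i' beta + sigma^2 / 2).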
In [1]: # LOAD PACKAGES
In [2]: import pandas as pd
In [3]: import numpy as np
In [4]: import statsmodels.api as sm
In [5]: import statsmodels.formula.api as smf
In [6]: df = pd.read_csv("AutoCollision.csv")
In [7]: df.head()
Out[7]:
     Age  Vehicle_Use  Severity  Claim_Count
0  17-20     Pleasure    250.48           21
1  17-20   DriveShort    274.78           40
2  17-20    DriveLong    244.52           23
3  17-20     Business    797.80            5
4  21-24     Pleasure    213.71           63
In [8]: # FIT A GAMMA REGRESSION
In [9]: gamma = smf.glm(formula = "Severity ~ Age + Vehicle_Use", data = df, family = sm.families.Gamma(link = sm.families.links.Log()))
In [10]: gamma.fit().summary()
Out[10]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                Severity   No. Observations:                   32
Model:                             GLM   Df Residuals:                       21
Model Family:                    Gamma   Df Model:                           10
Link Function:                     log   Scale:                0.0299607547345
Method:                           IRLS   Log-Likelihood:                -161.35
Date:                 Sun, 06 Dec 2015   Deviance:                      0.58114
Time:                         12:59:17   Pearson chi2:                    0.629
No. Iterations:                      8
=============================================================================================
                                coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.2413      0.101     61.500      0.000         6.042     6.440
Age[T.21-24]                 -0.2080      0.122     -1.699      0.089        -0.448     0.032
Age[T.25-29]                 -0.2303      0.122     -1.881      0.060        -0.470     0.010
Age[T.30-34]                 -0.2630      0.122     -2.149      0.032        -0.503    -0.023
Age[T.35-39]                 -0.5311      0.122     -4.339      0.000        -0.771    -0.291
Age[T.40-49]                 -0.3820      0.122     -3.121      0.002        -0.622    -0.142
Age[T.50-59]                 -0.3741      0.122     -3.057      0.002        -0.614    -0.134
Age[T.60+]                   -0.3939      0.122     -3.218      0.001        -0.634    -0.154
Vehicle_Use[T.DriveLong]     -0.3573      0.087     -4.128      0.000        -0.527    -0.188
Vehicle_Use[T.DriveShort]    -0.5053      0.087     -5.839      0.000        -0.675    -0.336
Vehicle_Use[T.Pleasure]      -0.5886      0.087     -6.801      0.000        -0.758    -0.419
=============================================================================================
"""
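Because the Gamma model above uses a log link, each coefficient acts multiplicatively on expected severity once exponentiated. The sketch below is an illustration and not part of the original session: it copies three coefficients from the Gamma summary by hand and assumes that the omitted levels (Age 17-20 and Vehicle_Use Business) are the reference categories, as the treatment coding in the output suggests. The names `gamma_coef`, `baseline`, `relativities` and `cell_mean` are made up for the example.

```python
import math

# Coefficients copied by hand from the Gamma summary above (log link).
gamma_coef = {
    "Intercept": 6.2413,
    "Age[T.35-39]": -0.5311,
    "Vehicle_Use[T.Pleasure]": -0.5886,
}

# Expected severity for the assumed reference cell (Age 17-20, Business use).
baseline = math.exp(gamma_coef["Intercept"])

# Multiplicative relativities versus the reference levels.
relativities = {
    term: math.exp(beta)
    for term, beta in gamma_coef.items()
    if term != "Intercept"
}

# Expected severity for a driver aged 35-39 who drives for pleasure.
cell_mean = (
    baseline
    * relativities["Age[T.35-39]"]
    * relativities["Vehicle_Use[T.Pleasure]"]
)

print(f"baseline severity: {baseline:.2f}")
for term, rel in relativities.items():
    print(f"{term}: x{rel:.3f}")
print(f"Age 35-39, Pleasure: {cell_mean:.2f}")
```

Read this way, a coefficient of -0.5886 for pleasure use means the fitted severity is a little over half of the business-use level, holding age fixed.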
In [11]: # FIT AN INVERSE GAUSSIAN REGRESSION
In [12]: igauss = smf.glm(formula = "Severity ~ Age + Vehicle_Use", data = df, family = sm.families.InverseGaussian(link = sm.families.links.Log()))
In [13]: igauss.fit().summary()
Out[13]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                Severity   No. Observations:                   32
Model:                             GLM   Df Residuals:                       21
Model Family:          InverseGaussian   Df Model:                           10
Link Function:                     log   Scale:              8.73581523073e-05
Method:                           IRLS   Log-Likelihood:                -156.44
Date:                 Sun, 06 Dec 2015   Deviance:                    0.0015945
Time:                         13:01:14   Pearson chi2:                  0.00183
No. Iterations:                      7
=============================================================================================
                                coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.1776      0.103     59.957      0.000         5.976     6.379
Age[T.21-24]                 -0.1475      0.116     -1.269      0.204        -0.375     0.080
Age[T.25-29]                 -0.1632      0.116     -1.409      0.159        -0.390     0.064
Age[T.30-34]                 -0.2079      0.115     -1.814      0.070        -0.433     0.017
Age[T.35-39]                 -0.4732      0.108     -4.361      0.000        -0.686    -0.261
Age[T.40-49]                 -0.3299      0.112     -2.954      0.003        -0.549    -0.111
Age[T.50-59]                 -0.3206      0.112     -2.866      0.004        -0.540    -0.101
Age[T.60+]                   -0.3465      0.111     -3.115      0.002        -0.565    -0.128
Vehicle_Use[T.DriveLong]     -0.3334      0.084     -3.992      0.000        -0.497    -0.170
Vehicle_Use[T.DriveShort]    -0.4902      0.081     -6.055      0.000        -0.649    -0.332
Vehicle_Use[T.Pleasure]      -0.5743      0.080     -7.206      0.000        -0.731    -0.418
=============================================================================================
"""
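The Scale values in the Gamma and Inverse Gaussian summaries are on very different orders of magnitude because the two families scale variance differently: variance is proportional to the squared mean under Gamma and to the cubed mean under Inverse Gaussian. The sketch below, which is an illustration rather than part of the original session, turns both Scale estimates into an implied coefficient of variation (CV). It assumes the reported Scale is the dispersion multiplying the family's variance function, which is consistent with Pearson chi2 divided by Df Residuals in both summaries. The helper `igauss_cv` and the two example means are made up for the illustration.

```python
import math

# Scale (dispersion) estimates copied from the two summaries above.
gamma_scale = 0.0299607547345
igauss_scale = 8.73581523073e-05

# Gamma: Var(Y) = scale * mu**2, so the CV does not depend on mu.
gamma_cv = math.sqrt(gamma_scale)


def igauss_cv(mu, scale=igauss_scale):
    """CV under Inverse Gaussian, where Var(Y) = scale * mu**3."""
    return math.sqrt(scale * mu)


# Two illustrative fitted means built from the Inverse Gaussian coefficients:
# the reference cell, and Age 35-39 combined with Pleasure use.
mu_high = math.exp(6.1776)
mu_low = math.exp(6.1776 - 0.4732 - 0.5743)

print(f"Gamma CV (all cells): {gamma_cv:.3f}")
print(f"Inverse Gaussian CV at mu={mu_high:.0f}: {igauss_cv(mu_high):.3f}")
print(f"Inverse Gaussian CV at mu={mu_low:.0f}: {igauss_cv(mu_low):.3f}")
```

Under these assumptions the Inverse Gaussian model treats high-severity cells as noisier than low-severity ones, while the Gamma model treats all cells as equally noisy in relative terms. That is a plausible reason the Inverse Gaussian standard errors vary across age bands while the Gamma ones are identical.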
In [14]: # FIT A LOGNORMAL REGRESSION
In [15]: df['Log_Severity'] = np.log(df.Severity)
In [16]: lognormal = smf.glm(formula = "Log_Severity ~ Age + Vehicle_Use", data = df, family = sm.families.Gaussian())
In [17]: lognormal.fit().summary()
Out[17]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:            Log_Severity   No. Observations:                   32
Model:                             GLM   Df Residuals:                       21
Model Family:                 Gaussian   Df Model:                           10
Link Function:                identity   Scale:                0.0265610360381
Method:                           IRLS   Log-Likelihood:                 19.386
Date:                 Sun, 06 Dec 2015   Deviance:                      0.55778
Time:                         13:02:12   Pearson chi2:                    0.558
No. Iterations:                      4
=============================================================================================
                                coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.1829      0.096     64.706      0.000         5.996     6.370
Age[T.21-24]                 -0.1667      0.115     -1.447      0.148        -0.393     0.059
Age[T.25-29]                 -0.1872      0.115     -1.624      0.104        -0.413     0.039
Age[T.30-34]                 -0.2163      0.115     -1.877      0.061        -0.442     0.010
Age[T.35-39]                 -0.4901      0.115     -4.252      0.000        -0.716    -0.264
Age[T.40-49]                 -0.3347      0.115     -2.904      0.004        -0.561    -0.109
Age[T.50-59]                 -0.3267      0.115     -2.835      0.005        -0.553    -0.101
Age[T.60+]                   -0.3467      0.115     -3.009      0.003        -0.573    -0.121
Vehicle_Use[T.DriveLong]     -0.3481      0.081     -4.272      0.000        -0.508    -0.188
Vehicle_Use[T.DriveShort]    -0.4903      0.081     -6.016      0.000        -0.650    -0.331
Vehicle_Use[T.Pleasure]      -0.5726      0.081     -7.027      0.000        -0.732    -0.413
=============================================================================================
"""
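To put a number on how close the three sets of estimates are, the sketch below lines up the coefficients from the three summaries above and reports, for each term, the gap between the largest and smallest estimate. The values are copied by hand from the printed output; the names `coefs`, `spread` and `worst_term` are made up for this illustration, and nothing is refitted.

```python
# Coefficients copied by hand from the three summaries above,
# in the order (Gamma, Inverse Gaussian, Lognormal).
coefs = {
    "Intercept": (6.2413, 6.1776, 6.1829),
    "Age[T.21-24]": (-0.2080, -0.1475, -0.1667),
    "Age[T.25-29]": (-0.2303, -0.1632, -0.1872),
    "Age[T.30-34]": (-0.2630, -0.2079, -0.2163),
    "Age[T.35-39]": (-0.5311, -0.4732, -0.4901),
    "Age[T.40-49]": (-0.3820, -0.3299, -0.3347),
    "Age[T.50-59]": (-0.3741, -0.3206, -0.3267),
    "Age[T.60+]": (-0.3939, -0.3465, -0.3467),
    "Vehicle_Use[T.DriveLong]": (-0.3573, -0.3334, -0.3481),
    "Vehicle_Use[T.DriveShort]": (-0.5053, -0.4902, -0.4903),
    "Vehicle_Use[T.Pleasure]": (-0.5886, -0.5743, -0.5726),
}

# Spread = largest minus smallest estimate for each term.
spread = {term: max(vals) - min(vals) for term, vals in coefs.items()}
worst_term = max(spread, key=spread.get)

for term, vals in coefs.items():
    row = " ".join(f"{v:8.4f}" for v in vals)
    print(f"{term:<26} {row}  spread={spread[term]:.4f}")
print(f"largest spread: {worst_term} ({spread[worst_term]:.4f})")
```

Every term differs by less than 0.07 on the log scale across the three models, and the Vehicle_Use effects agree more closely than the Age effects.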