Modeling Frequency in Operational Losses with Python
Poisson and Negative Binomial regressions are two popular approaches for modeling frequency measures in operational losses, and both can be implemented in Python with the statsmodels package as shown below:
In [1]: import pandas as pd
In [2]: import statsmodels.api as sm
In [3]: import statsmodels.formula.api as smf
In [4]: df = pd.read_csv("AutoCollision.csv")
In [5]: # FITTING A POISSON REGRESSION
In [6]: poisson = smf.glm(formula = "Claim_Count ~ Age + Vehicle_Use", data = df, family = sm.families.Poisson(sm.families.links.log()))
In [7]: poisson.fit().summary()
Out[7]:
<class 'statsmodels.iolib.summary.Summary'>
"""
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: Claim_Count No. Observations: 32
Model: GLM Df Residuals: 21
Model Family: Poisson Df Model: 10
Link Function: log Scale: 1.0
Method: IRLS Log-Likelihood: -204.40
Date: Tue, 08 Dec 2015 Deviance: 184.72
Time: 20:31:27 Pearson chi2: 184.
No. Iterations: 9
=============================================================================================
coef std err z P>|z| [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept 2.3702 0.110 21.588 0.000 2.155 2.585
Age[T.21-24] 1.4249 0.118 12.069 0.000 1.193 1.656
Age[T.25-29] 2.3465 0.111 21.148 0.000 2.129 2.564
Age[T.30-34] 2.5153 0.110 22.825 0.000 2.299 2.731
Age[T.35-39] 2.5821 0.110 23.488 0.000 2.367 2.798
Age[T.40-49] 3.2247 0.108 29.834 0.000 3.013 3.437
Age[T.50-59] 3.0019 0.109 27.641 0.000 2.789 3.215
Age[T.60+] 2.6391 0.110 24.053 0.000 2.424 2.854
Vehicle_Use[T.DriveLong] 0.9246 0.036 25.652 0.000 0.854 0.995
Vehicle_Use[T.DriveShort] 1.2856 0.034 37.307 0.000 1.218 1.353
Vehicle_Use[T.Pleasure] 0.1659 0.041 4.002 0.000 0.085 0.247
=============================================================================================
"""
In [8]: # FITTING A NEGATIVE BINOMIAL REGRESSION
In [9]: nbinom = smf.glm(formula = "Claim_Count ~ Age + Vehicle_Use", data = df, family = sm.families.NegativeBinomial(sm.families.links.log()))
In [10]: nbinom.fit().summary()
Out[10]:
<class 'statsmodels.iolib.summary.Summary'>
"""
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: Claim_Count No. Observations: 32
Model: GLM Df Residuals: 21
Model Family: NegativeBinomial Df Model: 10
Link Function: log Scale: 0.0646089484752
Method: IRLS Log-Likelihood: -198.15
Date: Tue, 08 Dec 2015 Deviance: 1.4436
Time: 20:31:27 Pearson chi2: 1.36
No. Iterations: 11
=============================================================================================
coef std err z P>|z| [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept 2.2939 0.153 14.988 0.000 1.994 2.594
Age[T.21-24] 1.4546 0.183 7.950 0.000 1.096 1.813
Age[T.25-29] 2.4133 0.183 13.216 0.000 2.055 2.771
Age[T.30-34] 2.5636 0.183 14.042 0.000 2.206 2.921
Age[T.35-39] 2.6259 0.183 14.384 0.000 2.268 2.984
Age[T.40-49] 3.2408 0.182 17.760 0.000 2.883 3.598
Age[T.50-59] 2.9717 0.183 16.283 0.000 2.614 3.329
Age[T.60+] 2.6404 0.183 14.463 0.000 2.283 2.998
Vehicle_Use[T.DriveLong] 0.9480 0.128 7.408 0.000 0.697 1.199
Vehicle_Use[T.DriveShort] 1.3402 0.128 10.480 0.000 1.090 1.591
Vehicle_Use[T.Pleasure] 0.3265 0.128 2.548 0.011 0.075 0.578
=============================================================================================
"""
Although Quasi-Poisson regression is not currently supported by the statsmodels package, we can still estimate the model with the rpy2 package by running R in the back-end. As shown in the output below, the parameter estimates in the Quasi-Poisson model are identical to the ones in the standard Poisson model; only the standard errors differ, since they are inflated by the estimated dispersion parameter. If we want a flexible approach for modeling frequency measures in operational loss forecasting without resorting to the more complex Negative Binomial model, Quasi-Poisson regression is a serious contender.
In [11]: # FITTING A QUASI-POISSON REGRESSION
In [12]: import rpy2.robjects as ro
In [13]: from rpy2.robjects import pandas2ri
In [14]: pandas2ri.activate()
In [15]: rdf = pandas2ri.py2ri_pandasdataframe(df)
In [16]: qpoisson = ro.r.glm('Claim_Count ~ Age + Vehicle_Use', data = rdf, family = ro.r('quasipoisson(link = "log")'))
In [17]: print(ro.r.summary(qpoisson))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3702 0.3252 7.288 3.55e-07 ***
Age21-24 1.4249 0.3497 4.074 0.000544 ***
Age25-29 2.3465 0.3287 7.140 4.85e-07 ***
Age30-34 2.5153 0.3264 7.705 1.49e-07 ***
Age35-39 2.5821 0.3256 7.929 9.49e-08 ***
Age40-49 3.2247 0.3202 10.072 1.71e-09 ***
Age50-59 3.0019 0.3217 9.331 6.42e-09 ***
Age60+ 2.6391 0.3250 8.120 6.48e-08 ***
Vehicle_UseDriveLong 0.9246 0.1068 8.660 2.26e-08 ***
Vehicle_UseDriveShort 1.2856 0.1021 12.595 2.97e-11 ***
Vehicle_UsePleasure 0.1659 0.1228 1.351 0.191016
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasipoisson family taken to be 8.774501)
Null deviance: 6064.97 on 31 degrees of freedom
Residual deviance: 184.72 on 21 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4
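As a quick sanity check on the relationship between the two fits, the Quasi-Poisson standard errors are simply the Poisson standard errors inflated by the square root of the dispersion parameter. A minimal sketch, assuming numpy is available and using the intercept figures reported in the two outputs above:

import numpy as np
# quasi-Poisson std error ~ Poisson std error * sqrt(dispersion parameter)
np.sqrt(8.774501) * 0.110  # roughly 0.33, in line with the 0.3252 reported for the (Intercept) above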