In order to illustrate the next section of the non-life insurance course, consider the following example [1], inspired by http://sciencepolicy.colorado.edu/…. This is the so-called “Normalized Hurricane Damages in the United States” dataset, for the period 1900-2005, from Pielke et al. (2008). The dataset is available in xls format, so we have to spend some time importing it,
> library(gdata)
> db=read.xls(
+ "http://sciencepolicy.colorado.edu/publications/special/public_data_may_2007.xls",
+ sheet=1)
trying URL 'http://sciencepolicy.colorado.edu/publications/special/public_data_may_2007.xls'
Content type 'application/vnd.ms-excel' length 119296 bytes (116 Kb)
opened URL
==================================================
downloaded 116 Kb
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = "fr_CA:fr",
	LC_ALL = (unset),
	LANG = "fr_CA.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
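As an aside, read.xls from gdata relies on a local perl installation, which is where the locale warnings above come from. If perl is a problem on your machine, one possible workaround (a minimal sketch, assuming the readxl package is installed; this is not part of the original workflow) is to download the file first and then read it locally,

# Alternative import sketch, avoiding the perl dependency of gdata:
# download the spreadsheet, then read it with readxl
library(readxl)
url <- "http://sciencepolicy.colorado.edu/publications/special/public_data_may_2007.xls"
tmp <- tempfile(fileext = ".xls")
download.file(url, tmp, mode = "wb")   # binary mode matters on Windows
db  <- as.data.frame(read_excel(tmp, sheet = 1))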
The problem with Excel spreadsheets is that some columns may have a pre-specified format (here, the losses are formatted with thousands separators, e.g. 000,000,000), so they are imported as character strings rather than numbers,
> tail(db)
    Year Hurricane.Description State Category Base.Economic.Damage
202 2005                 Cindy    LA        1          320,000,000
203 2005                Dennis    FL        3        2,230,000,000
204 2005               Katrina LA,MS        3       81,000,000,000
205 2005               Ophelia    NC        1        1,600,000,000
206 2005                  Rita    TX        3       10,000,000,000
207 2005                 Wilma    FL        3       20,600,000,000
    Normalized.PL05 Normalized.CL05  X X.1
202     320,000,000     320,000,000 NA  NA
203   2,230,000,000   2,230,000,000 NA  NA
204  81,000,000,000  81,000,000,000 NA  NA
205   1,600,000,000   1,600,000,000 NA  NA
206  10,000,000,000  10,000,000,000 NA  NA
207  20,600,000,000  20,600,000,000 NA  NA
To get data in a format we can play with, consider the following function,
> stupidcomma = function(x){
+ x=as.character(x)
+ for(i in 1:10){x=sub(",","",as.character(x))}
+ return(as.numeric(x))}
and let’s convert those values into numbers,
> base=db[,1:4]
> base$Base.Economic.Damage=Vectorize(stupidcomma)(db$Base.Economic.Damage)
> base$Normalized.PL05=Vectorize(stupidcomma)(db$Normalized.PL05)
> base$Normalized.CL05=Vectorize(stupidcomma)(db$Normalized.CL05)
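For the record, the same conversion can be written more compactly with gsub, which removes all the commas in a single call and is already vectorized, so the Vectorize() wrapper above would not be needed (an equivalent sketch; the name stripcomma is just for illustration),

# Equivalent, more idiomatic converter: strip every comma, coerce to numeric
stripcomma <- function(x) as.numeric(gsub(",", "", as.character(x)))
# e.g. base$Normalized.PL05 <- stripcomma(db$Normalized.PL05)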
Here is the dataset we will use from now on,
> tail(base)
    Year Hurricane.Description State Category Base.Economic.Damage
202 2005                 Cindy    LA        1             3.20e+08
203 2005                Dennis    FL        3             2.23e+09
204 2005               Katrina LA,MS        3             8.10e+10
205 2005               Ophelia    NC        1             1.60e+09
206 2005                  Rita    TX        3             1.00e+10
207 2005                 Wilma    FL        3             2.06e+10
    Normalized.PL05 Normalized.CL05
202        3.20e+08        3.20e+08
203        2.23e+09        2.23e+09
204        8.10e+10        8.10e+10
205        1.60e+09        1.60e+09
206        1.00e+10        1.00e+10
207        2.06e+10        2.06e+10
We can visualize the normalized costs of the 207 hurricanes, from 1900 till 2005 (note that the x-axis is not time here, but simply the index of the loss),
> plot(base$Normalized.PL05/1e9,type="h",ylim=c(0,155))
As usual, there are two components when computing the pure premium of an insurance contract: the number of claims (here, hurricanes) and the individual loss for each claim. We have seen the individual losses above, so let us now focus on the annual frequency.
> TB <- table(base$Year)
> years <- as.numeric(names(TB))
> counts <- as.numeric(TB)
> years0=(1900:2005)[which(!(1900:2005)%in%years)]
> db <- data.frame(years=c(years,years0),
+ counts=c(counts,rep(0,length(years0))))
> db[88:93,]
   years counts
88  2003      3
89  2004      6
90  2005      6
91  1902      0
92  1905      0
93  1907      0
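Note that the hurricane-free years can also be filled in directly, by treating the year as a factor with all levels from 1900 to 2005 (an equivalent sketch),

# Same yearly counts, with zero-count years included automatically
TB <- table(factor(base$Year, levels = 1900:2005))
db <- data.frame(years = as.numeric(names(TB)), counts = as.numeric(TB))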
On average, we experience about 2 (major) hurricanes per year,
> mean(db$counts)
[1] 1.95283
In predictive modeling (here, we wish to price a reinsurance contract for, say, 2015), we probably need to take into account a possible trend in the frequency of hurricane occurrences. We can consider either a linear trend,
> reg0 <- glm(counts~years,data=db,family=poisson(link="identity"),
+ start=lm(counts~years,data=db)$coefficients)
or an exponential one,
> reg1 <- glm(counts~years,data=db,family=poisson(link="log"))
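To make the two specifications explicit: in both cases the annual count is assumed to be Poisson, \(N_t\sim\mathcal{P}(\lambda_t)\), with \(\lambda_t=\beta_0+\beta_1 t\) under the identity link (which is why starting values from the linear regression are supplied above, since the identity link does not constrain \(\lambda_t\) to remain positive), and \(\lambda_t=\exp(\beta_0+\beta_1 t)\) under the log link, i.e. an exponential growth of the expected annual frequency.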
We can plot those three predictions, and get a prediction for the number of (major) hurricanes in 2015,
> plot(years,counts,type='h',ylim=c(0,6),xlim=c(1900,2020))
> cpred1=predict(reg1,newdata=data.frame(years=1890:2030),type="response")
> lines(1890:2030,cpred1,col="blue")
> cpred0=predict(reg0,newdata=data.frame(years=1890:2030),type="response")
> lines(1890:2030,cpred0,col="red")
> abline(h=mean(db$counts),col="black")
> (predictions=cbind(constant=mean(db$counts),linear=
+ cpred0[126],exponential=cpred1[126]))
    constant   linear exponential
126  1.95283 3.573999    4.379822
> points(rep((1890:2030)[126],3),predictions,col=c("black","red","blue"),pch=19)
Observe that changing the model changes the pure premium: with a flat prediction we expect fewer than 2 (major) hurricanes per year, but with the exponential trend we expect more than 4…
This is for the expected frequency. Now, we need a suitable model for the losses, in order to compute the pure premium of a reinsurance treaty with a (high) deductible and a limited (but large) cover. As we will see in class next week, the appropriate model is a Pareto distribution (see Hagstrœm (1925), Huyghues-Beaufond (1991), or a survey, in French, published a few years ago).
We can use Hill’s plot to estimate the tail index,
> library(evir)
> hill(base$Normalized.PL05)
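Recall that if \(X_{(1)}\geq X_{(2)}\geq\cdots\geq X_{(n)}\) denote the ordered losses, Hill's estimator of the tail index, based on the \(k\) largest observations, is
\[
\widehat{\xi}_k=\frac{1}{k}\sum_{i=1}^{k}\log\frac{X_{(i)}}{X_{(k+1)}},
\]
and the Hill plot shows this estimate (or its inverse \(\widehat{\alpha}_k=1/\widehat{\xi}_k\)) as a function of \(k\); a plot that stabilizes around a positive value is the signature of a Pareto-type tail.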
Clearly, costs of major hurricanes are heavy tailed.
Now, consider an insurance company in the U.S. with a 5% market share (just to illustrate), so that its losses are \(\tilde{Y}_i = Y_i/20\), plotted below. Consider a reinsurance treaty with a deductible of 2 (billion) and a limited cover of 4 (billion),
For our Pareto model, consider only the losses above 500 million,
> threshold=.5
> (gpd.PL <- gpd(base$Normalized.PL05/1e9/20,threshold)$par.ests)
       xi      beta 
0.4424669 0.6705315 
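For losses above the threshold \(u=0.5\), the fitted generalized Pareto model has survival function
\[
\mathbb{P}(\tilde{Y}>y\mid \tilde{Y}>u)=\left(1+\xi\,\frac{y-u}{\beta}\right)^{-1/\xi},\qquad y>u,
\]
so with \(\widehat{\xi}\approx 0.44\) the mean is finite (which requires \(\xi<1\)) but the variance only barely so (a finite variance requires \(\xi<1/2\)): a genuinely heavy tail.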
Keep in mind that about 1 hurricane out of 8 reaches that level,
> mean(base$Normalized.CL05/1e9/20>.5)
[1] 0.1256039
Given that a loss exceeds 500 million, we can now compute the expected payout under the reinsurance contract. To compute it, we can use the following function,
> E <- function(yinf,ysup,xi,beta){
+ as.numeric(integrate(function(x) (x-yinf)*dgpd(x,xi,mu=threshold,beta),
+ lower=yinf,upper=ysup)$value+
+ (1-pgpd(ysup,xi,mu=threshold,beta))*(ysup-yinf))
+ }
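What this function computes is the expected payout of the layer, given that the loss exceeds the threshold \(u\): with deductible \(y_{\mathrm{inf}}\) and upper limit \(y_{\mathrm{sup}}\),
\[
E(y_{\mathrm{inf}},y_{\mathrm{sup}})=\int_{y_{\mathrm{inf}}}^{y_{\mathrm{sup}}}(y-y_{\mathrm{inf}})f(y)\,dy+(y_{\mathrm{sup}}-y_{\mathrm{inf}})\,\mathbb{P}(\tilde{Y}>y_{\mathrm{sup}}\mid \tilde{Y}>u),
\]
where \(f\) is the generalized Pareto density above \(u\): losses between the deductible and the limit are reimbursed above the deductible, and losses beyond the limit trigger the maximal payout \(y_{\mathrm{sup}}-y_{\mathrm{inf}}\).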
Now, it is probably time to bring all the pieces together. We might expect a bit less than 2 (major) hurricanes per year,
> predictions[1]
[1] 1.95283
and each hurricane has a 12.5% chance of costing our insurance company more than 500 million,
> mean(base$Normalized.PL05/1e9/20>.5)
[1] 0.1256039
and given that a hurricane generates a loss exceeding 500 million, the expected repayment by the reinsurance company is (in millions)
> E(2,6,gpd.PL[1],gpd.PL[2])*1e3
[1] 330.9865
So the pure premium of the reinsurance contract is simply
> predictions[1]*mean(base$Normalized.PL05/1e9/20>.5)*
+ E(2,6,gpd.PL[1],gpd.PL[2])*1e3
[1] 81.18538
for a cover of 4 billion, in excess of 2.
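In other words, the pure premium decomposes as
\[
\pi=\underbrace{\mathbb{E}[N]}_{\approx 1.95}\times\underbrace{\mathbb{P}(\tilde{Y}>u)}_{\approx 0.1256}\times\underbrace{\mathbb{E}[\text{payout}\mid \tilde{Y}>u]}_{\approx 331\text{ million}}\approx 81\text{ million},
\]
i.e. the expected number of hurricanes, times the probability that a hurricane costs the insurer more than 500 million, times the expected repayment given that it does.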
[1] This example will appear in the Reinsurance and Extremal Events chapter, by Eric Gilleland and Mathieu Ribatet, in the forthcoming Computational Actuarial Science with R.