Smoothing mortality rates
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This morning, I was working with Julie, a student of mine, coming from Rennes, on mortality tables. Actually, we work on genealogical datasets from a small region in Québec, and we can observe a lot of volatiliy. If I borrow one of her graph, we get something like
Since we have some missing data, we wanted to use some Generalized Nonlinear Models. So let us see how to get a smooth estimator of the mortality surface. We will write some code that we can use on our data later on (the dataset we have has been obtained after signing a lot of official documents, and I guess I cannot upload it here, even partially).
DEATH <- read.table( "http://freakonometrics.free.fr/Deces-France.txt", header=TRUE) EXPO <- read.table( "http://freakonometrics.free.fr/Exposures-France.txt", header=TRUE,skip=2) library(gnm) D=DEATH$Male E=EXPO$Male A=as.numeric(as.character(DEATH$Age)) Y=DEATH$Year I=(A<100) base=data.frame(D=D,E=E,Y=Y,A=A) subbase=base[I,] subbase=subbase[!is.na(subbase$A),]
The first idea can be to use a Poisson model, where the mortality rate is a smooth function of the age and the year, something like
that can be estimated using
library(mgcv) regbsp=gam(D~s(A,Y,bs="cr")+offset(log(E)),data=subbase,family=quasipoisson) predmodel=function(a,y) predict(regbsp,newdata=data.frame(A=a,Y=y,E=1)) vX=trunc(seq(0,99,length=41)) vY=trunc(seq(1900,2005,length=41)) vZ=outer(vX,vY,predmodel) persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)", ylab="Years (1900-2005)",zlab="Mortality rate (log)")
The mortality surface is here
It is also possible to extract the average value of the years, which is the interpretation of the coefficient in the Lee-Carter model,
predAx=function(a) mean(predict(regbsp,newdata=data.frame(A=a, Y=seq(min(subbase$Y),max(subbase$Y)),E=1))) plot(seq(0,99),Vectorize(predAx)(seq(0,99)),col="red",lwd=3,type="l")
We have the following smoothed mortality rate
Recall that the Lee-Carter model is
where parameter estimates can be obtained using
regnp=gnm(D~factor(A)+Mult(factor(A),factor(Y))+offset(log(E)), data=subbase,family=quasipoisson) predmodel=function(a,y) predict(regnp,newdata=data.frame(A=a,Y=y,E=1)) vZ=outer(vX,vY,predmodel) persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)", ylab="Years (1900-2005)",zlab="Mortality rate (log)")
The (crude) mortality surface is
with the following coefficients.
plot(seq(1,99),coefficients(regnp)[2:100],col="red",lwd=3,type="l")
Here we have a lot of coefficients, and unfortunately, on a smaller dataset, we have much more variability. Can we smooth our Lee-Carter model ? To get something which looks like
Actually, we can, and the code is rather simple
library(splines) knotsA=c(20,40,60,80) knotsY=c(1920,1945,1980,2000) regsp=gnm(D~bs(subbase$A,knots=knotsA,Boundary.knots=range(subbase$A),degre=3)+ Mult(bs(subbase$A,knots=knotsA,Boundary.knots=range(subbase$A),degre=3), bs(subbase$Y,knots=knotsY,Boundary.knots=range(subbase$Y),degre=3))+ offset(log(E)),data=subbase, family=quasipoisson) BpA=bs(seq(0,99),knots=knotsA,Boundary.knots=range(subbase$A),degre=3) BpY=bs(seq(min(subbase$Y),max(subbase$Y)),knots=knotsY,Boundary.knots= range(subbase$Y),degre=3) predmodel=function(a,y) predict(regsp,newdata=data.frame(A=a,Y=y,E=1)) v Z=outer(vX,vY,predmodel) persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)", ylab="Years (1900-2005)",zlab="Mortality rate (log)")
The mortality surface is now
and again, it is possible to extract the average mortality rate, as a function of the age, over the years,
BpA=bs(seq(0,99),knots=knotsA,Boundary.knots=range(subbase$A),degre=3) Ax=BpA%*%coefficients(regsp)[2:8] plot(seq(0,99),Ax,col="red",lwd=3,type="l")
We can then play with the smoothing parameters of the spline functions, and see the impact on the mortality surface
knotsA=seq(5,95,by=5) knotsY=seq(1910,2000,by=10) regsp=gnm(D~bs(A,knots=knotsA,Boundary.knots=range(subbase$A),degre=3)+ Mult(bs(A,knots=knotsA,Boundary.knots=range(subbase$A),degre=3), bs(Y,knots=knotsY,Boundary.knots=range(subbase$Y),degre=3)) +offset(log(E)),data=subbase,family=quasipoisson) predmodel=function(a,y) predict(regsp,newdata=data.frame(A=a,Y=y,E=1)) vZ=outer(vX,vY,predmodel) persp(vZ,theta=-30,col="green",shade=TRUE,xlab="Ages (0-100)", ylab="Years (1900-2005)",zlab="Mortality rate (log)")
We now have to use those functions our our small data sample ! That should be fun….
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.