Site icon R-bloggers

Copulas and tail dependence, part 1

[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As mentioned in the course last week Venter (2003) suggested nice functions to illustrate tail dependence (see also some slides used in Berlin a few years ago). Joe (1990) suggested a (strong) tail dependence index. For lower tails, for instance, consider
i.e

The idea is to plot the function above, in order to visualize limiting behavior. Define

for the lower tail, and
for the upper tail, where is the survival copula associated with , in the sense that
while

Now, one can easily derive empirical conterparts of those function, i.e.

and
Thus, for upper tail, on the right, we have the following graph

and for the lower tail, on the left, we have

For the code, consider some real data, like the loss-ALAE dataset.

> library(evd)
> X=lossalae

The idea is to plot, on the left, the lower tail concentration function, and on the right, the upper tail function.

> U=rank(X[,1])/(nrow(X)+1)
> V=rank(X[,2])/(nrow(X)+1)
> Lemp=function(z) sum((U<=z)&(V<=z))/sum(U<=z)
> Remp=function(z) sum((U>=1-z)&(V>=1-z))/sum(U>=1-z)
> u=seq(.001,.5,by=.001)
> L=Vectorize(Lemp)(u)
> R=Vectorize(Remp)(rev(u))
> plot(c(u,u+.5-u[1]),c(L,R),type="l",ylim=0:1,
+ xlab="LOWER TAIL          UPPER TAIL")
> abline(v=.5,col="grey")

Now, we can compare this graph, with what should be obtained for some parametric copulas that have the same Kendall’s tau (e.g.). For instance, if we consider a Gaussian copula,

> tau=cor(lossalae,method="kendall")[1,2]
> library(copula)
> paramgauss=sin(tau*pi/2)
> copgauss=normalCopula(paramgauss)
> Lgaussian=function(z) pCopula(c(z,z),copgauss)/z
> Rgaussian=function(z) (1-2*z+pCopula(c(z,z),copgauss))/(1-z)
> u=seq(.001,.5,by=.001)
> Lgs=Vectorize(Lgaussian)(u)
> Rgs=Vectorize(Rgaussian)(1-rev(u))
> lines(c(u,u+.5-u[1]),c(Lgs,Rgs),col="red")

or Gumbel’s copula,

> paramgumbel=1/(1-tau)
> copgumbel=gumbelCopula(paramgumbel, dim = 2)
> Lgumbel=function(z) pCopula(c(z,z),copgumbel)/z
> Rgumbel=function(z) (1-2*z+pCopula(c(z,z),copgumbel))/(1-z)
> u=seq(.001,.5,by=.001)
> Lgl=Vectorize(Lgumbel)(u)
> Rgl=Vectorize(Rgumbel)(1-rev(u))
> lines(c(u,u+.5-u[1]),c(Lgl,Rgl),col="blue")

That’s nice (isn’t it?), but since we do not have any confidence interval, it is still hard to conclude (even if it looks like Gumbel copula has a much better fit than the Gaussian one). A strategy can be to generate samples from those copulas, and to visualize what we had. With a Gaussian copula, the graph looks like
> u=seq(.0025,.5,by=.0025); nu=length(u)
> nsimul=500
> MGS=matrix(NA,nsimul,2*nu)
> for(s in 1:nsimul){
+ Xs=rCopula(nrow(X),copgauss)
+ Us=rank(Xs[,1])/(nrow(Xs)+1)
+ Vs=rank(Xs[,2])/(nrow(Xs)+1)
+ Lemp=function(z) sum((Us<=z)&(Vs<=z))/sum(Us<=z)
+ Remp=function(z) sum((Us>=1-z)&(Vs>=1-z))/sum(Us>=1-z)
+ MGS[s,1:nu]=Vectorize(Lemp)(u)
+ MGS[s,(nu+1):(2*nu)]=Vectorize(Remp)(rev(u))
+ lines(c(u,u+.5-u[1]),MGS[s,],col="red")
+ }

(including – pointwise – 90% confidence bands)

> Q95=function(x) quantile(x,.95)
> V95=apply(MGS,2,Q95)
> lines(c(u,u+.5-u[1]),V95,col="red",lwd=2)
> Q05=function(x) quantile(x,.05)
> V05=apply(MGS,2,Q05)
> lines(c(u,u+.5-u[1]),V05,col="red",lwd=2)

while it is

with Gumbel copula. Isn’t it a nice (graphical) tool ?

But as mentioned in the course, the statistical convergence can be slow. Extremely slow. So assessing if the underlying copula has tail dependence, or not, it now that simple. Especially if the copula exhibits tail independence. Like the Gaussian copula. Consider a sample of size 1,000. This is what we obtain if we generate random scenarios,

or we look at the left tail (with a log-scale)

Now, consider a 10,000 sample,

or with a log-scale

We can even consider a 100,000 sample,

or with a log-scale

On those graphs, it is rather difficult to conclude if the limit is 0, or some strictly positive value (again, it is a classical statistical problem when the value of interest is at the border of the support of the parameter). So, a natural idea is to consider a weaker tail dependence index. Unless you have something like 100,000 observations…

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics - Tag - R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.