Bayesian nonparametric modeling of conditional multidimensional dependence structures

This article was first published on YoungStatS and kindly contributed to R-bloggers.

Overview

In many real data applications we need to jointly model $d \geq 3$ continuous random variables, denoted $Y_1, \ldots, Y_d$. The multivariate distribution that describes their joint behaviour can be written as $F(y_1, \ldots, y_d) = P(Y_1 \leq y_1, \ldots, Y_d \leq y_d)$. However, complex relations between data, particularly asymmetric and tail-dependent associations, are often difficult to model. The copula approach expresses the multivariate distribution of a set of variables by separating the marginals from the dependence structure. Furthermore, modelling the effect of covariates on the dependence structure described by copulas has recently attracted increasing attention.

Proposal

In Barone and Dalla Valle (2023) we provide a flexible Bayesian mixture model that returns easy-to-interpret results. It estimates the effects of covariates on high-dimensional dependence structures and performs well both in clustering with an unknown number of components and in density estimation.

Dirichlet process mixture of conditional vines

Let $Y_1, \ldots, Y_d$ be continuous random variables of interest and let $X = (X_1, \ldots, X_p)$ be a vector of covariates that may affect the dependence between $Y_1, \ldots, Y_d$. Then the conditional joint distribution function of $(Y_1, \ldots, Y_d)$ given $X = x$ is

$$F_x(y_1, \ldots, y_d) = P(Y_1 \leq y_1, \ldots, Y_d \leq y_d \mid X = x),$$

under the assumption that such a conditional distribution exists (see Gijbels et al. (2012), Abegaz, Gijbels, and Veraverbeke (2012) and Acar, Craiu, and Yao (2011)). We denote the conditional marginals of $F_x$ as

$$F_{1,x}(y_1) = P(Y_1 \leq y_1 \mid X = x), \quad \ldots, \quad F_{d,x}(y_d) = P(Y_d \leq y_d \mid X = x).$$

If the marginals are continuous, then Sklar’s theorem (Sklar 1959) allows us to write

$$C_x(u_1, \ldots, u_d) = F_x\big(F_{1,x}^{-1}(u_1), \ldots, F_{d,x}^{-1}(u_d)\big),$$

where $F_{j,x}^{-1}(u_j) = \inf\{y_j : F_{j,x}(y_j) \geq u_j\}$, for $j = 1, \ldots, d$, are the conditional quantile functions and $u_j = F_{j,x}(y_j)$. The conditional copula $C_x$ fully describes the conditional dependence structure of $(Y_1, \ldots, Y_d)$ given $X = x$. Therefore, the conditional joint distribution function can be written as

$$F_x(y_1, \ldots, y_d) = C_x\big(F_{1,x}(y_1), \ldots, F_{d,x}(y_d)\big).$$
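To make the conditional marginal and its generalized inverse concrete, here is a minimal sketch using empirical distribution functions. The toy data, the binary covariate and the helper names (`conditional_cdf`, `conditional_quantile`) are illustrative assumptions, not part of Barone and Dalla Valle (2023).

```python
import bisect

def conditional_sample(pairs, x):
    """Sorted y-values whose covariate equals x (hypothetical toy helper)."""
    return sorted(y for xi, y in pairs if xi == x)

def conditional_cdf(pairs, x):
    """Empirical version of F_{j,x}(y) = P(Y_j <= y | X = x)."""
    ys = conditional_sample(pairs, x)
    return lambda y: bisect.bisect_right(ys, y) / len(ys)

def conditional_quantile(pairs, x):
    """Empirical version of F_{j,x}^{-1}(u) = inf{y : F_{j,x}(y) >= u}."""
    ys = conditional_sample(pairs, x)
    n = len(ys)

    def f_inv(u):
        if not 0.0 < u <= 1.0:
            raise ValueError("u must lie in (0, 1]")
        # The k-th order statistic is the smallest y with F(y) = k / n >= u.
        return next(y for k, y in enumerate(ys, start=1) if k / n >= u)

    return f_inv

# Toy (x, y) observations with a binary covariate; the values are made up.
pairs = [(0, 1.5), (1, 2.0), (1, 5.0), (0, 4.0), (1, 3.0), (1, 9.0)]
F1 = conditional_cdf(pairs, x=1)
Q1 = conditional_quantile(pairs, x=1)
u = F1(3.0)   # pseudo-observation u_j = F_{j,x}(y_j)
y_back = Q1(u)  # mapping back through the generalized inverse
```

Applying `F1` to each observed value gives the pseudo-observations $u_j$ on which the conditional copula is defined. Mapping them back through `Q1` returns the original sample points.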

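Let us denote the copula density corresponding to the distribution $C_x(F_{1,x}(y_1), \ldots, F_{d,x}(y_d))$ as

$$c_x(u_1, \ldots, u_d) = c_\theta(u_1, \ldots, u_d \mid x) = c_{\theta(x)}(u_1, \ldots, u_d),$$

where $\theta$ is the parameter vector of the $d$-variate copula density. We assume that the function $\theta(x)$ depends on a vector of parameters $\beta$ such that $c_{\theta(x)}(u_1, \ldots, u_d) = c_{\theta(x \mid \beta)}(u_1, \ldots, u_d) = c_{1:d}(u_1, \ldots, u_d \mid \theta(x \mid \beta))$.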

This conditional copula density can be written in terms of vines (Czado 2019), where each pair-copula depends on the vector of covariates $X$.

Figure: Trivariate vine representation.

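The vine representation can be generalized to special vine distribution classes, the most popular of which are D-vines (see Bedford and Cooke (2001), Aas et al. (2009) and Czado (2019)). The conditional D-vine decomposition takes the form

$$c_{1:d}(u_1, \ldots, u_d \mid \theta(x \mid \beta)) = \prod_{\ell=1}^{d-1} \prod_{k=1}^{d-\ell} c_{k,\ell+k;\,k+1,\ldots,k+\ell-1}\Big\{F_{k \mid k+1,\ldots,k+\ell-1,\,x}(y_k \mid y_{k+1}, \ldots, y_{k+\ell-1}),\; F_{\ell+k \mid k+1,\ldots,k+\ell-1,\,x}(y_{\ell+k} \mid y_{k+1}, \ldots, y_{k+\ell-1}) \;\Big|\; \theta_{k,\ell+k;\,k+1,\ldots,k+\ell-1}(x \mid \beta)\Big\}.$$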
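The index structure of the D-vine double product is easier to see when it is enumerated. The sketch below lists, for a given $d$, each pair copula as a triple of its two conditioned variables and its conditioning set. The function name `dvine_pair_copulas` is a hypothetical helper, and the code only enumerates indices without fitting anything.

```python
def dvine_pair_copulas(d):
    """List (k, l + k, conditioning set) for every pair copula of a d-dimensional D-vine."""
    if d < 2:
        raise ValueError("a vine needs at least two variables")
    pairs = []
    for l in range(1, d):                # tree level l = 1, ..., d - 1
        for k in range(1, d - l + 1):    # k = 1, ..., d - l
            conditioning = tuple(range(k + 1, k + l))  # k + 1, ..., k + l - 1
            pairs.append((k, k + l, conditioning))
    return pairs

trivariate = dvine_pair_copulas(3)
# [(1, 2, ()), (2, 3, ()), (1, 3, (2,))]
# i.e. c_{12} * c_{23} * c_{13;2}

four_dim = dvine_pair_copulas(4)  # six pair copulas for d = 4
```

For any $d$ the list has $d(d-1)/2$ entries: the first tree holds the $d-1$ unconditional pairs of adjacent variables, and each further tree conditions on one more intermediate variable.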

In Barone and Dalla Valle (2023), we model multivariate dependence structures specified as the product of $\nu = d(d-1)/2$ pair copulas, indexed by the $\nu \times (q+1)$-dimensional vector of parameters $\beta$. The covariates, with densities $f_h(x_h)$, $h = 1, \ldots, p$, are independent random variables with parameters $\phi = (\phi_1, \ldots, \phi_p)$. Note that $q \geq p$ and that its value depends on the chosen link function; for example, if the link is linear, $q = p$. Let the vector of parameters $\xi = (\beta, \phi)$ be defined on the parameter space $\Xi$. We rewrite the density $f_G(x)\, c_G(\cdot, \ldots, \cdot \mid x)$ as an infinite mixture of conditional vine copulas with kernel $f_\xi(x)\, c_\xi(\cdot, \ldots, \cdot \mid x)$ with respect to the mixing measure $G$, that is

$$f_G(x)\, c_G(u_1, \ldots, u_d \mid x) = \int_\Xi f_\phi(x)\, c_{1:d}(u_1, \ldots, u_d \mid \theta(x \mid \beta))\, dG(\xi).$$

With a Dirichlet Process (DP) prior on G, we get a Dirichlet Process Mixture (DPM) of conditional vine copulas, which may be alternatively represented as

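$$f_\phi(x)\, c_{\theta(x \mid \beta)}(u_1, \ldots, u_d \mid x) = \sum_{j=1}^{\infty} \omega_j\, f_{\phi_j}(x)\, c_{1:d}(u_1, \ldots, u_d \mid \theta(x \mid \beta_j)),$$

where the weights $\omega_j$ sum to 1. The posterior distribution $\Pi(G \mid Y, X)$ is a mixture of DP models, mixing with respect to the latent variables $\xi_i$ specific to each observation $(F_1(y_{1i}), \ldots, F_d(y_{di}))$, for $i = 1, \ldots, N$:

$$G \mid Y, X \sim \int DP\Big(M G_0 + \sum_{i=1}^{N} \delta_{(\phi_i, \beta_i)}\Big)\, d\Pi(\phi, \beta \mid y, x),$$

where $M$ is the concentration parameter, $G_0$ is the centring measure and $\delta_t$ denotes the Dirac measure at $t$.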
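One common way to generate mixture weights that sum to 1 under a DP prior is the stick-breaking construction, sketched below. The post does not say which representation the authors use, so this is an assumption. Truncating at a finite number of components is a further approximation made only so the example terminates.

```python
import random

def stick_breaking_weights(M, n_components, rng):
    """Truncated stick-breaking weights for a DP with concentration parameter M."""
    if M <= 0 or n_components < 1:
        raise ValueError("M must be positive and n_components at least 1")
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(n_components - 1):
        v = rng.betavariate(1.0, M)   # break fraction v_j ~ Beta(1, M)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # Assumption: the last component absorbs the leftover stick,
    # so the truncated weights sum to 1 (up to rounding).
    weights.append(remaining)
    return weights

rng = random.Random(2023)  # fixed seed, for reproducibility only
omega = stick_breaking_weights(M=1.0, n_components=20, rng=rng)
```

Smaller values of $M$ put most of the mass on the first few weights, which favours few clusters. Larger values spread the mass over many components.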

Posterior inference is performed via MCMC sampling, using a Pólya-urn scheme to integrate the random distribution function of the Dirichlet process out of the model (MacEachern and Müller 1998).
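The Pólya-urn idea can be illustrated by simulating cluster allocations from the DP prior alone. This is only a sketch of the urn mechanism, not the sampler of MacEachern and Müller (1998), which additionally weights each allocation by the likelihood of the observation. The function name is a hypothetical helper.

```python
import random

def polya_urn_allocations(n_obs, M, rng):
    """Draw cluster labels from the Polya-urn (prior) scheme with concentration M."""
    if M <= 0:
        raise ValueError("M must be positive")
    labels = []   # cluster label of each observation, in order of arrival
    counts = []   # current size of each cluster
    for _ in range(n_obs):
        # An existing cluster c is chosen with weight counts[c],
        # a brand-new cluster with weight M.
        weights = counts + [M]
        c = rng.choices(range(len(weights)), weights=weights)[0]
        if c == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[c] += 1     # join an existing cluster
        labels.append(c)
    return labels, counts

rng = random.Random(7)  # fixed seed, for reproducibility only
labels, counts = polya_urn_allocations(n_obs=100, M=1.0, rng=rng)
n_clusters = len(counts)
```

Because the random measure $G$ never appears, the number of clusters is not fixed in advance: it grows with the sample size at a rate governed by $M$.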

Financial development and natural disasters data

We present an application to a heterogeneous dataset to study the impact of worldwide natural disasters on international financial development. We define a 4-dimensional vine copula whose marginals are the financial development (FD) index in 4 consecutive years, and we consider the occurrence of a natural disaster as a binary covariate taking value 1 if the total damage is over 100 million dollars and 0 otherwise. The pair-copula parameters are linked to the covariates through a link function $g$, such that

$$\rho(x \mid \beta) = g^{-1}(\eta(x \mid \beta)),$$

where $g^{-1}$ is Fisher’s transform and $\eta(\cdot)$ is the calibration function $\eta = \beta_0 + \beta_1 X$. We set $M = 1$ and $G_0$ as a flat multivariate Gaussian distribution centred on a vector of zeros.
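The sketch below shows how a linear calibration function and a Fisher-type link could turn a binary covariate into a pair-copula correlation. It makes two assumptions that the post does not confirm: that the map from $\eta$ to $\rho$ is the hyperbolic tangent (the inverse of Fisher's z-transform), and that the pair copula is Gaussian. The coefficient values are made up for illustration and are not estimates from the paper.

```python
import math
from statistics import NormalDist

def calibration(x, beta0, beta1):
    """Linear calibration function eta = beta0 + beta1 * x."""
    return beta0 + beta1 * x

def correlation(x, beta0, beta1):
    """Assumed link: rho = tanh(eta), which maps the real line into (-1, 1)."""
    return math.tanh(calibration(x, beta0, beta1))

def gaussian_pair_copula_density(u1, u2, rho):
    """Bivariate Gaussian copula density (assumed pair-copula family)."""
    z1 = NormalDist().inv_cdf(u1)
    z2 = NormalDist().inv_cdf(u2)
    r2 = rho * rho
    exponent = -(r2 * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / (2.0 * (1.0 - r2))
    return math.exp(exponent) / math.sqrt(1.0 - r2)

# Hypothetical coefficients, chosen only to illustrate a negative disaster effect.
beta0, beta1 = 0.5, -0.8
rho_no_disaster = correlation(0, beta0, beta1)  # x = 0: no major disaster
rho_disaster = correlation(1, beta0, beta1)     # x = 1: damage above the threshold
density = gaussian_pair_copula_density(0.3, 0.7, rho_disaster)
```

With a binary covariate, $\beta_0$ sets the dependence when no disaster occurs and $\beta_1$ shifts it when one does. The sign of $\beta_1$ is therefore what the cluster-level comparison of effects rests on.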

The top left panel shows the barplot of the number of observations allocated to the two estimated mixture components. The top right panel compares the posterior densities of the calibration function parameter $\beta_1^{12}$ (which is related to the first time interval, 1-2) for the first (solid line) and the second (dashed line) mixture components. The bottom left and bottom right panels show, for the first and second mixture components respectively, the boxplots of the calibration function parameters $\beta_1$ for the first, second and third time intervals (1-2; 2-3; 3-4).

The two estimated mixture components differ substantially in how they are impacted by natural disasters. For the first cluster ($\psi = 1$) the model estimates a general negative effect, which tends to remain constant until the fourth year; for the second cluster ($\psi = 2$), instead, the model estimates a positive effect of the natural disaster on the time dependence between yearly FD indexes.

Take home message

In Barone and Dalla Valle (2023) we present an innovative methodology that allows for:

  • flexible modeling of high-dimensional dependency structures, also considering the impact of one or more covariates and accounting for individual as well as temporal heterogeneity in a natural way;

  • clustering without assuming the number of components a priori and density estimation;

  • easy interpretation of the results.

References

Aas, Kjersti, Claudia Czado, Arnoldo Frigessi, and Henrik Bakken. 2009. “Pair-Copula Constructions of Multiple Dependence.” Insurance: Mathematics and Economics 44 (2): 182–98.
Abegaz, Fentaw, Irène Gijbels, and Noël Veraverbeke. 2012. “Semiparametric Estimation of Conditional Copulas.” Journal of Multivariate Analysis 110: 43–73.
Acar, Elif F, Radu V Craiu, and Fang Yao. 2011. “Dependence Calibration in Conditional Copulas: A Nonparametric Approach.” Biometrics 67 (2): 445–53.
Barone, Rosario, and Luciana Dalla Valle. 2023. “Bayesian Nonparametric Modeling of Conditional Multidimensional Dependence Structures.” Journal of Computational and Graphical Statistics, 1–10.
Bedford, Tim, and Roger M Cooke. 2001. “Probability Density Decomposition for Conditionally Dependent Random Variables Modeled by Vines.” Annals of Mathematics and Artificial Intelligence 32 (1-4): 245–68.
Czado, Claudia. 2019. Analyzing Dependent Data with Vine Copulas. Lecture Notes in Statistics. Springer.
Gijbels, Irène, Marek Omelka, and Noël Veraverbeke. 2012. “Multivariate and Functional Covariates and Conditional Copulas.” Electronic Journal of Statistics 6: 1273–1306.
MacEachern, Steven N, and Peter Müller. 1998. “Estimating Mixture of Dirichlet Process Models.” Journal of Computational and Graphical Statistics 7 (2): 223–38.
Sklar, A. 1959. “Fonctions de répartition à n dimensions et leurs marges.” Publications de l’Institut de Statistique de l’Université de Paris 8: 229–31.