Weighted residual empirical processes in semi-parametric copula adjusted for regression

[This article was first published on YoungStatS, and kindly contributed to R-bloggers.]

Overview

In this post we first review the concept of semi-parametric copula and the accompanying estimation procedure of pseudo-likelihood estimation (PLE). We then generalize the estimation problem to the setting where the copula signal is hidden in a semi- or non-parametric regression model. Under this setting we have to base the PLE on the residuals. The particular challenge of the diverging score function is handled via the technique of the weighted residual empirical processes.

The semi-parametric copula model

Copulas have been a popular tool for modelling multivariate dependence structures since their introduction in Sklar (1959). Consider a random vector $E=(E_1,\dots,E_p)\in\mathbb{R}^p$ with joint distribution function $H$; we assume throughout that each $E_k$, $k\in\{1,\dots,p\}$, has an absolutely continuous marginal distribution function $F_k$. Then the copula $C$ associated with $E$ is the joint distribution function of the marginally transformed random vector $(F_1(E_1),\dots,F_p(E_p))$. It is clear from this definition that $C$ itself is always a distribution supported on the unit hypercube $[0,1]^p$, and $C$ always has uniform marginals supported on $[0,1]$ whatever the marginals of $E$ may be. (The explicit form of $C$ follows from Sklar’s theorem, for instance Corollary 2.10.10 in Nelsen (2006): $C(u)=H(F_1^{\leftarrow}(u_1),\dots,F_p^{\leftarrow}(u_p))$ for $F_k^{\leftarrow}$ the inverse of $F_k$ and $u=(u_1,\dots,u_p)\in[0,1]^p$.) Furthermore, by the invariance property, if $g_1,\dots,g_p$ are univariate strictly increasing functions, then $E$ and its marginally transformed version $(g_1(E_1),\dots,g_p(E_p))$ will admit the same copula.
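The invariance property can be checked numerically. The sketch below is only an illustration under assumptions of our own: a bivariate Gaussian dependence with correlation 0.6, and two hypothetical strictly increasing transforms. It uses the rescaled ranks (rank divided by $n+1$) as empirical stand-ins for $F_k(E_k)$; because strictly increasing transforms preserve ranks, these rank-based quantities should be identical before and after transforming the marginals.

```python
import math
import random


def pseudo_obs(x):
    """Rescaled ranks rank_i / (n + 1), an empirical stand-in for F_k(E_{i,k})."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


rng = random.Random(0)
n, rho = 200, 0.6  # illustrative sample size and correlation (assumptions)

# Draw a dependent pair (E_1, E_2) with standard normal marginals.
e1, e2 = [], []
for _ in range(n):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    e1.append(z1)
    e2.append(rho * z1 + math.sqrt(1 - rho**2) * z2)

# Hypothetical strictly increasing marginal transforms g_1 and g_2.
t1 = [math.exp(v) for v in e1]
t2 = [v**3 + v for v in e2]

# The ranks, and hence any rank-based copula summary, are unchanged.
same_1 = pseudo_obs(e1) == pseudo_obs(t1)
same_2 = pseudo_obs(e2) == pseudo_obs(t2)
print(same_1, same_2)
```

Working with ranks rather than raw values is what makes the procedures discussed in this post margin-free.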

Thus, the copula is a margin-free measure of multivariate dependence. Applied in the opposite direction, one could also start from a copula and couple it with arbitrary marginals to create multivariate distributions in a flexible manner. For instance, beyond the usual applications in finance and economics, copulas could be used in the latter manner to model the dependence among the repeated observations in longitudinal data (Sun, Frees, and Rosenberg (2008)).

In this post we will focus on the semi-parametric copula model that serves as a middle ground between a totally non-parametric approach to copula modelling (via the so-called empirical copula, see for instance Fermanian, Radulović, and Wegkamp (2004) and Berghaus, Bücher, and Volgushev (2017)) and a totally parametric modelling of the random vector $E$. In the semi-parametric copula model, we consider a collection of possible distributions of $E$ where the copulas $C=C(\cdot;\theta)$ are constrained to be parametrized by a Euclidean copula parameter $\theta=(\theta_1,\dots,\theta_d)$, but where the marginals $F_1,\dots,F_p$ of $E$ could range over all $p$-tuples of absolutely continuous univariate distribution functions.

The pseudo-likelihood method

In the semi-parametric copula model, the primary interest is often the true value $\theta^*$ of the copula parameter that determines the multivariate dependence. An obvious challenge in estimating $\theta^*$ in the copula setting is how to handle the unknown marginals $F_1,\dots,F_p$. The canonical solution is the pseudo-likelihood estimation (PLE) introduced in Oakes (1994) and Genest, Ghoudi, and Rivest (1995) that we now describe.

Let $g_1(\cdot;\theta),\dots,g_d(\cdot;\theta)$ be a collection of appropriate score functions such that the population estimating equation $\mathbb{E}\,g_m(F_1(E_1),\dots,F_p(E_p);\theta)=0$ holds for all $m\in\{1,\dots,d\}$ only when $\theta=\theta^*$. In principle one can always choose the score functions to be the ones in the maximum likelihood estimation, namely $g_m(\cdot;\theta)=\frac{\partial}{\partial\theta_m}\log c(\cdot;\theta)$ where $c(\cdot;\theta)$ is the density of the copula $C(\cdot;\theta)$. Thus, if $F_1,\dots,F_p$ were known, to estimate $\theta^*$ empirically based on a sample $E_i=(E_{i,1},\dots,E_{i,p})$, $i\in\{1,\dots,n\}$, of $E$, one could simply “find the zero” of the empirical version of the estimating equation, that is, estimate $\theta^*$ by the $\hat\theta^{\mathrm{parametric}}$ that solves, for all $m\in\{1,\dots,d\}$,
$$\frac{1}{n}\sum_{i=1}^n g_m\big(F_1(E_{i,1}),\dots,F_p(E_{i,p});\hat\theta^{\mathrm{parametric}}\big)=0.$$

(The superscript “parametric” points to the fact that when $F_1,\dots,F_p$ are known, we are basically solving a parametric problem.) However, in semi-parametric copula modelling we commonly avoid setting $F_1,\dots,F_p$ to some particular form. The PLE method solves this problem by replacing each unknown $F_k$, $k\in\{1,\dots,p\}$, by its empirical counterpart, namely the empirical distribution function $F_{n,k}(t)=\frac{1}{n+1}\sum_{i=1}^n \mathbf{1}\{E_{i,k}\le t\}$. The oracle PLE estimator $\hat\theta^{\mathrm{oracle}}$ of $\theta^*$ is then the one that solves the following revised estimating equation: for all $m\in\{1,\dots,d\}$,
$$\frac{1}{n}\sum_{i=1}^n g_m\big(F_{n,1}(E_{i,1}),\dots,F_{n,p}(E_{i,p});\hat\theta^{\mathrm{oracle}}\big)=\int g_m(u;\hat\theta^{\mathrm{oracle}})\,\mathrm{d}C_n(u)=0.$$
All integrals in this post are over $[0,1]^p$. To simplify our expression above and later, we have introduced the empirical copula $C_n$, which is a multivariate distribution function on $[0,1]^p$ with a mass of $1/n$ at each of $(F_{n,1}(E_{i,1}),\dots,F_{n,p}(E_{i,p}))$, $i\in\{1,\dots,n\}$ (precisely, $C_n(u)=\frac{1}{n}\sum_{i=1}^n \mathbf{1}\{F_{n,1}(E_{i,1})\le u_1,\dots,F_{n,p}(E_{i,p})\le u_p\}$). The qualifier “oracle” in $\hat\theta^{\mathrm{oracle}}$ is used to distinguish the current case, when we can still directly observe the copula sample $E_i$, $i\in\{1,\dots,n\}$ (albeit without knowing $F_1,\dots,F_p$), from the case when even that sample is subject to perturbation, to which we now turn.
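As a rough illustration of the oracle PLE, the sketch below assumes a bivariate Gaussian copula with a single correlation parameter, which we call `rho` (our notation), and uses the maximum-likelihood score obtained by differentiating the Gaussian copula log-density with respect to `rho`. The marginals (lognormal and cubed normal) are hypothetical and are never used by the estimator; only the rescaled ranks enter. The zero of the estimating equation is located by plain bisection, which is a convenience choice here and not necessarily how one would solve it in practice.

```python
import math
import random
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf  # standard normal quantile function


def pseudo_obs(x):
    """Rescaled ranks F_{n,k}(E_{i,k}) = rank / (n + 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def mean_score(z1, z2, rho):
    """Average Gaussian-copula score in rho, evaluated at normal scores z1, z2."""
    n = len(z1)
    cross = sum(a * b for a, b in zip(z1, z2)) / n
    squares = sum(a * a + b * b for a, b in zip(z1, z2)) / n
    d = 1.0 - rho * rho
    return rho / d + ((1.0 + rho * rho) * cross - rho * squares) / (d * d)


def solve_ple(u1, u2, lo=-0.99, hi=0.99, iters=60):
    """Find a zero of the empirical estimating equation by bisection."""
    z1 = [PHI_INV(a) for a in u1]
    z2 = [PHI_INV(b) for b in u2]
    f_lo = mean_score(z1, z2, lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = mean_score(z1, z2, mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # the sign change lies to the right of mid
        else:
            hi = mid               # the sign change lies to the left of mid
    return 0.5 * (lo + hi)


rng = random.Random(42)
n, rho_true = 500, 0.6  # illustrative values (assumptions)
e1, e2 = [], []
for _ in range(n):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    e1.append(math.exp(a))  # lognormal margin, treated as unknown
    e2.append((rho_true * a + math.sqrt(1 - rho_true**2) * b) ** 3)  # cubed normal

rho_oracle = solve_ple(pseudo_obs(e1), pseudo_obs(e2))
print(f"oracle PLE of rho: {rho_oracle:.3f} (true value {rho_true})")
```

Note that the estimator never evaluates the true marginals: replacing $F_k$ by $F_{n,k}$ is exactly the pseudo-likelihood step.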

Residual-based pseudo-likelihood for semi-parametric copula adjusted for regression

From now on we suppose that the copula signal $E=E_{\theta^*}$ is “hidden” in a multivariate response semi- or non-parametric regression model, a setting considered in our recent work (Zhao, Gijbels, and Van Keilegom (2022)): for a covariate $X\in\mathbb{R}^q$ (independent of $E$) and a response $Y=(Y_1,\dots,Y_p)\in\mathbb{R}^p$,
$$Y_1=m_1(X)+E_1,\quad\dots,\quad Y_p=m_p(X)+E_p.$$

In its raw form, the model above is a purely non-parametric regression model; by specifying particular forms of the regression function $m_k$, the above will accommodate a wide range of popular non- and semi-parametric regression variants such as the partly linear regression model and the additive model. (It’s not much more difficult to consider a more flexible, heteroscedastic model $Y_k=m_k(X)+\sigma_k(X)E_k$, though we refrain from doing so in this post.) Gijbels, Omelka, and Veraverbeke (2015) considered a similar model and studied the resulting empirical copula process.

Under this regression model, we can observe an i.i.d. sample $(Y_1,X_1),\dots,(Y_n,X_n)$ of $(Y,X)$, but crucially not the copula sample $E_1,\dots,E_n$. To eventually arrive at our estimator for $\theta^*$ in this setting, we will first form our empirical copula $\hat C_n$ based on the residuals of the regression as follows. Let $\hat m_k$ be some estimator for $m_k$. We estimate the $k$th component of $E_i=(E_{i,1},\dots,E_{i,p})$ by the residual $\hat E_{i,k}=Y_{i,k}-\hat m_k(X_i)$.

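Then, we form the residual-based empirical distribution and copula:
$$\hat F_{n,k}(t)=\frac{1}{n+1}\sum_{i=1}^n \mathbf{1}\{\hat E_{i,k}\le t\},\qquad \hat C_n(u)=\frac{1}{n}\sum_{i=1}^n\prod_{k=1}^p \mathbf{1}\{\hat F_{n,k}(\hat E_{i,k})\le u_k\}.$$
Finally, to estimate $\theta^*$, we settle for the estimator $\hat\theta^{\mathrm{residual}}$ that solves, for all $m\in\{1,\dots,d\}$,
$$\frac{1}{n}\sum_{i=1}^n g_m\big(\hat F_{n,1}(\hat E_{i,1}),\dots,\hat F_{n,p}(\hat E_{i,p});\hat\theta^{\mathrm{residual}}\big)=\int g_m(u;\hat\theta^{\mathrm{residual}})\,\mathrm{d}\hat C_n(u)=0.$$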
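The residual-based procedure can be mimicked in a small simulation. The sketch below rests on several assumptions of our own: a bivariate Gaussian copula with correlation `rho_true`, a scalar uniform covariate, and linear regression functions fitted by ordinary least squares as a deliberately simple stand-in for a generic $\hat m_k$. It computes both the oracle estimate (from the unobservable errors) and the residual-based estimate (from the fitted residuals) so that the two can be compared.

```python
import math
import random
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf  # standard normal quantile function


def pseudo_obs(x):
    """Rescaled ranks: rank / (n + 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def ple_gaussian(x1, x2, iters=60):
    """Rank-based PLE of a Gaussian copula correlation, by bisection on the score."""
    z1 = [PHI_INV(u) for u in pseudo_obs(x1)]
    z2 = [PHI_INV(u) for u in pseudo_obs(x2)]
    n = len(z1)
    cross = sum(a * b for a, b in zip(z1, z2)) / n
    squares = sum(a * a + b * b for a, b in zip(z1, z2)) / n

    def mean_score(rho):
        d = 1.0 - rho * rho
        return rho / d + ((1.0 + rho * rho) * cross - rho * squares) / (d * d)

    lo, hi = -0.99, 0.99
    f_lo = mean_score(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = mean_score(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ols_residuals(x, y):
    """Residuals y_i - m_hat(x_i) from a least-squares line (stand-in for m_hat_k)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - intercept - slope * a for a, b in zip(x, y)]


rng = random.Random(7)
n, rho_true = 1000, 0.6  # illustrative values (assumptions)
x, e1, e2 = [], [], []
for _ in range(n):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    x.append(rng.random())  # covariate X, independent of E
    e1.append(a)
    e2.append(rho_true * a + math.sqrt(1 - rho_true**2) * b)

# Observed responses Y_k = m_k(X) + E_k with hypothetical linear m_k.
y1 = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e1)]
y2 = [-1.0 + 0.5 * xi + ei for xi, ei in zip(x, e2)]

theta_oracle = ple_gaussian(e1, e2)  # uses the unobservable errors
theta_residual = ple_gaussian(ols_residuals(x, y1), ols_residuals(x, y2))
print(f"oracle: {theta_oracle:.4f}   residual-based: {theta_residual:.4f}")
```

In a run like this one would expect the two estimates to be close to each other, which is the phenomenon the rest of the post makes rigorous; a single simulation is of course no substitute for the theory.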

Comparing the estimating equations for $\hat\theta^{\mathrm{oracle}}$ and $\hat\theta^{\mathrm{residual}}$ above, we would expect that when the residual-based empirical copula $\hat C_n$ is asymptotically indistinguishable from the oracle empirical copula $C_n$, the residual-based copula parameter estimator $\hat\theta^{\mathrm{residual}}$ should be asymptotically indistinguishable from $\hat\theta^{\mathrm{oracle}}$ as well. To formally reach this conclusion, standard estimating equation theory requires (among other conditions that we will ignore in this post) that the estimating equations at the truth should become indistinguishable, namely
$$\int g_m(u;\theta^*)\,\mathrm{d}\hat C_n(u)-\int g_m(u;\theta^*)\,\mathrm{d}C_n(u)=o_p(n^{-1/2}).$$

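One typical, although ultimately restrictive, approach to establishing the closeness condition above is to invoke integration by parts (Neumeyer, Omelka, and Hudecová (2019), Chen, Huang, and Yi (2021)): ideally, this would yield
$$\int g_m(u;\theta^*)\,\mathrm{d}\hat C_n(u)\approx\int \hat C_n(u)\,\mathrm{d}g_m(u;\theta^*)\overset{\text{up to }o_p(n^{-1/2})}{\approx}\int C_n(u)\,\mathrm{d}g_m(u;\theta^*)\approx\int g_m(u;\theta^*)\,\mathrm{d}C_n(u).$$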

In the above, “$\approx$” is only meant to give a drastically simplified and hence not-quite-correct representation of integration by parts (we refer the readers to Appendix A in Radulović, Wegkamp, and Zhao (2017) for a precise multivariate integration by parts formula particularly useful for copulas), but it already conveys the underlying idea: the aim is to convert $\hat C_n$ and $C_n$ in the integrals from measures to integrands, so that proving the closeness between the two integrals is reduced to proving the closeness between $\hat C_n$ and $C_n$. However, the integration by parts trick, although popular, often requires a bounded $g_m$ to properly define the measure $\mathrm{d}g_m$, and this is not satisfied for even the most common copulas. For instance, in the Gaussian copula, the score functions are quadratic forms of the $\Phi^{\leftarrow}(u_k)$’s, where $\Phi^{\leftarrow}$ is the standard normal quantile function (see Eq. (2.2) in Segers, Akker, and Werker (2014)), and so clearly diverge as $u_k$ approaches 0 or 1.
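To make the divergence concrete, consider the bivariate Gaussian copula with a single correlation parameter, written $\rho$ here (the symbols $\rho$ and $z_k$ are our own notation, introduced only for this illustration). Writing $z_k=\Phi^{\leftarrow}(u_k)$, differentiating the log-density of this copula with respect to $\rho$ gives the score

$$g(u;\rho)=\frac{\rho}{1-\rho^2}+\frac{(1+\rho^2)\,z_1z_2-\rho\,(z_1^2+z_2^2)}{(1-\rho^2)^2}.$$

Since $z_k\to-\infty$ as $u_k\to0$ and $z_k\to+\infty$ as $u_k\to1$, this quadratic form in $(z_1,z_2)$ is unbounded near the boundary of $[0,1]^2$. For instance, along the diagonal $z_1=z_2=z$ the numerator of the second term equals $(1-\rho)^2z^2$, and $\Phi^{\leftarrow}(10^{-4})\approx-3.72$ already gives $z^2\approx13.8$.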

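In Zhao, Gijbels, and Van Keilegom (2022) we instead adopted a more direct approach. Let $g_m^{[k]}(u;\theta)=\frac{\partial}{\partial u_k}g_m(u;\theta)$. Then, Taylor-expanding the difference between the two estimating equations at the truth, we see that we will need (among other ingredients) an $o_p(n^{-1/2})$ rate for the terms on the right-hand side of
$$\int g_m(u;\theta^*)\,\mathrm{d}(\hat C_n-C_n)(u)\approx\sum_{k=1}^p\left[\frac{1}{n}\sum_{i=1}^n g_m^{[k]}\big(F_k(E_{i,k});\theta^*\big)\big\{\hat F_{n,k}(\hat E_{i,k})-F_{n,k}(E_{i,k})\big\}\right].$$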

It is not enough that the terms $\hat F_{n,k}(\hat E_{i,k})-F_{n,k}(E_{i,k})$ on the right-hand side are $o_p(n^{-1/2})$ (in fact they are not), due to the divergence of $g_m^{[k]}$. We need to take a more careful look at $\hat F_{n,k}(\hat E_{i,k})-F_{n,k}(E_{i,k})$, whose analysis belongs to the theory of residual empirical processes. To demonstrate the benefits of considering the weighted version of such processes, we first review some basic literature on the weighted (non-residual) empirical processes in the simplified setting of the real line.

Weighted empirical processes on the real line

In this section we consider estimating a distribution function $F=F_U$ of a random variable $U$. We rely on the empirical distribution function $F_n$ constructed from the i.i.d. observations $U_1,\dots,U_n$ of $U$: $F_n(t)=\frac{1}{n+1}\sum_{i=1}^n \mathbf{1}\{U_i\le t\}$. The resulting classical empirical process on the real line, $\sqrt{n}(F_n-F)(t)$, $t\in\mathbb{R}$, must be one of the most extensively studied objects in all of probability; for illustration we will just quote a form of the associated law of the iterated logarithm (LIL):
$$\limsup_{n\to\infty}\,\sup_{t\in\mathbb{R}}\left|\frac{1}{\sqrt{\log\log(n)}}\sqrt{n}(F_n-F)(t)\right|=\frac{1}{\sqrt{2}}.$$

Clearly, the LIL treats all points $t\in\mathbb{R}$ equally. However, in reality $F(t)$ is easier to estimate at some $t$ than at others. This is essentially because the variability $\mathrm{var}\{F_n(t)\}=F(t)\{1-F(t)\}/n$ approaches 0 when $F(t)$ approaches 0 or 1, for any $n$. We can clearly observe this feature in the small simulation study represented by the following figure, where the “band” enclosing the deviations (based on 100 Monte-Carlo simulations) gets narrower toward the boundaries of the support of $U$.

Above: Plot of the deviation $F_n-F$, for sample size $n=50$, based on 100 Monte-Carlo simulations. For simplicity we assumed $U\sim\mathrm{Unif}(0,1)$, so $F(t)=t$ for $t\in[0,1]$. The deviation from each simulation is represented by a single red dashed line. The 10% and 90% quantiles of the deviations at each $t$ value are indicated by the two blue lines. Clearly, the “band” enclosing the deviations gets narrower toward the boundaries of the support of $U$.
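A simulation of the kind summarized in the figure can be sketched as follows. The setup follows the caption ($n=50$, 100 Monte-Carlo replications, $U\sim\mathrm{Unif}(0,1)$), while the evaluation grid, the seed, and the simple order-statistic quantile rule are our own choices, so the numbers will not match the original figure exactly.

```python
import random


def ecdf_at(sample, t):
    """Empirical distribution function with denominator n + 1, as in the post."""
    return sum(1 for s in sample if s <= t) / (len(sample) + 1)


def quantile(values, q):
    """Crude empirical quantile: the order statistic at index floor(q * (m - 1))."""
    s = sorted(values)
    return s[int(q * (len(s) - 1))]


rng = random.Random(1)
n, n_sim = 50, 100
grid = [0.02, 0.1, 0.3, 0.5, 0.7, 0.9, 0.98]  # evaluation points t (our choice)

# Collect the deviations F_n(t) - F(t) = F_n(t) - t over the replications.
deviations = {t: [] for t in grid}
for _ in range(n_sim):
    u = [rng.random() for _ in range(n)]
    for t in grid:
        deviations[t].append(ecdf_at(u, t) - t)

# Width of the band between the 10% and 90% quantiles at each t.
band_width = {
    t: quantile(deviations[t], 0.9) - quantile(deviations[t], 0.1) for t in grid
}
for t in grid:
    print(f"t = {t:4.2f}   10%-90% band width = {band_width[t]:.3f}")
```

The band width should be largest around $t=0.5$ and shrink toward $t=0$ and $t=1$, in line with the variance formula $F(t)\{1-F(t)\}/n$.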

This feature can also be characterized theoretically. For instance, we can find the LIL for the weighted process $\sqrt{n}(F_n-F)/\sqrt{F}$, that is, the classical process now scaled by an additional standard deviation factor $1/\sqrt{F}$, from Csáki (1977): for some constant $0<C<\infty$,
$$\limsup_{n\to\infty}\,\sup_{F(t)\in(\frac{1}{n},\frac{1}{2}]}\left|\frac{\log\log\log(n)}{\log\log(n)}\cdot\frac{\sqrt{n}(F_n-F)(t)}{\sqrt{F(t)}}\right|=C.$$

Compared to the LIL for the unweighted process earlier, we can see that the weighted process is just slightly more difficult to bound, but now $\sqrt{n}(F_n-F)(t)$ clearly enjoys a tighter bound toward the boundaries of the support of $U$ due to the vanishing $\sqrt{F(t)}$ there.

Such results on the weighted empirical processes can be generalized to settings beyond the real line, for instance to sets in $\mathbb{R}^p$ (Alexander (1987)) and sets of functions (Giné, Koltchinskii, and Wellner (2003)).

Results for residual-based estimators

For us, the idea of the weighted empirical processes will be applied to the residual empirical processes, which will further culminate in our eventual result on the residual-based estimator $\hat\theta^{\mathrm{residual}}$ for the copula parameter. We will first consider the weighted residual empirical processes.

Results on weighted residual empirical process

Let $f_k$ be the density of $E_k$, and $\mathcal{T}_n$ be the $\sigma$-field generated by the $(X_i,Y_i)$’s, $i\in\{1,\dots,n\}$. The usual decomposition of a residual empirical process is
$$\hat F_{n,k}(t)-F_{n,k}(t)=f_k(t)\,\mathbb{E}\big[(\hat m_k-m_k)(X)\,\big|\,\mathcal{T}_n\big]+r_{1,k}(t),$$
where $r_{1,k}$ is a remainder term that could be controlled as follows:

Clearly, the convergence rate of the remainder term $r_{1,k}$ is improved by both the rate of $\hat m_k-m_k$ and the weight $f_k$. The latter point is especially beneficial when $f_k$ is a density “of the usual shape” that decays at its tails, which will allow the rate of $r_{1,k}$ to be tightened accordingly (in exactly the same way as the rate of $\sqrt{n}(F_n-F)(t)$ is tightened by $\sqrt{F(t)}$ in the weighted empirical processes on the real line that we reviewed earlier). These features will tame the divergence of $g_m^{[k]}$ in the Taylor expansion above.
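The leading term of the decomposition, $f_k(t)\,\mathbb{E}[(\hat m_k-m_k)(X)\mid\mathcal{T}_n]$, can be examined numerically. The sketch below is a toy check under strong assumptions of our own: standard normal errors, a uniform covariate, and a deterministic, deliberately exaggerated "estimation error" `m_error(x) = 0.1 + 0.2 * x` in place of a genuine $\hat m_k-m_k$, so that the conditional expectation is simply 0.2. It compares the gap between the residual-based and oracle empirical distribution functions with the leading term at a few points $t$.

```python
import math
import random


def ecdf_at(sample, t):
    """Empirical distribution function with denominator n + 1."""
    return sum(1 for s in sample if s <= t) / (len(sample) + 1)


def normal_pdf(t):
    """Density f_k of the (assumed) standard normal errors."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def m_error(xi):
    """Hypothetical value of (m_hat - m)(x); exaggerated for visibility."""
    return 0.1 + 0.2 * xi


rng = random.Random(3)
n = 5000
x = [rng.random() for _ in range(n)]    # covariate, Unif(0, 1)
e = [rng.gauss(0, 1) for _ in range(n)]  # unobservable errors E_i

# Residuals: E_hat_i = Y_i - m_hat(X_i) = E_i - (m_hat - m)(X_i).
residuals = [ei - m_error(xi) for xi, ei in zip(x, e)]

# E[(m_hat - m)(X)] for X ~ Unif(0, 1), computed in closed form.
mean_error = 0.1 + 0.2 * 0.5

gap, leading = {}, {}
for t in [-3, -2, -1, 0, 1, 2, 3]:
    gap[t] = ecdf_at(residuals, t) - ecdf_at(e, t)  # F_hat_n(t) - F_n(t)
    leading[t] = normal_pdf(t) * mean_error          # f(t) * E[(m_hat - m)(X)]
    print(f"t = {t:+d}   gap = {gap[t]:.4f}   leading term = {leading[t]:.4f}")
```

In such a run the gap should track the leading term and become small in the tails where the density is small, which is the weighting effect described above. The remainder term itself is governed by the theorem in the paper and is not reproduced here.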

Because what eventually forms the ingredients of the residual-based copula $\hat C_n$ are the residual ranks $\hat F_{n,k}(\hat E_{i,k})$, we need to go one step further and consider the analogous results for them:

Note that, similar to $r_{1,k}$ earlier, the rate of $r_{2,k,i}$ also benefits from the dependence on $\hat m_k-m_k$ and the weighting by $f_k$.

Results on residual-based estimator for the copula parameter

We are now ready to plug the decomposition of $\hat F_{n,k}(\hat E_{i,k})-F_{n,k}(E_{i,k})$ from the last theorem into the Taylor expansion of the estimating equations. The leading term (the one proportional to $f_k$), which is centered, is now summed over $i\in\{1,\dots,n\}$ in that expansion and so enjoys an additional $n^{-1/2}$-scaling. The remainder terms are weighted and so tame the divergence of the scores $g_m^{[k]}$. Eventually, we arrive at the asymptotic equivalence between the residual-based PLE and the oracle PLE:

The condition above in fact allows for quite non-trivial divergence of the score functions $g_m$ (it certainly accommodates the Gaussian and the $t$-copulas). To apply the theorem above, one still needs to verify the correct upper bound on the bracketing number of the function class embedding $\hat m_k-m_k$, which again turns out to be non-restrictive. For instance, for the partly linear regression $Y_k=\tilde m_k(x)+\theta_k^\top w+E_k$ with $w\in\mathbb{R}^{q_{L,n}}$, we can allow the dimension $q_{L,n}$ of the linear covariate to grow up to $q_{L,n}=o(n^{1/4})$.

Bibliography

Alexander, Kenneth S. 1987. “Rates of Growth and Sample Moduli for Weighted Empirical Processes Indexed by Sets.” Probability Theory and Related Fields 75 (3): 379–423.
Berghaus, Betina, Axel Bücher, and Stanislav Volgushev. 2017. “Weak Convergence of the Empirical Copula Process with Respect to Weighted Metrics.” Bernoulli 23 (1): 743–72.
Chen, Xiaohong, Zhuo Huang, and Yanping Yi. 2021. “Efficient Estimation of Multivariate Semi-Nonparametric GARCH Filtered Copula Models.” Journal of Econometrics 222 (1): 484–501.
Csáki, E. 1977. “The Law of the Iterated Logarithm for Normalized Empirical Distribution Function.” Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete 38 (2): 147–67.
Fermanian, Jean-David, Dragan Radulović, and Marten Wegkamp. 2004. “Weak Convergence of Empirical Copula Processes.” Bernoulli 10 (5): 847–60.
Genest, Christian, Kilani Ghoudi, and Louis-Paul Rivest. 1995. “A Semiparametric Estimation Procedure of Dependence Parameters in Multivariate Families of Distributions.” Biometrika 82: 543–52.
Gijbels, Irène, Marek Omelka, and Noël Veraverbeke. 2015. “Estimation of a Copula When a Covariate Affects Only Marginal Distributions.” Scandinavian Journal of Statistics 42 (4): 1109–26.
Giné, Evarist, Vladimir Koltchinskii, and Jon A. Wellner. 2003. “Ratio Limit Theorems for Empirical Processes.” In Stochastic Inequalities and Applications, edited by Evarist Giné, Christian Houdré, and David Nualart, 249–78. Birkhäuser.
Nelsen, Roger B. 2006. An Introduction to Copulas. 2nd ed. New York: Springer.
Neumeyer, Natalie, Marek Omelka, and Šárka Hudecová. 2019. “A Copula Approach for Dependence Modeling in Multivariate Nonparametric Time Series.” Journal of Multivariate Analysis 171: 139–62.
Oakes, David. 1994. “Multivariate Survival Distributions.” Journal of Nonparametric Statistics 3 (3–4): 343–54.
Radulović, Dragan, Marten Wegkamp, and Yue Zhao. 2017. “Weak Convergence of Empirical Copula Processes Indexed by Functions.” Bernoulli 23 (4): 3346–84.
Segers, Johan, Ramon van den Akker, and Bas J. M. Werker. 2014. “Semiparametric Gaussian Copula Models: Geometry and Efficient Rank-Based Estimation.” The Annals of Statistics 42 (5): 1911–40.
Sklar, Abe. 1959. “Fonctions de Répartition à n Dimensions Et Leurs Marges.” Publications de l’Institut de Statistique de L’Université de Paris 8: 229–31.
Sun, Jiafeng, Edward W. Frees, and Marjorie A. Rosenberg. 2008. “Heavy-Tailed Longitudinal Data Modeling Using Copulas.” Insurance: Mathematics and Economics 42 (2): 817–30.
Zhao, Yue, Irène Gijbels, and Ingrid Van Keilegom. 2022. “Parametric Copula Adjusted for Non- and Semiparametric Regression.” The Annals of Statistics 50 (2): 754–80.