Weighted residual empirical processes in semi-parametric copula adjusted for regression
Overview
In this post we first review the semi-parametric copula model and the accompanying estimation procedure, pseudo-likelihood estimation (PLE). We then generalize the estimation problem to the setting where the copula signal is hidden in a semi- or non-parametric regression model. In that setting the PLE has to be based on the residuals, and the particular challenge posed by diverging score functions is handled via the technique of weighted residual empirical processes.
The semi-parametric copula model
Copulas have been a popular tool for modelling multivariate dependence structures since their introduction in Sklar (1959). Consider a random vector \(E=(E_1,\ldots,E_p)^\top\in\mathbb{R}^p\) with joint distribution function \(H\); we assume throughout that each \(E_k\), \(k\in\{1,\ldots,p\}\), has an absolutely continuous marginal distribution function \(F_k\). The copula \(C\) associated with \(E\) is then the joint distribution function of the marginally transformed random vector \((F_1(E_1),\ldots,F_p(E_p))^\top\). It is clear from this definition that \(C\) is always a distribution function supported on the unit hypercube \([0,1]^p\), and that \(C\) has uniform marginals on \([0,1]\) whatever the marginals of \(E\) may be. (The explicit form of \(C\) follows from Sklar's theorem, see for instance Corollary 2.10.10 in Nelsen (2006): \(C(u)=H(F_1^{\leftarrow}(u_1),\ldots,F_p^{\leftarrow}(u_p))\) for \(F_k^{\leftarrow}\) the inverse of \(F_k\) and \(u=(u_1,\ldots,u_p)^\top\in[0,1]^p\).) Furthermore, by the invariance property, if \(g_1,\ldots,g_p\) are univariate strictly increasing functions, then \(E\) and its marginally transformed version \((g_1(E_1),\ldots,g_p(E_p))^\top\) admit the same copula.
Thus, the copula is a margin-free measure of multivariate dependence. Applied in the opposite direction, one can also start from a copula and couple it with arbitrary marginals to create multivariate distributions in a flexible manner. For instance, beyond the usual applications in finance and economics, copulas can be used in this way to model the dependence among the repeated observations in longitudinal data (Sun, Frees, and Rosenberg (2008)).
In this post we will focus on the semi-parametric copula model, which serves as a middle ground between a totally non-parametric approach to copula modelling (via the so-called empirical copula, see for instance Fermanian, Radulović, and Wegkamp (2004) and Berghaus, Bücher, and Volgushev (2017)) and a totally parametric model for the random vector \(E\). In the semi-parametric copula model we consider a collection of possible distributions of \(E\) in which the copulas \(C=C(\cdot;\theta)\) are constrained to be parametrized by a Euclidean copula parameter \(\theta=(\theta_1,\ldots,\theta_d)^\top\), but in which the marginals \(F_1,\ldots,F_p\) of \(E\) may range over all \(p\)-tuples of absolutely continuous univariate distribution functions.
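As a concrete illustration, here is a minimal R sketch of one member of such a model, assuming the copula package is available; the Clayton copula and the gamma and t margins are arbitrary choices made purely for illustration.

```r
## A minimal sketch of one member of the semi-parametric copula model,
## assuming the 'copula' package is available. The Clayton copula and the
## gamma and t margins are arbitrary illustrative choices.
library(copula)

set.seed(1)
theta_star <- 2                                  # Euclidean copula parameter
cop <- claytonCopula(param = theta_star, dim = 2)

n <- 500
U <- rCopula(n, cop)                             # a sample from C(.; theta*)

## Couple the copula with arbitrary absolutely continuous margins via the
## quantile transform: E_k = F_k^{-1}(U_k) has margin F_k and copula C(.; theta*).
E <- cbind(qgamma(U[, 1], shape = 2, rate = 1),
           qt(U[, 2], df = 5))

## Margin-free dependence: rank-based measures such as Kendall's tau agree
## for U (uniform margins) and E (gamma and t margins).
cor(U, method = "kendall")[1, 2]
cor(E, method = "kendall")[1, 2]
```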
The pseudo-likelihood method
In the semi-parametric copula model, the primary interest is often the true value θ∗ of the copula parameter that determines the multivariate dependence. An obvious challenge in estimating θ∗ in the copula setting is how to handle the unknown marginals F1,…,Fp. The canonical solution is the pseudo-likelihood estimation (PLE) introduced in Oakes (1994) and Genest, Ghoudi, and Rivest (1995) that we now describe.
Let \(g_1(\cdot;\theta),\ldots,g_d(\cdot;\theta)\) be a collection of appropriate score functions such that the population estimating equation
\[
\mathrm{E}\big[g_m(F_1(E_1),\ldots,F_p(E_p);\theta)\big]=0
\]
holds only when \(\theta=\theta^*\), for all \(m\in\{1,\ldots,d\}\). In principle one can always choose the score functions to be the ones from maximum likelihood estimation, namely
\[
g_m(\cdot;\theta)=\frac{\partial}{\partial\theta_m}\log c(\cdot;\theta),
\]
where \(c(\cdot;\theta)\) is the density of the copula \(C(\cdot;\theta)\). Thus, if \(F_1,\ldots,F_p\) were known, to estimate \(\theta^*\) empirically based on a sample \(E_i=(E_{i,1},\ldots,E_{i,p})^\top\), \(i\in\{1,\ldots,n\}\), of \(E\), one could simply "find the zero" of the empirical version of the estimating equation, that is, estimate \(\theta^*\) by the \(\hat\theta_{\mathrm{parametric}}\) that solves, for all \(m\in\{1,\ldots,d\}\),
\[
\frac{1}{n}\sum_{i=1}^n g_m\big(F_1(E_{i,1}),\ldots,F_p(E_{i,p});\hat\theta_{\mathrm{parametric}}\big)=0.
\]
In practice the marginals are unknown, and the pseudo-likelihood estimator solves the same equations with each \(F_k\) replaced by a rescaled empirical distribution function, that is, with the \(F_k(E_{i,k})\) replaced by the normalized ranks of the observations.
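A minimal R sketch of the two estimators just described, again assuming the copula package: fitCopula with method "ml" plays the role of the known-margins estimator, while method "mpl" applied to the normalized ranks is the PLE.

```r
## Known-margins estimator versus pseudo-likelihood estimator, assuming the
## 'copula' package. With known margins we can hand fitCopula the exactly
## uniform observations (method = "ml"); the PLE replaces them by normalized
## ranks, i.e. pseudo-observations (method = "mpl").
library(copula)

set.seed(2)
theta_star <- 2
cop <- claytonCopula(theta_star, dim = 2)

n <- 500
U <- rCopula(n, cop)                              # (F_1(E_1), F_2(E_2)): known-margin case
E <- cbind(qgamma(U[, 1], shape = 2), qt(U[, 2], df = 5))  # what we would observe

## "Parametric" estimator: margins F_k known, so the uniform scale is available.
fit_known  <- fitCopula(claytonCopula(dim = 2), data = U, method = "ml")

## Pseudo-likelihood estimator: margins unknown, F_k replaced by rescaled
## empirical distribution functions (normalized ranks).
fit_pseudo <- fitCopula(claytonCopula(dim = 2), data = pobs(E), method = "mpl")

c(truth = theta_star, known_margins = coef(fit_known), ple = coef(fit_pseudo))
```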
Residual-based pseudo-likelihood for semi-parametric copula adjusted for regression
From now on we suppose that the copula signal \(E=E_{\theta^*}\) is "hidden" in a multivariate response semi- or non-parametric regression model, a setting considered in our recent work (Zhao, Gijbels, and Van Keilegom (2022)): for a covariate \(X\in\mathbb{R}^q\) (independent of \(E\)) and a response \(Y=(Y_1,\ldots,Y_p)^\top\in\mathbb{R}^p\),
\[
\begin{aligned}
Y_1&=m_1(X)+E_1,\\
&\;\;\vdots\\
Y_p&=m_p(X)+E_p.
\end{aligned}
\]
Under this regression model we can observe an i.i.d. sample \((Y_1,X_1),\ldots,(Y_n,X_n)\) of \((Y,X)\), but crucially not the copula sample \(E_1,\ldots,E_n\). To eventually arrive at our estimator for \(\theta^*\) in this setting, we first form an empirical copula \(\hat C_n\) based on the residuals of the regression, as follows. Let \(\hat m_k\) be some estimator of \(m_k\), and estimate the \(k\)th component of \(E_i=(E_{i,1},\ldots,E_{i,p})^\top\) by the residual
\[
\hat E_{i,k}=Y_{i,k}-\hat m_k(X_i).
\]
The residual-based empirical copula \(\hat C_n\) is then constructed from the normalized ranks of the residuals, in exactly the same way as the oracle empirical copula \(C_n\) would be constructed from the normalized ranks of the unobservable \(E_{i,k}\); the residual-based estimator \(\hat\theta_{\mathrm{residual}}\) and the oracle estimator \(\hat\theta_{\mathrm{oracle}}\) then solve the corresponding estimating equations. (A small simulation sketch of this pipeline is given below.)
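Here is the promised sketch in R, assuming the copula package; the data-generating design, the loess smoother, and all tuning choices are illustrative stand-ins rather than the precise estimators studied in the paper.

```r
## A sketch of the residual-based PLE pipeline, assuming the 'copula' package.
## The design, the loess smoother and all tuning choices are illustrative only.
library(copula)

set.seed(3)
n <- 500
theta_star <- 2
cop <- claytonCopula(theta_star, dim = 2)

## Copula signal E hidden in a non-parametric regression with covariate X.
X <- runif(n)
U <- rCopula(n, cop)
E <- qnorm(U)                                  # normal margins, for simplicity
Y1 <- sin(2 * pi * X) + E[, 1]                 # m_1(X) = sin(2*pi*X)
Y2 <- X^2             + E[, 2]                 # m_2(X) = X^2

## Step 1: estimate m_k and form the residuals Ehat_{i,k} = Y_{i,k} - mhat_k(X_i).
Ehat <- cbind(Y1 - fitted(loess(Y1 ~ X)),
              Y2 - fitted(loess(Y2 ~ X)))

## Step 2: residual-based PLE (pseudo-observations of the residuals).
theta_residual <- coef(fitCopula(claytonCopula(dim = 2), pobs(Ehat), method = "mpl"))

## Oracle PLE, available only in a simulation (uses the unobservable E_i).
theta_oracle   <- coef(fitCopula(claytonCopula(dim = 2), pobs(E),    method = "mpl"))

c(truth = theta_star, residual = theta_residual, oracle = theta_oracle)
```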
Comparing the estimating equations based on \(\hat C_n\) and on \(C_n\), we would expect that when the residual-based empirical copula \(\hat C_n\) is asymptotically indistinguishable from the oracle empirical copula \(C_n\), the residual-based copula parameter estimator \(\hat\theta_{\mathrm{residual}}\) should be asymptotically indistinguishable from \(\hat\theta_{\mathrm{oracle}}\) as well. To formally reach this conclusion, standard estimating equation theory requires (among other conditions that we will ignore in this post) that the estimating equations at the truth become indistinguishable, namely
\[
\int g_m(u;\theta^*)\,d\hat C_n(u)-\int g_m(u;\theta^*)\,dC_n(u)=o_p(n^{-1/2}).
\]
One typical, although ultimately restrictive, approach to establishing the above display is to invoke integration by parts (Neumeyer, Omelka, and Hudecová (2019), Chen, Huang, and Yi (2021)): ideally, up to \(o_p(n^{-1/2})\) terms, this would yield
\[
\int g_m(u;\theta^*)\,d\hat C_n(u)\;\sim\;\int \hat C_n(u)\,dg_m(u;\theta^*)\;\approx\;\int C_n(u)\,dg_m(u;\theta^*)\;\sim\;\int g_m(u;\theta^*)\,dC_n(u).
\]
In Zhao, Gijbels, and Van Keilegom (2022) we instead adopted a more direct approach. Let
\[
g_m^{[k]}(u;\theta^*)=\frac{\partial}{\partial u_k}g_m(u;\theta^*).
\]
Taylor-expanding the difference of the two estimating equations displayed above, we see that we will need (among other ingredients) an \(o_p(n^{-1/2})\) rate for the terms on the right-hand side of
\[
\int g_m(u;\theta^*)\,d(\hat C_n-C_n)(u)\;\approx\;\sum_{k=1}^p\bigg[\frac{1}{n}\sum_{i=1}^n g_m^{[k]}\big(F_k(E_{i,k});\theta^*\big)\big\{\hat F_{n,k}(\hat E_{i,k})-F_{n,k}(E_{i,k})\big\}\bigg],
\]
where \(\hat F_{n,k}\) and \(F_{n,k}\) denote the (rescaled) marginal empirical distribution functions based on the residuals \(\hat E_{1,k},\ldots,\hat E_{n,k}\) and on the unobservable \(E_{1,k},\ldots,E_{n,k}\), respectively.
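To make this step explicit, here is a schematic version of the expansion (a sketch only: the argument of \(g_m^{[k]}\) is abbreviated as in the display above, and higher-order remainder terms are suppressed):
\[
\begin{aligned}
\int g_m(u;\theta^*)\,d(\hat C_n-C_n)(u)
&=\frac{1}{n}\sum_{i=1}^n\Big[g_m\big(\hat F_{n,1}(\hat E_{i,1}),\ldots,\hat F_{n,p}(\hat E_{i,p});\theta^*\big)-g_m\big(F_{n,1}(E_{i,1}),\ldots,F_{n,p}(E_{i,p});\theta^*\big)\Big]\\
&\approx\sum_{k=1}^p\frac{1}{n}\sum_{i=1}^n g_m^{[k]}\big(F_k(E_{i,k});\theta^*\big)\big\{\hat F_{n,k}(\hat E_{i,k})-F_{n,k}(E_{i,k})\big\}.
\end{aligned}
\]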
Weighted empirical processes on the real line
In this section we consider estimating the distribution function \(F=F_U\) of a random variable \(U\). We rely on the empirical distribution function \(F_n\) constructed from the i.i.d. observations \(U_1,\ldots,U_n\) of \(U\):
\[
F_n(t)=\frac{1}{n+1}\sum_{i=1}^n \mathbf{1}\{U_i\le t\}.
\]
The resulting classical empirical process on the real line, \(\sqrt{n}(F_n-F)(t)\), \(t\in\mathbb{R}\), must be one of the most extensively studied objects in all of probability; for illustration we just quote a form of the associated law of the iterated logarithm (LIL):
\[
\limsup_{n\to\infty}\,\sup_t\bigg|\frac{1}{\sqrt{\log\log n}}\,\sqrt{n}(F_n-F)(t)\bigg|=\frac{1}{\sqrt{2}}\quad\text{almost surely}.
\]
[Figure: plot of the deviation \(F_n-F\) for sample size \(n=50\), based on 100 Monte-Carlo simulations. For simplicity we take \(U\sim\mathrm{Unif}(0,1)\), so \(F(t)=t\) for \(t\in[0,1]\). The deviation from each simulation is drawn as a red dashed line; the 10% and 90% quantiles of the deviations at each \(t\) are indicated by the two blue lines. Clearly, the "band" enclosing the deviations gets narrower toward the boundaries of the support of \(U\).]
This feature can also be characterized theoretically. For instance, a LIL for the weighted process \(\sqrt{n}(F_n-F)/\sqrt{F}\), that is, the classical process scaled by the additional standard-deviation-type factor \(1/\sqrt{F}\), can be found in Csáki (1977). We omit the precise statement; the essential point, already visible in the plot above, is that the deviation of \(F_n\) from \(F\) is of smaller order where \(F\) is small, and the weighting by \(1/\sqrt{F}\) quantifies this.
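A small simulation in R, along the lines of the figure above, comparing the raw and the weighted deviations (a sketch; base R's ecdf uses the 1/n convention rather than 1/(n+1), which is immaterial here):

```r
## Pointwise 10%/90% bands for the raw deviation F_n - F and for the weighted
## deviation (F_n - F)/sqrt(F), with U ~ Unif(0,1) so that F(t) = t.
set.seed(4)
n <- 50; n_sim <- 100
tgrid <- seq(0.01, 0.99, by = 0.01)        # stay away from the very boundary

dev <- replicate(n_sim, ecdf(runif(n))(tgrid) - tgrid)   # each column: F_n - F

band_raw      <- apply(dev,               1, quantile, probs = c(0.1, 0.9))
band_weighted <- apply(dev / sqrt(tgrid), 1, quantile, probs = c(0.1, 0.9))

## The raw band narrows toward the boundaries; after weighting by 1/sqrt(F)
## the deviations near t = 0 are no longer smaller than those in the middle.
op <- par(mfrow = c(1, 2))
matplot(tgrid, t(band_raw),      type = "l", lty = 1, col = "blue",
        xlab = "t", ylab = "Fn - F")
matplot(tgrid, t(band_weighted), type = "l", lty = 1, col = "blue",
        xlab = "t", ylab = "(Fn - F) / sqrt(F)")
par(op)
```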
Such results on weighted empirical processes can be generalized to settings beyond the real line, for instance to empirical processes indexed by classes of sets in \(\mathbb{R}^p\) (Alexander (1987)) or by classes of functions (Giné, Koltchinskii, and Wellner (2003)).
Results for residual-based estimators
For us, the idea of weighted empirical processes will be applied to the residual empirical processes, culminating in our eventual result on the residual-based estimator \(\hat\theta_{\mathrm{residual}}\) for the copula parameter. We first consider the weighted residual empirical processes.
Results on weighted residual empirical process
Let \(f_k\) be the density of \(E_k\), and let \(\mathcal{T}_n\) be the \(\sigma\)-field generated by the \((X_i,Y_i)\)'s, \(i\in\{1,\ldots,n\}\). The usual decomposition of a residual empirical process is
\[
\hat F_{n,k}(t)-F_{n,k}(t)=f_k(t)\cdot \mathrm{E}\big[(\hat m_k-m_k)(X)\,\big|\,\mathcal{T}_n\big]+r_{1,k}(t).
\]
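To get a feel for this decomposition, here is a quick numerical check in R, under simple illustrative assumptions: a linear regression fitted by lm, standard normal errors, and a Monte-Carlo approximation of the conditional expectation.

```r
## Numerical check of: Fhat_n(t) - F_n(t) ~ f(t) * E[(mhat - m)(X) | T_n] + r_1(t).
set.seed(5)
n <- 500
X <- runif(n)
m <- function(x) 1 + 2 * x                      # true regression function
E1 <- rnorm(n)                                  # error, with density f = dnorm
Y <- m(X) + E1

fit   <- lm(Y ~ X)
mhat  <- function(x) predict(fit, newdata = data.frame(X = x))
Ehat1 <- Y - mhat(X)                            # residuals

## E[(mhat - m)(X) | T_n]: average over fresh draws of X, with the data fixed.
Xnew <- runif(1e5)
bias <- mean(mhat(Xnew) - m(Xnew))

tgrid  <- seq(-3, 3, by = 0.5)
Fn_hat <- ecdf(Ehat1)(tgrid)                    # residual-based e.d.f.
Fn     <- ecdf(E1)(tgrid)                       # oracle e.d.f. (unobservable)

## Compare the actual difference with the leading term f(t) * bias; what is
## left over is the remainder r_1(t), which is damped where dnorm(t) is small.
round(cbind(t          = tgrid,
            difference = Fn_hat - Fn,
            leading    = dnorm(tgrid) * bias,
            remainder  = (Fn_hat - Fn) - dnorm(tgrid) * bias), 4)
```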
Theorem: Suppose that we can embed \(\hat m_k-m_k\) into a function class \(\mathcal{D}\) with bracketing number
\[
N_{[\,]}(\tau,\mathcal{D})\lesssim (1/\tau)^{\beta}\exp\big(K(1/\tau)^{1/\nu}\big),
\]
where \(\beta\), \(K\) and \(\nu\) are constants. Suppose that \(\|\hat m_k-m_k\|_\infty=O_p(a_n)\) (where \(\|\cdot\|_\infty\) is the supremum norm over the support of \(X\)). Then, under mild regularity conditions,
\[
\sup_{t\in\mathbb{R}}\frac{|r_{1,k}(t)|}{n^{-\frac12}\{f_k(t)\cdot a_n+a_n^2\}^{\frac12(1-1/\nu)}+n^{-\frac{1}{1+1/\nu}}+a_n^2}=O_p(1).
\]
Clearly, the convergence rate of the remainder term \(r_{1,k}\) benefits both from the rate \(a_n\) of \(\|\hat m_k-m_k\|_\infty\) and from the weight \(f_k(t)\). The latter is especially beneficial when \(f_k\) is a density "of the usual shape" that decays in its tails, which allows the rate of \(r_{1,k}\) to be tightened accordingly, exactly analogous to how the rate of \(\sqrt{n}(F_n-F)(t)\) is tightened by \(\sqrt{F(t)}\) in the weighted empirical processes on the real line reviewed earlier. These features will tame the divergence of \(g_m^{[k]}\) in the Taylor-expansion display above.
Because the ingredients that eventually form the residual-based empirical copula \(\hat C_n\) are the residual ranks \(\hat F_{n,k}(\hat E_{i,k})\), we need to go one step further and consider the analogous result for them:
Theorem: For all \(n\ge 1\), \(k\in\{1,\ldots,p\}\) and \(i\in\{1,\ldots,n\}\),
\[
\hat F_{n,k}(\hat E_{i,k})-F_{n,k}(E_{i,k})=-f_k(E_{i,k})\Big\{(\hat m_k-m_k)(X_i)-\mathrm{E}\big[(\hat m_k-m_k)(X)\,\big|\,\mathcal{T}_n\big]\Big\}+r_{1,k}(\hat E_{i,k})+r_{2,k,i}.
\]
Note that, similar to \(r_{1,k}\) earlier, the rate of \(r_{2,k,i}\) also benefits from the dependence on \(\|\hat m_k-m_k\|_\infty\) and from the weighting by \(f_k\).
Results on residual-based estimator for the copula parameter
We are now ready to plug the decomposition of \(\hat F_{n,k}(\hat E_{i,k})-F_{n,k}(E_{i,k})\) from the last theorem into the right-hand side of the Taylor-expansion display above. The leading term (the one proportional to \(f_k\)), which is centered, is now summed over \(i\in\{1,\ldots,n\}\) and so enjoys an additional \(n^{-1/2}\) scaling. The remainder terms are weighted and so tame the divergence of the scores \(g_m^{[k]}\). Eventually we arrive at the asymptotic equivalence between the residual-based PLE and the oracle PLE:
Theorem: Under the conditions of the last two theorems, and some additional regularity conditions which in particular require, for \(m\in\{1,\ldots,d\}\) and \(k\in\{1,\ldots,p\}\),
\[
\int\Big\{g_m^{[k]}(u;\theta^*)\,f_k\big(F_k^{\leftarrow}(u_k)\big)\Big\}^2\,dC(u)<\infty,
\]
the residual-based PLE \(\hat\theta_{\mathrm{residual}}\) is asymptotically equivalent to the oracle PLE \(\hat\theta_{\mathrm{oracle}}\).
The condition above in fact allows for quite non-trivial divergence of the score functions \(g_m\) (it certainly accommodates the Gaussian and the t copulas). To apply the theorem one still needs to verify the upper bound on the bracketing number of the class embedding \(\hat m_k-m_k\), which again turns out to be non-restrictive. For instance, for the partly linear regression \(Y_k=\tilde m_k(x)+\theta_k^\top w+E_k\) with \(w\in\mathbb{R}^{q_{L,n}}\), we can allow the dimension \(q_{L,n}\) of the linear covariate to grow up to \(q_{L,n}=o(n^{1/4})\).
Bibliography