Characterization-based approach for construction of goodness-of-fit test for Lévy distribution
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
The Lévy distribution, together with the Normal and Cauchy distributions, belongs to the class of stable distributions, and it is one of only three stable distributions whose density can be written in closed form. The density function of the two-parameter Lévy distribution is
\[ f(x;\lambda,\mu)=\sqrt{\frac{\lambda}{2\pi}}\,\frac{e^{-\frac{\lambda}{2(x-\mu)}}}{(x-\mu)^{3/2}},\qquad x>\mu,\ \lambda>0,\ \mu\in\mathbb{R}. \]
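The density is easy to check numerically. The minimal Python sketch below (standard library only; the function names are ours) implements \(f(x;\lambda,\mu)\) together with two standard facts that the post does not state: the distribution function is \(F(x)=\operatorname{erfc}\big(\sqrt{\lambda/(2(x-\mu))}\big)\), and \(\mu+\lambda/Z^2\) with \(Z\sim\mathcal{N}(0,1)\) follows this law, which gives a simple sampler.

```python
import math
import random


def levy_pdf(x: float, lam: float, mu: float = 0.0) -> float:
    """Density f(x; lam, mu) of the two-parameter Levy distribution."""
    if x <= mu:
        return 0.0
    d = x - mu
    return math.sqrt(lam / (2 * math.pi)) * math.exp(-lam / (2 * d)) / d ** 1.5


def levy_cdf(x: float, lam: float, mu: float = 0.0) -> float:
    """Distribution function F(x) = erfc(sqrt(lam / (2 (x - mu))))."""
    if x <= mu:
        return 0.0
    return math.erfc(math.sqrt(lam / (2 * (x - mu))))


def levy_sample(n: int, lam: float, mu: float = 0.0, rng=None) -> list:
    """Draw n values as mu + lam / Z^2, where Z is standard normal."""
    if rng is None:
        rng = random.Random()
    return [mu + lam / rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    xs = levy_sample(100_000, lam=2.0, mu=0.5, rng=rng)
    x0 = 3.0
    # Share of simulated values below x0 versus the exact distribution function.
    share = sum(x <= x0 for x in xs) / len(xs)
    print(f"empirical F({x0}) = {share:.4f}, exact = {levy_cdf(x0, 2.0, 0.5):.4f}")
```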
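Characterization 1 Suppose that \(X\), \(Y\) and \(Z\) are independent and identically distributed random variables with density \(f\) defined on \((0,\infty)\). Then \(Z\) and
\[ \frac{aX+bY}{(\sqrt{a}+\sqrt{b})^{2}},\qquad 0<a,b<\infty, \]
are identically distributed if and only if \(f\) is the density of a Lévy distribution with \(\mu=0\).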
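Characterization 1 can be illustrated through Laplace transforms, which is also the route the tests below take. This sketch assumes \(\mu=0\) and uses the standard Lévy Laplace transform \(Ee^{-tX}=e^{-\sqrt{2\lambda t}}\) (not stated in the post). For independent \(X,Y\), the transform of \((aX+bY)/(\sqrt a+\sqrt b)^2\) is \(e^{-\sqrt{2\lambda t}\,(\sqrt a+\sqrt b)/(\sqrt a+\sqrt b)}=e^{-\sqrt{2\lambda t}}\), which is exactly the transform of \(Z\). The code checks this identity and compares it with the empirical Laplace transform of simulated data. The values of \(\lambda\), \(a\) and \(b\) are arbitrary choices.

```python
import math
import random


def levy_laplace(t: float, lam: float) -> float:
    """Laplace transform E[exp(-t X)] of a Levy(lam) variable with mu = 0."""
    return math.exp(-math.sqrt(2 * lam * t))


def empirical_laplace(t: float, xs: list) -> float:
    """Empirical Laplace transform (1/n) * sum_i exp(-t x_i)."""
    return sum(math.exp(-t * x) for x in xs) / len(xs)


def combined_sample(n: int, a: float, b: float, lam: float, rng: random.Random) -> list:
    """n draws of (aX + bY) / (sqrt(a) + sqrt(b))^2 with X, Y i.i.d. Levy(lam), mu = 0."""
    c = (math.sqrt(a) + math.sqrt(b)) ** 2
    out = []
    for _ in range(n):
        x = lam / rng.gauss(0.0, 1.0) ** 2
        y = lam / rng.gauss(0.0, 1.0) ** 2
        out.append((a * x + b * y) / c)
    return out


if __name__ == "__main__":
    rng = random.Random(7)
    lam, a, b = 1.5, 2.0, 5.0
    c = (math.sqrt(a) + math.sqrt(b)) ** 2
    ws = combined_sample(50_000, a, b, lam, rng)
    for t in (0.1, 0.5, 2.0):
        # Exact transform of the combination: product of the two scaled transforms.
        exact_combo = levy_laplace(a * t / c, lam) * levy_laplace(b * t / c, lam)
        print(t, levy_laplace(t, lam), exact_combo, empirical_laplace(t, ws))
```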
The first application of this characterization to the construction of a goodness-of-fit test for the Lévy distribution was presented in [4], for the special case \(a=b=1\). The authors proposed the test statistic:
\(\begin{equation*} T_n^*=\int_{{\mathbb{R}^+}}\Big(\frac{1}{\binom{n}{2}}\sum\limits_{j
In [1], we extended the aforementioned statistic to cover the case of arbitrary values of a,b∈N. Additionally, we investigated the asymptotic distributions of these generalized test statistics.
Our test statistics
Equality in distribution of two random variables can also be established by comparing their Laplace transforms. Accordingly, our tests are built either as the supremum or as the integral of the weighted difference between the V-empirical Laplace transforms of the two random variables in Characterization 1. The rationale is that the test statistic should take small values when the sample comes from the Lévy distribution. The proposed test statistics are of the form:
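\[ J_{n,a}=\sup_{t>0}\left|\left(\frac{1}{n^2}\sum_{i,j}e^{-t\frac{Y_i+Y_j}{4}}-\frac{1}{n}\sum_{i}e^{-tY_i}\right)e^{-at}t^{3/2}\right|=\sup_{t\in[0,1]}\left|\left(\frac{1}{n^2}\sum_{i,j}t^{\frac{Y_i+Y_j}{4}}-\frac{1}{n}\sum_{i}t^{Y_i}\right)t^{a}(-\log t)^{3/2}\right|, \]
\[ R_{n,a}=\int_{\mathbb{R}^+}\left(\frac{1}{n^2}\sum_{i,j}e^{-t\frac{Y_i+Y_j}{4}}-\frac{1}{n}\sum_{i}e^{-tY_i}\right)e^{-at}t^{3/2}\,dt=\frac{3\sqrt{\pi}}{4n^2}\sum_{i,j}\left(\frac{1}{\left(a+\frac{Y_i+Y_j}{4}\right)^{5/2}}-\frac{1}{2(a+Y_i)^{5/2}}-\frac{1}{2(a+Y_j)^{5/2}}\right), \]
where the characterization is used with \(a=b=1\) (hence the term \((Y_i+Y_j)/4\)), and \(a\) in the statistics is a tuning parameter of the weight function \(e^{-at}t^{3/2}\).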
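The closed form of \(R_{n,a}\) follows from \(\int_0^\infty e^{-ct}t^{3/2}\,dt=\Gamma(5/2)\,c^{-5/2}=\tfrac{3\sqrt{\pi}}{4}c^{-5/2}\), applied to each exponential term. The Python sketch below (our illustration, not the authors' code) computes the closed form and compares it with a Simpson-rule evaluation of the integral. It takes \(Y_1,\dots,Y_n\) as given inputs, since the post does not spell out how they are obtained from the raw data; the sample values are made up.

```python
import math


def r_stat(y: list, a: float) -> float:
    """Closed form of R_{n,a}: 3 sqrt(pi) / (4 n^2) times a double sum over (i, j)."""
    n = len(y)
    total = 0.0
    for yi in y:
        for yj in y:
            total += ((a + (yi + yj) / 4) ** -2.5
                      - 0.5 * (a + yi) ** -2.5
                      - 0.5 * (a + yj) ** -2.5)
    return 3 * math.sqrt(math.pi) / (4 * n * n) * total


def r_integrand(t: float, y: list, a: float) -> float:
    """(pair V-empirical transform - empirical transform) * exp(-a t) * t^(3/2)."""
    n = len(y)
    pair = sum(math.exp(-t * (yi + yj) / 4) for yi in y for yj in y) / n ** 2
    single = sum(math.exp(-t * yi) for yi in y) / n
    return (pair - single) * math.exp(-a * t) * t ** 1.5


def simpson(f, lo: float, hi: float, m: int) -> float:
    """Composite Simpson rule on [lo, hi] with an even number m of subintervals."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


if __name__ == "__main__":
    y = [0.3, 0.8, 1.1, 2.5, 0.6, 4.0]  # made-up Y values for illustration
    for a in (0.5, 1.0, 2.0):
        closed = r_stat(y, a)
        # The integrand decays at least like exp(-a t), so [0, 60] is ample here.
        numeric = simpson(lambda t: r_integrand(t, y, a), 0.0, 60.0, 8000)
        print(a, closed, numeric)
```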
We have derived the asymptotic distributions of the new test statistics and provided the 95th percentiles of their empirical null distributions for large sample sizes, which show that these distributions stabilize quickly. The results are summarized in the following two theorems:
Theorem 1 Let \(a\ge 1\) and let \(X_1,X_2,\dots,X_n\) be i.i.d. random variables following the Lévy law with scale parameter \(\lambda\). Then \(\sqrt{n}\,J_{n,a}\xrightarrow{D}\sup_{t\in[0,1]}|\xi(t)|\), where \(\xi(t)\) is a centred Gaussian random process with covariance function
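\[ K(s,t)=s^a t^a(-\log s)^{3/2}(-\log t)^{3/2}\Big(-e^{-\sqrt{2}\left(\sqrt{-\log s}+\sqrt{-\log t}\right)}-2e^{-\sqrt{2\left(-\log s-\frac{1}{4}\log t\right)}-\sqrt{\frac{-\log t}{2}}}-2e^{-\sqrt{2\left(-\log t-\frac{1}{4}\log s\right)}-\sqrt{\frac{-\log s}{2}}}+4e^{-\frac{\sqrt{-\log(st)}+\sqrt{-\log s}+\sqrt{-\log t}}{\sqrt{2}}}+e^{-\sqrt{-2\log(st)}}\Big). \]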
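The covariance can be traced back to a projection argument. Treating \(\lambda=1\) as known (an assumption of ours that ignores estimation of the scale), the bracket in \(J_{n,a}\) at \(s=e^{-u}\), multiplied by \(\sqrt{n}\), behaves asymptotically like \(n^{-1/2}\sum_i\big(\varphi_u(Y_i)-E\varphi_u(Y)\big)\) with \(\varphi_u(y)=2e^{-\sqrt{u/2}}e^{-uy/4}-e^{-uy}\) (our notation). The Python sketch evaluates \(K(s,t)\) and compares it with a Monte Carlo covariance of the weighted \(\varphi\) terms.

```python
import math
import random


def cov_k(s: float, t: float, a: float) -> float:
    """Covariance function K(s, t) of Theorem 1, for s, t in (0, 1)."""
    u, v = -math.log(s), -math.log(t)
    bracket = (
        -math.exp(-math.sqrt(2) * (math.sqrt(u) + math.sqrt(v)))
        - 2 * math.exp(-math.sqrt(2 * (u + v / 4)) - math.sqrt(v / 2))
        - 2 * math.exp(-math.sqrt(2 * (v + u / 4)) - math.sqrt(u / 2))
        + 4 * math.exp(-(math.sqrt(u + v) + math.sqrt(u) + math.sqrt(v)) / math.sqrt(2))
        + math.exp(-math.sqrt(2 * (u + v)))
    )
    return (s * t) ** a * (u * v) ** 1.5 * bracket


def phi(u: float, y: float) -> float:
    """Projection term at s = exp(-u), with lambda = 1 treated as known."""
    return 2 * math.exp(-math.sqrt(u / 2) - u * y / 4) - math.exp(-u * y)


def mc_cov(s: float, t: float, a: float, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo covariance of the weighted projections at s and t, Y ~ Levy(1)."""
    rng = random.Random(seed)
    u, v = -math.log(s), -math.log(t)
    ws, wt = s ** a * u ** 1.5, t ** a * v ** 1.5
    sum_p = sum_q = sum_pq = 0.0
    for _ in range(n):
        y = 1.0 / rng.gauss(0.0, 1.0) ** 2  # Levy(1) draw via 1 / Z^2
        p, q = ws * phi(u, y), wt * phi(v, y)
        sum_p += p
        sum_q += q
        sum_pq += p * q
    return sum_pq / n - (sum_p / n) * (sum_q / n)


if __name__ == "__main__":
    print(cov_k(0.5, 0.3, 1.0), mc_cov(0.5, 0.3, 1.0))
```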
Theorem 2 Let \(X_1,X_2,\dots,X_n\) be i.i.d. random variables following the Lévy law with scale parameter \(\lambda\). Then, for every \(a>0\), \(\sqrt{n}\,R_{n,a}\) is asymptotically normal \(\mathcal{N}(0,\sigma^2_R(a))\) as \(n\to\infty\), where \(\sigma^2_R(a)=4E\,\zeta^2(X;a)\).
The expression for \(\zeta\) is intricate; we refer the reader to [1] for its exact form.
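The 95th percentiles mentioned above can be approximated by simulation. The sketch below is a minimal version for \(\sqrt{n}R_{n,a}\) under assumptions of ours: \(\mu=0\), and \(Y_i=X_i/\hat\lambda\) with the maximum likelihood estimate \(\hat\lambda=n/\sum_i X_i^{-1}\). Because \(\hat\lambda\) scales with the data, the \(Y_i\) (and hence the statistic) do not depend on \(\lambda\), so simulating with \(\lambda=1\) suffices. Using \(|\sqrt{n}R_{n,a}|\) as a two-sided statistic is also our assumption.

```python
import math
import random


def mle_scale(xs: list) -> float:
    """MLE of lambda for the Levy law with mu = 0: n / sum(1 / x_i)."""
    return len(xs) / sum(1.0 / x for x in xs)


def r_stat(y: list, a: float) -> float:
    """Closed form of R_{n,a} as given above."""
    n = len(y)
    pair = sum((a + (yi + yj) / 4) ** -2.5 for yi in y for yj in y)
    single = sum((a + yi) ** -2.5 for yi in y)
    # The two "1/2" terms of the double sum add up to n * single.
    return 3 * math.sqrt(math.pi) / (4 * n * n) * (pair - n * single)


def scaled_r(xs: list, a: float) -> float:
    """sqrt(n) * R_{n,a} computed on Y_i = X_i / lambda_hat (our assumption)."""
    lam_hat = mle_scale(xs)
    ys = [x / lam_hat for x in xs]
    return math.sqrt(len(xs)) * r_stat(ys, a)


def null_quantile(n: int, a: float, reps: int = 1000, level: float = 0.95, seed: int = 0) -> float:
    """Empirical `level` quantile of |sqrt(n) R_{n,a}| for Levy(1) samples of size n."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        xs = [1.0 / rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
        stats.append(abs(scaled_r(xs, a)))
    stats.sort()
    return stats[min(reps, math.ceil(level * reps)) - 1]


if __name__ == "__main__":
    for n in (20, 40):
        print(n, null_quantile(n, a=1.0, reps=500))
```

Running `null_quantile` for increasing `n` is a cheap way to see how fast the null percentile settles under these assumptions.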
Performance of novel tests
A standard way to assess the performance of test statistics is to compare their powers against a wide range of alternatives.
We conducted a power analysis of the tests at significance level \(\alpha=0.05\) using a Monte Carlo study with \(N=10{,}000\) replications. The goal was to compare the jackknife empirical likelihood (JEL) and adjusted JEL (AJEL) approaches proposed in [4] with the classical approach, and to determine the empirical powers of the new tests. The supremum in \(J_{n,a}\) was computed by a grid search over 1,000 equidistant points in \([0,1]\).
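The grid search can be sketched as follows. This is our own minimal Python version, not the code used in [1]. We write \(s\) for the variable of the \([0,1]\) form and take the 1,000 grid points strictly inside \((0,1)\), since the weight \(s^a(-\log s)^{3/2}\) vanishes at both ends (an implementation choice of ours). The original \(t>0\) form, evaluated at \(t=-\log s\), is included to show that the two expressions agree.

```python
import math


def j_stat_s(y: list, a: float, m: int = 1000) -> float:
    """J_{n,a} via the [0, 1] form, maximized over s_k = k / (m + 1), k = 1..m."""
    n = len(y)
    pair_exps = [(yi + yj) / 4 for yi in y for yj in y]
    best = 0.0
    for k in range(1, m + 1):
        s = k / (m + 1)
        pair = sum(s ** e for e in pair_exps) / n ** 2
        single = sum(s ** yi for yi in y) / n
        weight = s ** a * (-math.log(s)) ** 1.5
        best = max(best, abs((pair - single) * weight))
    return best


def j_stat_t(y: list, a: float, ts: list) -> float:
    """The same supremum in the original t > 0 form, over the given points ts."""
    n = len(y)
    best = 0.0
    for t in ts:
        pair = sum(math.exp(-t * (yi + yj) / 4) for yi in y for yj in y) / n ** 2
        single = sum(math.exp(-t * yi) for yi in y) / n
        best = max(best, abs((pair - single) * math.exp(-a * t) * t ** 1.5))
    return best


if __name__ == "__main__":
    y = [0.3, 0.8, 1.1, 2.5, 0.6, 4.0, 0.9, 1.7]  # made-up Y values
    m = 1000
    ts = [-math.log(k / (m + 1)) for k in range(1, m + 1)]  # t = -log s
    print(j_stat_s(y, 1.0, m), j_stat_t(y, 1.0, ts))
```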
Our findings show that the JEL and AJEL approaches proposed in [4] are less powerful than the classical approach, in which the test is carried out with the original statistic \(|I[1,1]|\). In almost all cases, both \(R_a\) and \(J_a\) outperform the JEL and AJEL methods. The novel tests also outperform the tests Na1 and Nb1 proposed in [5]. Compared with EDF-based tests, the novel tests perform better in some cases and comparably in others, for both median-based and maximum likelihood estimators.
In the case of large samples, the most natural way to compare tests is through the notion of asymptotic efficiency.
For a detailed review of the theory presented below, we refer the reader to the comprehensive work of Nikitin [6].
Let \(\mathcal{G}=\{g(x;\theta),\ \theta\ge 0\}\) be a family of alternative density functions such that \(g(x;0)\) is a Lévy density with arbitrary scale parameter, \(\int_{\mathbb{R}^+}x^{-2}g(x;\theta)\,dx<\infty\) for \(\theta\) in a neighbourhood of 0, and some additional regularity conditions for U-statistics with non-degenerate kernels hold [7, 8]. Let also \(\{T_n\}\) and \(\{V_n\}\) be two sequences of test statistics that we want to compare.
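Then, for any alternative distribution from \(\mathcal{G}\), the relative Bahadur efficiency of \(\{T_n\}\) with respect to \(\{V_n\}\) can be expressed as \(e_{T,V}(\theta)=\frac{c_T(\theta)}{c_V(\theta)}\), where \(c_T(\theta)\) and \(c_V(\theta)\) are the Bahadur exact slopes of the two sequences.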
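It is well known that the Bahadur exact slope satisfies the Bahadur–Raghavachari inequality [9], that is, \(c_T(\theta)\le 2K(\theta)\), where \(K(\theta)\) is the Kullback–Leibler distance from the alternative \(g(\cdot;\theta)\) to the family of null (Lévy) distributions.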
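If, under the alternative, the sequence \(\{T_n\}\) converges in probability to some finite function \(b(\theta)>0\), and the limit \(\lim_{n\to\infty}n^{-1}\log P_{H_0}(T_n\ge t)=-f_{LD}(t)\) exists for every \(t\) in an open interval containing \(b(\theta)\), with \(f_{LD}\) continuous there, then the Bahadur exact slope is \(c_T(\theta)=2f_{LD}(b(\theta))\).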
When the exact slope \(c_T(\theta)\) cannot be computed as \(\theta\to 0\), one can instead use the approximate Bahadur slope \(c^*_T(\theta)\), which often closely coincides with the exact one. Computing it requires the tail behaviour not of the distribution function of the statistic \(T_n\) itself but of its limiting distribution, which is typically easier to determine.
Specifically, let \(F_T\) denote the limiting distribution function of \(T_n\) under the null hypothesis \(H_0\), with tail behaviour \(\log(1-F_T(t))=-\frac{a^*_T t^2}{2}(1+o(1))\) as \(t\to\infty\) for some \(a^*_T>0\), and let \(b^*_T(\theta)>0\) be the limit in probability of \(T_n/\sqrt{n}\). Then the approximate Bahadur slope is \(c^*_T(\theta)=a^*_T\,(b^*_T(\theta))^2\). The local approximate Bahadur slope, as \(\theta\to 0\), can be obtained from the Maclaurin expansion of \(b^*_T(\theta)\).
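For the two new statistics the constant \(a^*_T\) is explicit. If the null limit is \(\mathcal{N}(0,\sigma^2)\), as for \(\sqrt{n}R_{n,a}\) by Theorem 2, then \(a^*_T=1/\sigma^2\); for the supremum of a centred Gaussian process, as for \(\sqrt{n}J_{n,a}\) by Theorem 1, the standard Gaussian tail result gives \(a^*_T=1/\sup_t K(t,t)\). The Python sketch computes the latter on a grid of \((0,1)\) and then \(c^*_T(\theta)=a^*_T\,b^*_T(\theta)^2\). The value of \(b^*_T(\theta)\) depends on the alternative, which is not specified here, so the number in the example is a hypothetical placeholder.

```python
import math


def cov_k_diag(t: float, a: float) -> float:
    """Variance K(t, t) of the limiting process in Theorem 1, for t in (0, 1)."""
    u = -math.log(t)
    bracket = (
        -math.exp(-2 * math.sqrt(2 * u))
        - 4 * math.exp(-math.sqrt(2.5 * u) - math.sqrt(u / 2))
        + 4 * math.exp(-(math.sqrt(2 * u) + 2 * math.sqrt(u)) / math.sqrt(2))
        + math.exp(-2 * math.sqrt(u))
    )
    return t ** (2 * a) * u ** 3 * bracket


def a_star_sup(a: float, m: int = 2000) -> float:
    """a*_J = 1 / sup_t K(t, t), with the sup taken over an interior grid of (0, 1)."""
    sup_var = max(cov_k_diag(k / (m + 1), a) for k in range(1, m + 1))
    return 1.0 / sup_var


def a_star_normal(sigma2: float) -> float:
    """a*_T for a statistic whose null limit is N(0, sigma2), e.g. sqrt(n) R_{n,a}."""
    return 1.0 / sigma2


def approx_slope(a_star: float, b: float) -> float:
    """Approximate Bahadur slope c*_T(theta) = a*_T * b*_T(theta)^2."""
    return a_star * b ** 2


if __name__ == "__main__":
    b_placeholder = 0.01  # hypothetical limit in probability of J_{n,a} under some alternative
    for a in (1.0, 2.0, 3.0):
        a_star = a_star_sup(a)
        print(a, a_star, approx_slope(a_star, b_placeholder))
```

The relative approximate efficiency of two statistics is then the ratio of their approximate slopes, as in the definition of \(e_{T,V}(\theta)\).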
Focusing on the case of the maximum likelihood estimator, we found that the tuning parameter \(a\) strongly affects the efficiency of \(R_a\) and \(J_a\). In all examined scenarios, the Bahadur efficiency of \(J_a\) decreases as \(a\) increases, whereas this is not the case for \(R_a\). We conclude that the new statistics outperform the Bhati-Kattumannil statistic, with \(R_a\) performing best in terms of local approximate Bahadur efficiency.
We applied the novel tests to two real datasets. The first contains weighted rainfall data for the month of January in India; the second consists of well yields near Bel Air, Harford County, Maryland.
Figure: Histogram of the rainfall data and the fitted Lévy density. The theoretical Lévy densities are drawn using the maximum likelihood estimate of the scale parameter λ.
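With \(\mu=0\), the maximum likelihood estimate of \(\lambda\) has a closed form: the log-likelihood \(\frac{n}{2}\log\lambda-\frac{\lambda}{2}\sum_i x_i^{-1}+\text{const}\) is maximized at \(\hat\lambda=n/\sum_i x_i^{-1}\). The sketch below (our derivation, assuming \(\mu=0\); the data are simulated, not the rainfall or well-yield data) computes \(\hat\lambda\) and the fitted density that would be drawn over a histogram.

```python
import math
import random


def levy_mle_scale(xs: list) -> float:
    """MLE of lambda for Levy(lambda, mu = 0): n / sum(1 / x_i)."""
    return len(xs) / sum(1.0 / x for x in xs)


def levy_loglik(lam: float, xs: list) -> float:
    """Log-likelihood of Levy(lambda, mu = 0) for positive observations xs."""
    return sum(0.5 * math.log(lam / (2 * math.pi)) - lam / (2 * x) - 1.5 * math.log(x)
               for x in xs)


def levy_density(x: float, lam: float) -> float:
    """Levy(lambda, mu = 0) density, e.g. to overlay the fitted curve on a histogram."""
    return math.sqrt(lam / (2 * math.pi)) * math.exp(-lam / (2 * x)) / x ** 1.5


if __name__ == "__main__":
    rng = random.Random(11)
    xs = [3.0 / rng.gauss(0.0, 1.0) ** 2 for _ in range(20_000)]  # simulated, lambda = 3
    lam_hat = levy_mle_scale(xs)
    print(lam_hat, [round(levy_density(x, lam_hat), 4) for x in (0.5, 1.0, 2.0, 5.0)])
```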
Based on
Žikica Lukić & Bojana Milošević (2023) Characterization-based approach for construction of goodness-of-fit test for Lévy distribution, Statistics, 57:5, 1087-1116, DOI: 10.1080/02331888.2023.2238236
References