Non-Homogeneous Poisson Process Intensity Modeling and Estimation using Measure Transport
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
A NHPP defined on S⊂Rd can be fully characterized through its intensity function λ:S→[0,∞). We present a general model for the intensity function of a non-homogeneous Poisson process using measure transport. The model finds its roots in transportation of probability measure (Marzouk et al. 2016), an approach that has gained popularity recently for its ability to model arbitrary probability density functions. The basic idea of this approach is to construct a “transport map” between the complex, unknown, intensity function of interest, and a simpler, known, reference intensity function.
Background
Measure Transport. Consider two probability measures μ0(⋅)
and μ1(⋅) defined on X and Z, respectively. A
transport map T:X→Z is said to push forward
μ0(⋅) to μ1(⋅) (written compactly as
T#μ0(⋅)=μ1(⋅)) if and only if μ1(B)=μ0(T−1(B)),for any
Borel subset B⊂Z.
Suppose X,Z⊆Rd, and that both
μ0(⋅) and μ1(⋅) are absolutely continuous with respect
to the Lebesgue measure on Rd, with densities
dμ0(x)=f0(x)dx and dμ1(z)=f1(z)dz, respectively.
Furthermore, assume that the map T(⋅) is bijective differentiable
with a differentiable inverse T−1(⋅) (i.e., assume that
T(⋅) is a C1 diffeomorphism), then we have f0(x)=f1(T(x))‖det(∇T(x))‖,x∈X.
Triangular Map. One particular type of transport map is an
, that is, T(x)=(T(1)(x(1)),T(2)(x(1),x(2)),…,T(d)(x(1),…,x(d)))′,x∈X,
Various approaches to parameterize an increasing triangular map have been proposed (see, for example, Germain et al. (2015),
Dinh, Krueger, and Bengio (2015) and Dinh, Sohl-Dickstein, and Bengio (2017)). One class of parameterizations is based on
the so-called “conditional networks” (Papamakarios, Pavlakou, and Murray (2017) and
Huang et al. (2018)): T(1)1(x(1))=S(1)1(x(1);θ11),T(k)1(x(1),…,x(k))=S(k)1(x(k);θ1k(x(1),…,x(k−1);ϑ1k)),k=2,…,d,
Composition of Increasing Triangular Maps. Composition of Increasing
Triangular Mapsn does not break the required bijectivity of T(⋅),
since a bijective function of a bijective function is itself bijective.
Computations also remain tractable, since the determinant of the
gradient of the composition is simply the product of the determinants of
the individual gradients. Specifically, consider two increasing
triangular maps T1(⋅) and T2(⋅), each constructed using
the neural network approach described above. The composition
T2∘T1(⋅) is bijective, and its gradient at some
x∈X has determinant,
det(∇T2∘T1(x))=(det(∇T1(x)))(det(∇T2(T1(x)))).
Intensity Modeling and Estimation via Measure Transport
Consider a NHPP P defined on a bounded
domain X⊂Rd. The density of a Poisson
process with respect to the density of the unit rate Poisson process is
given by fλ(X)=exp(‖X‖–μλ(X))∏x∈Xλ(x)=exp(∫X(λ(x)–1)dx+∑x∈Xlogλ(x)).
In order to apply the measure transport approach to intensity function
estimation, we first define ˜ρ(⋅)=ρ(⋅)/μρ(X) and ˜λ(x)=λ(x)/μλ(X), so that
˜ρ(⋅) and ˜λ(⋅) are valid density
functions with respect to Lebesgue measure. The KL divergence
DKL(fλ||fρ) can be written in terms of process
densities as follows,
DKL(fλ‖fρ)=μρ(X)–μλ(X)∫X˜λ(x)log˜ρ(x)dx–μλ(X)logμρ(X)+const.,
We model the process density ˜ρ(⋅) using the transportation of
probability measure approach. Specifically, we seek a diffeomorphism
T:X→Z, where Z need not be the
same as X, such that
˜ρ(x)=η(T(x))|det∇T(x)|,x∈X,
We prove that the increasing triangular maps constructed using neural autoregressive flows satisfy a universal property in the context of probability density approximation.
Theorem 1. Let P be a non-homogeneous Poisson process with positive continuous process density
˜λ(⋅) on X⊂Rd. Suppose
further that the weak (Sobolev) partial derivatives of ˜λ
up to order d+1 are integrable over Rd. There exists a
sequence of triangular mappings (Ti(⋅))i wherein the kth
component of each map T(k)i(⋅) has the form above, and wherein the corresponding conditional networks
are universal approximators (e.g., feedforward neural networks with
sigmoid activation functions), such that
η(Ti(⋅))det(∇Ti(⋅))→˜λ(⋅),
Illustration
We apply our method for intensity function estimation to an earthquake data set comprising 1000 seismic events of body-wave magnitude (MB) over 4.0. The data set is available from the package. The events we analyze are those that occurred near Fiji from 1964 onwards. The left panel of Figure 1 shows a scatter plot of locations of the observed seismic events.
We fitted our model using a composition of five triangular maps. The estimated intensity surface and the standard error surface obtained using are shown in the middle and right panels of Figure 1, respectively. The probability that the intensity function exceeds various threshold can also be estimated using non-parametric bootstrap resampling; some examples of these exceedance probability plots are shown in Figure 2.
Figure 1. Left panel: Scatter plot of earthquake events with body-wave magnitude greater than 4.0 near Fiji since 1964. Middle panel: Estimated intensity function obtained using measure transport. Right panel: Estimated standard error of the intensity surface.
Figure 2. Left panel: Estimated exceedance probability P(λ(⋅)>1). Middle panel: Estimated exceedance probability P(λ(⋅)>5). Right panel: Estimated exceedance probability P(λ(⋅)>10).
References
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.