Non-Homogeneous Poisson Process Intensity Modeling and Estimation using Measure Transport

[This article was first published on YoungStatS, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

A NHPP defined on SRd can be fully characterized through its intensity function λ:S[0,). We present a general model for the intensity function of a non-homogeneous Poisson process using measure transport. The model finds its roots in transportation of probability measure (Marzouk et al. 2016), an approach that has gained popularity recently for its ability to model arbitrary probability density functions. The basic idea of this approach is to construct a “transport map” between the complex, unknown, intensity function of interest, and a simpler, known, reference intensity function.

Background

Measure Transport. Consider two probability measures μ0() and μ1() defined on X and Z, respectively. A transport map T:XZ is said to push forward μ0() to μ1() (written compactly as T#μ0()=μ1()) if and only if μ1(B)=μ0(T1(B)),for any Borel subset BZ.

Suppose X,ZRd, and that both μ0() and μ1() are absolutely continuous with respect to the Lebesgue measure on Rd, with densities dμ0(x)=f0(x)dx and dμ1(z)=f1(z)dz, respectively. Furthermore, assume that the map T() is bijective differentiable with a differentiable inverse T1() (i.e., assume that T() is a C1 diffeomorphism), then we have f0(x)=f1(T(x))det(T(x)),xX.

Triangular Map. One particular type of transport map is an , that is, T(x)=(T(1)(x(1)),T(2)(x(1),x(2)),,T(d)(x(1),,x(d))),xX,

where, for k=1,,d, one has that T(k)(x(1),,x(k)) is monotonically increasing in x(k). In particular, the Jacobian matrix of an increasing triangular map, if it exists, is triangular with positive entries on its diagonal.

Various approaches to parameterize an increasing triangular map have been proposed (see, for example, Germain et al. (2015), Dinh, Krueger, and Bengio (2015) and Dinh, Sohl-Dickstein, and Bengio (2017)). One class of parameterizations is based on the so-called “conditional networks” (Papamakarios, Pavlakou, and Murray (2017) and Huang et al. (2018)): T(1)1(x(1))=S(1)1(x(1);θ11),T(k)1(x(1),,x(k))=S(k)1(x(k);θ1k(x(1),,x(k1);ϑ1k)),k=2,,d,

for xX, where θ1k(x(1),,x(k1);ϑ1k),k=2,,d is the kth “conditional network” that takes x(1),,x(k1) as inputs and is parameterized by ϑ1k, and S(k)1() is generally a very simple univariate function of x(k), but with parameters that depend in a relatively complex manner on x(1),,x(k1) through the network. The class of neural autoregressive flows, proposed by Huang et al. (2018), offers more flexibility where the k-th component of the map has the form S(k)1(x(k);θ1k)=σ1(Mi=1w1kiσ(a1kix(k)+b1ki)),
where σ1() is the logit function, and w1ki is subject to the constraint Mi=1w1ki=1.

Composition of Increasing Triangular Maps. Composition of Increasing Triangular Mapsn does not break the required bijectivity of T(), since a bijective function of a bijective function is itself bijective. Computations also remain tractable, since the determinant of the gradient of the composition is simply the product of the determinants of the individual gradients. Specifically, consider two increasing triangular maps T1() and T2(), each constructed using the neural network approach described above. The composition T2T1() is bijective, and its gradient at some xX has determinant, det(T2T1(x))=(det(T1(x)))(det(T2(T1(x)))).

Intensity Modeling and Estimation via Measure Transport

Consider a NHPP P defined on a bounded domain XRd. The density of a Poisson process with respect to the density of the unit rate Poisson process is given by fλ(X)=exp(Xμλ(X))xXλ(x)=exp(X(λ(x)1)dx+xXlogλ(x)).

where |B| denote the Lebesgue measure of a bounded set BRd, and let X{x1,,xn}, where xiX,i=1,,n, and n1, be a realization of P. The unknown intensity function is estimated by a maximum likelihood approach, which is equivalent to minimizing the Kullback–Leibler (KL) divergence DKL(pq)=p(x)log(p(x)/q(x))dx between the true density and the estimate: ˆλ()=argminρ()ADKL(fλfρ)
where A is some set of intensity functions, and fρ() is the density of a NHPP with intensity function ρ() taken with respect to the density of the unit rate Poisson process.

In order to apply the measure transport approach to intensity function estimation, we first define ˜ρ()=ρ()/μρ(X) and ˜λ(x)=λ(x)/μλ(X), so that ˜ρ() and ˜λ() are valid density functions with respect to Lebesgue measure. The KL divergence DKL(fλ||fρ) can be written in terms of process densities as follows, DKL(fλfρ)=μρ(X)μλ(X)X˜λ(x)log˜ρ(x)dxμλ(X)logμρ(X)+const.,

where “const.” captures other terms not dependent on μρ(X) or ˜ρ(). After further approximations, we obtain the following optimization problem ˆ˜λ()=argmin˜ρ()˜Ani=1log˜ρ(xi),
where now ˜A is some set of process densities.

We model the process density ˜ρ() using the transportation of probability measure approach. Specifically, we seek a diffeomorphism T:XZ, where Z need not be the same as X, such that ˜ρ(x)=η(T(x))|detT(x)|,xX,

where η() is some simple reference density on Z, and |detT()|>0.

We prove that the increasing triangular maps constructed using neural autoregressive flows satisfy a universal property in the context of probability density approximation.

Theorem 1. Let P be a non-homogeneous Poisson process with positive continuous process density ˜λ() on XRd. Suppose further that the weak (Sobolev) partial derivatives of ˜λ up to order d+1 are integrable over Rd. There exists a sequence of triangular mappings (Ti())i wherein the kth component of each map T(k)i() has the form above, and wherein the corresponding conditional networks are universal approximators (e.g., feedforward neural networks with sigmoid activation functions), such that η(Ti())det(Ti())˜λ(),

with respect to the sup norm on any compact subset of Rd.

Illustration

We apply our method for intensity function estimation to an earthquake data set comprising 1000 seismic events of body-wave magnitude (MB) over 4.0. The data set is available from the package. The events we analyze are those that occurred near Fiji from 1964 onwards. The left panel of Figure 1 shows a scatter plot of locations of the observed seismic events.

We fitted our model using a composition of five triangular maps. The estimated intensity surface and the standard error surface obtained using are shown in the middle and right panels of Figure 1, respectively. The probability that the intensity function exceeds various threshold can also be estimated using non-parametric bootstrap resampling; some examples of these exceedance probability plots are shown in Figure 2.

Figure 1. Left panel: Scatter plot of earthquake events with body-wave magnitude greater than 4.0 near Fiji since 1964. Middle panel: Estimated intensity function obtained using measure transport. Right panel: Estimated standard error of the intensity surface.

Figure 2. Left panel: Estimated exceedance probability P(λ()>1). Middle panel: Estimated exceedance probability P(λ()>5). Right panel: Estimated exceedance probability P(λ()>10).

References

Dinh, Laurent, David Krueger, and Yoshua Bengio. 2015. NICE: Non-Linear Independent Components Estimation.” In Workshop Track Proceedings of the 3rd International Conference on Learning Representations.
Dinh, Laurent, Jascha Sohl-Dickstein, and Samy Bengio. 2017. “Density Estimation Using Real NVP.” In Conference Track Proceedings of the 5th International Conference on Learning Representations.
Germain, Mathieu, Karol Gregor, Iain Murray, and Hugo Larochelle. 2015. MADE: Masked Autoencoder for Distribution Estimation.” In Proceedings of the 32nd International Conference on Machine Learning, 881–89.
Huang, Chin-Wei, David Krueger, Alexandre Lacoste, and Aaron Courville. 2018. “Neural Autoregressive Flows.” In Proceedings of the 35th International Conference on Machine Learning, 80:2078–87. Proceedings of Machine Learning Research.
Marzouk, Youssef, Tarek Moselhy, Matthew Parno, and Alessio Spantini. 2016. “Sampling via Measure Transport: An Introduction.” In Handbook of Uncertainty Quantification, 1–41.
Papamakarios, George, Theo Pavlakou, and Iain Murray. 2017. “Masked Autoregressive Flow for Density Estimation.” In Advances in Neural Information Processing Systems 30, 2338–47. http://papers.nips.cc/paper/6828-masked-autoregressive-flow-for-density-estimation.pdf.
To leave a comment for the author, please follow the link and comment on their blog: YoungStatS.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)