Non-stationary wrapped Gaussian spatial response model

[This article was first published on YoungStatS, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Background

Circular data, i.e., data defined on the unit circle, can be found in many areas of science. The unique nature of these data means that conventional methods for non-circular data are not valid for these. At the same time, advances in geographical information and global positioning systems have generated large amounts of spatial data and, consequently, have increased the need for spatial models that accurately describe it. An example of circular, spatial data are the average wind directions in Germany from April 1 to 10, 2019, plotted in space (left) and on a circular histogram (right):

It was Jona-Lasinio, Gelfand, and Jona-Lasinio (2012) and Wang and Gelfand (2014) that first brought the wrapped and projected Gaussian distributions to space and space-time settings. Nonetheless, today,circular data models still need to catch up on the latest advances in spatial statistics. In particular, models developed so far assume stationarity in the mean and the covariance of the response modeled; i.e., these are constant in space. Our work in Marques, Kneib, and Klein (2022) aims to bridge this gap and allow for non-stationarity in the mean and covariance of distributions for circular responses defined on spatial domains.

The response distribution

To construct distributions for circular data, we can rely on the construction principle of wrapping the distribution of a random variable YR defined on the real line around the unit circle to make its domain adhere to the interval [0,2π]. The result is a circular – or wrapped – random variable X[0,2π). Then, the unwrapped variable Y can be expressed as Y=X+2πK

where KZ measures the number of “turns” Y has be wrapped around the unit circle.

Let s denote a spatial index variable within a spatial domain SR2. Jona-Lasinio, Gelfand, and Jona-Lasinio (2012) first showed that this wrapping approach can be extended to the context of spatial data. We extend their framework and consider X(si)=β0+z(si)β+γ(si,zκ(si))+2πK(si)+εi,

where i=1,,n, γ(si) is a zero-mean Gaussian random field (GRF) and εiN(0,σ2ε) are the i.i.d. error terms, with the following novelties:

  1. The mean depends on z(si)β, i.e. it is non-stationary,

  2. The covariance depends on zκ(si), i.e. it is non-stationary,

  3. We include small-scale noise εis.

We focus on 1. and 2.

1. Non-stationarity in the mean

Linear covariates included in the mean of a circular response might induce a circular likelihood for the regression coefficients that has infinitely many maxima. The typical solution is to use a link function for fixed effects with co-domain defined in R and of length equal to the circular variable’s period of 2π, e.g. the inverse tangent link function.

We argue that:

  1. The winding number K(s) can govern the behavior of the combination of GRF and fixed effects such that no link function is needed,

  2. We use prior information that shrinks towards a stationary mean and enforces the length of the fixed effects to be approximately 2π.

Concretely, β0N(0,10) represents a diffuse prior for the intercept and we let β|ξ2N(0,ξ2I). One can show that if the base model is such that ξ20 and the prior is constructed according to the PC-prior principles (Simpson et al. 2017), then ξ2 has a Weibull prior with shape 12 and scale λ, i.e., ξ2Weibull(12,λ) (Klein and Kneib 2016).

For α(0,1), λ is chosen such that P(maxsSz(s)β∣≤π)1α,

i.e., the probability that the maximum norm of the non-stationary effect in the mean of the response is smaller than π is larger than 1α. The value π in the expression is close to the recommendation of 3 in Klein and Kneib (2016).

2. Non-stationarity in the covariance

The stochastic partial differential equation (SPDE) approach (see Lindgren, Rue, and Lindström 2011) allows us to model non-stationarity in the covariance of the GRF in a computationally efficient way. Consider the parameters of the Matérn covariance with smoothness 1 and κ and τ controlling the range and the marginal variance of the GRF, respectively.

We let log(τ)=θτ0 and log(κ(s))=θκ0+zκ(s)θκz,

where uniform priors for θτ0 and θκ0 guarantee that the marginal variance is in [0.01,(2π)2] and the spatial range is in [0.01,1] for S[0,1]×[0,1]. Once again, we use a simulation-based method to guarantee

P(maxsSzκ(s)θκz∣≤c)1ακ,

such that c guarantees that the spatial range satisfies ρ(s)[0.01,1], with typically c=3 (see Klein and Kneib 2016). The resulting prior structure shrinks towards a model with stationary covariance.

More concretely, we have θτ0U(10,3), θκ0U(1,6) and θκz|ζ2N(0,ζ2I), with ζ2Weibull(12,λκ).

The full-hierarchical model

The SPDE is typically used in combination with integrated nested Laplace approximations (INLA). However, INLA does not perform well when combined with shrinkage priors for non-stationarity in the covariance as defined here. Thus, we use Markov Chain Monte Carlo with full-hierarchical model:

X(s)=η(s)+2πK(s)+ε,=β0+z(s)β+γ(s,zκ(s))+2πK(s)+ε,β0N(0,10), β|ξ2N(0,ξ2I), γ|θN(0,Q1(θ))θτ0U(10,3), θκ0U(1,6), θκz|ζ2N(0,ζ2I)ξ2PC(π,α), ζ2PC(3,ακ)ε|σ2εN(0,σ2εI), σ2εIG(0.001,0.001)p(K(s))={13,if K(s){1,0,1}0,otherwise

Outcome

One can use our model to estimate wind directions in the dataset above and test the model’s performance. We randomly select a holdout set consisting of 20% of the locations and use the remaining 80% as training data. The covariates considered for z(s) are maximum wind speed, average air humidity, and temperature at 10 meters height. For zκ(s) we consider altitude and latitude.

The figure below shows the posterior predictive mean (left) and standard deviation (right) for each test location for a fully stationary, i.e., stationary mean and covariance and fully non-stationary models. The black arrows represent the true wind directions (left) and the corresponding standard deviation of zero (right). The figure shows that the fully non-stationary model reaches smaller bias than the fully stationary model throughout the whole domain. In the north where this bias is close to zero, our model also reaches lower uncertainty.

References

Jona-Lasinio, Giovanna, Alan Gelfand, and Mattia Jona-Lasinio. 2012. Spatial analysis of wave direction data using wrapped Gaussian processes.” The Annals of Applied Statistics 6 (4): 1478–98.
Klein, Nadja, and Thomas Kneib. 2016. “Scale-Dependent Priors for Variance Parameters in Structured Additive Distributional Regression.” Bayesian Analysis 11 (4): 1071–1106.
Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4): 423–98.
Marques, Isa, Thomas Kneib, and Nadja Klein. 2022. “A Non-Stationary Model for Spatially Dependent Circular Response Data Based on Wrapped Gaussian Processes.” Statistics and Computing 32 (5): 73.
Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G Martins, Sigrunn H Sørbye, et al. 2017. “Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors.” Statistical Science 32 (1): 1–28.
Wang, Fangpo, and Alan E Gelfand. 2014. Modeling space and space-time directional data using projected Gaussian processes.” Journal of the American Statistical Association 109 (508): 1565–80.
To leave a comment for the author, please follow the link and comment on their blog: YoungStatS.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)