Illustration of Graphical Gaussian Process models to analyze highly multivariate spatial data

[This article was first published on YoungStatS, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

Abundant multivariate spatial data from the natural and environmental sciences demands research on the joint distribution of multiple spatially dependent variables (Wackernagel (2013), Cressie and Wikle (2011), Banerjee and Gelfand (2014)). Here, our goal is to estimate associations over spatial locations for each variable and those among the variables.

In this document, we will present an example of a simulated multivariate spatial data and how we can use Graphical Gaussian processes (GGP) (Dey and Banerjee (2021)) to model the data. First, we’ll introduce the general multivariate spatial model, and then we will introduce a variable graph and how to simulate a Graphical Gaussian process (Matérn) for that variable graph. Next, we will lay out the estimation steps of GGP parameters and how the estimated parameters compare against the truth.

Model

Let y(s) denote a q×1 vector of spatially-indexed dependent outcomes for any location sDRd with typically d=2 or 3. On our spatial domain D, a multivariate spatial regression model provides a marginal spatial regression model for each outcome as follows: yi(s)=xi(s)Tβi+wi(s)+ϵi(s),i=1,2,,q,sD

where yi(s) is the i-th element of y(s), xi(s) is a pi×1 vector of predictors, βi is the pi×1 vector of slopes, wi(s) is a spatially correlated process and ϵi(s)indN(0,τ2i) is the random error in outcome i. We usually assume that ϵi(s) are independent across i, whereas w(s)=(w1(s),w2(s),,wq(s))T is a zero-centred multivariate Gaussian process (GP) with a zero mean and a cross-covariance function that introduces dependency across space and among the q variables. Let Si represent the collection of places where the ith variable was observed. The reference set of locations for our approach is L=iSi.

The definition of w(s) as the q×1 multivariate graphical Matérn GP (Dey and Banerjee (2021)) with respect to a decomposable variable graph GV yields the distribution of each wi. The marginal parameters for each component Matérn process wi() are denoted by {ϕii,σii|i=1.,q}.

It is sufficient to limit the intra-site covariance matrix Σ=(σij) to be of the following form to assure a valid multivariate Matérn cross-covariance function (Theorem 1 of Apanasovich and Sun (2012)):

νij=12(νii+νjj)+ΔA(1Aij) where ΔA0,A=(Aij) for all i0,Aii=1i,jcicjϕij0 is a conditionally non-negative definite matrix σij=bijΓ(12(νii+νjj+d))Γ(νij)ϕ2ΔA+νii+νjjijΓ(νij+d2) where ΔA0, and B=(bij)>0, i.e., is p.d.

This is equal to Σ=(B(γij)), where γij’s are constants collecting the components in (2) involving just νij’s and ϕij’s, and indicates the Hadamard (element-wise) product.

In the following example, we’ll assume ΔA=0,νij=νii+νjj2, ϕ2ij=ϕ2ii+ϕ2jj2. The constraints in (2) simplifies to –

σij=(σiiσjj)12ϕνiiiiϕνjjjjϕνijijΓ(νij)Γ(νii)12Γ(νij)12rij where , and R=(rij)>0, i.e., is p.d.

We also take νii=νjj=0.5 in our case and hence, we only need to estimate the marginal scale (ϕii) and variance parameters (σii) and cross-correlation parameters rij. We also assume variable graph (GV) to be decomposable.

Simulation

As a first step of simulation, we need to fix a decomposable variable graph (GV). Depending on the perfect ordering of cliques in the graph, the density function of the full process can be factorized. Using this property, we can calculate the variance-covarinace matrix of the GGP and use it to simulate observation from a multivariate GGP.

Decomposable variable graph

If GV is decomposable, it has a perfect clique sequence {K1,K2,,Kp} with separators {S2,,Sm}, and w(L)’s joint density may be factorized as follows. fM(w(L))=Πpm=1fC(wKm(L))Πpm=2fC(wSm(L)),

where Sm=Fm1Km and Fm1=K1Km1, and fA denotes the the density of a GP over L with covariance function A for A{M,C}.

Here, we assume we have 10 variables with each being observed at 250 locations uniformly chosen from the (0,1)×(0,1) square. We assume the variable graph to be as follows.

We calculate the perfect ordering of the cliques of the graph above and list the cliques and separators below. To visualize the decomposition, we also plot the junction tree (definition below) of the graph –

The junction graph G of a decomposable graph GV is a complete graph with the cliques of GV as its nodes. Every edge in the junction graph is denoted as a link, which is also the intersection of the two cliques, and can be empty. A spanning tree of a graph is defined as a subgraph comprising all the vertices of the original graph and is a tree (acyclic graph). If a spanning tree J of the junction graph of G satisfies the following property: for any two cliques C and D of the graph, every node in the unique path between C and D in the tree contains CD. Then J is called the junction tree for the graph GV. Here, the junction tree of the graph has the perfectly ordered cliques as its nodes and the separators denoted as edges.

Hence, in this case, the likelihood of the decomposable GGP would be as follows – fM(w(L))=fC(w(1,2,3)(L))fC(w(2,3,4)(L))fC(w(4,5,6)(L))fC(w(6,7,8)(L))fC(w(8,9,10)(L))fC(w(2,3)(L))fC(w(4)(L))fC(w(6)(L))fC(w(8)(L)),

Simulating latent GGP

First we fix the marginal and cross-covariance parameters of the process.

Now, the precision matrix of the GGP w(L) satisfies (Lemma 5.5 of Lauritzen (1996)) M(L,L)1=pm=1[C1[KmGL]]V×Lpm=2[C1[SmGL]]V×L,

The (6) and (4) shows that inverting the full cross-covariance matrix only requires inverting the clique and separator specific covariance matrices. Hence, the computational complexity for calculating likelihood of a multivariate GGP boils down to O(n3c3) (here, O(250327)) where c=3 is the maximum size of a clique in the perfect clique ordering of the graph. On the contrary, the likelihood of a full GGP would need O(n3q3) complexity, i.e. O(25031000) in our case.

We use the (6) to calculate the covariance of the latent GGP and we use it to simulate the latent process.

Simulating multivariate spatial outcome

Now we use (1) to simulate our multivariate outcome yi(.),i=1,,10. In order to do that, we fix some covariates xi() and simulate error process ϵi() independently from N(0,τ2ii). We take τ2ii=τ2=0.25 for all i.

For analyzing the prediction accuracy of our method, we randomly pick 20% of the locations for each outcome variable and consider them missing. We will only be working with the training set to fit the model and judge our prediction accuracy on the test set.

Now we visualize the surface of the training data by variables below.

Data analysis

The analysis of our simulated data can be broken down in the following steps.

  1. Marginal parameter estimation: We estimate the marginal scale (ϕii), variance (σ2ii) and smoothness parameters (νii) from the component Gaussian processes. We also estimate the error variance (τ2ii) for each marginal processes.

  2. Initialize Gibbs sampling: For this step, we need the following components:

    1. Processing the variable graph: We fix our variable graph for the estimation process. First we calculate cliques, separators of the graph. Then we color the nodes and edges of the graph for parallel computation purposes.
    2. Starting cross-correlation: We use the estimated marginal parameters and an initial correlation parameter to start off.
  3. Gibbs sampling: We run our Gibbs sampler in two steps –

    1. Sampling latent spatial processes
    2. Sampling latent correlations

Marginal parameter estimation

  1. We first estimate the marginal process parameters using BRISC package in R. We estimate scale (ϕii) and variance (σii) parameters. We fix the marginal smoothness (νii) and cross-smoothness (νij) parameters at 0.5.
  2. Same as the true cross-covariance, we calculate the cross-scale parameters as: ϕij=ϕ2ii+ϕ2jj2.
  3. We estimate the marginal regression coefficients (βi).
  4. We estimate the marginal error variance (τ2ii).

Initialize Gibbs sampling

  1. Process the variable graph:

First, we color the nodes of the variable graph. This will allow us to simulate the latent processes belonging to the same color in parallel.

We also construct a new graph GE(GV)=(EV,E) which denotes this graph on the set of edges EV, i.e., there is an edge ((i,j),(i,j)) in this new graph GE(GV) if {i,i,j,j} are in some clique K of GV. This allows us to facilitate parallel update of cross-correlation parameters corresponding to edges in the same color.

These two procedures are examples of chromatic Gibbs sampling.

2.Calculate initial cross-covariance matrix: Start with an initial cross-correlation matrix. We Take a convex combination: 0.5diag(q)+0.5cor(Y.train). Then we use this initial cross-correlation parameters and estimated marginal parameters to store the cross covariance matrices for cliques and separators only.

The largest matrix we need to store is of size nqc×nqc matrix where qc is the size of maximum clique or separator).

Gibbs sampling

We iteratively perform the following steps until we reach a desired number of samples (N) – 1. Sampling latent processes : Using random draws from multivariate normal distribution 2. Sampling correlations: Metropolis-hastings 3. Jumping between graphs: Reversible jump MCMC.

Sampling latent processes

To update the latent random effects w, let L={s1,,sn} and oi=diag(I(s1Si),,I(snSi)) denote the vector of missing observations for the i-th outcome. With Xi(L)=(xi(s1),,xi(sn))T, yi(L) and wi(L) defined similarly, we obtain p(wi(L)|)N(M1iμi,M1i), where Mi=1τ2idiag(oi)+KiM1{i}×L|(K{i})×LSiM1{i}×L|(S{i})×L,μi=(yi(L)xi(L)Tβi)oiτ2i+KiTi(K)w((K{i})×L)SiTi(S)w((S{i})×L),Ti(A)=M1{i}×L|(A{i})×LM{i}×L,(A{i})×LM1(A{i})×L, for A{K,S}.

%We denote Ti=LSi. % for a clique K in variable graph GV, the set – K×L=K×S, S×L=S×S, %iK={Ki}×S, i×Si=iS and i×Ti=iT, iKiT=iKT and iKiS=iK×L . Also, for a clique K in containing a variable i, let’s denote MiS.K=MiSMiS,iKTM1iKTMiKT,iS and MiT.K=MiTMiT,iK×LM1iK×LMiK×L,iT.

Sampling cross-correlation parameters

Requires only checking positive-definiteness of the clique-specific cross-covariance matrix and inverting it. The largest matrix inversion across all these updates is of the order nc×nc, corresponding to the largest clique. The largest matrix that needs storing is also of dimension nc×nc. These result in appreciable reduction of computations from any multivariate Matérn model that involves nq×nq matrices and positive-definiteness checks for q×q matrices at every iteration.

  1. For every correlation parameter corresponding to edges in the current graph, we draw a new correlation value from the proposal distribution.

  2. We accept or reject the proposal based on the Metropolis-Hastings acceptance probability.

Results

We create three plots below among true and estimated parameters for – (1) product of marginal scale and variance parameter (σii), (2) cross-correlation parameter (rij) and (3) product of cross-covariance and cross-scale parameter (σijϕij). The points on all the plots align on the y=x line thus showing the accuracy of our estimation. We also create a plot across 10 variables showing true test data values and predicted values from our model. The points align on y=x line and mean-square error varied from 0.403 to 1.064.

About the authors

  • Debangan Dey, National Institute of Mental Health.
  • Abhirup Datta, Dept. of Biostatistics, Johns Hopkins Bloomberg School of Public Health.
  • Sudipto Banerjee, Dept. of Biostatistics, UCLA Fielding School of Public Health.

Bibliography

Apanasovich, Marc G Genton, Tatiyana V, and Ying Sun. 2012. “A Valid Matern Class of Cross-Covariance Functions for Multivariate Random Fields with Any Number of Components.” Journal of the American Statistical Association 107 (497): 180–93.
Banerjee, B. P. Carlin, S., and A. E. Gelfand. 2014. Hierarchical Modeling and Analysis for Spatial Data. Chapman & Hall/CRC, Boca Raton, FL.
Cressie, Noel A. C., and Christopher K. Wikle. 2011. Statistics for Spatio-Temporal Data. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ.
Dey, Abhirup Datta, Debangan, and Sudipto Banerjee. 2021. “Graphical Gaussian Process Models for Highly Multivariate Spatial Data.” Biometrika December.
Lauritzen, Steffen L. 1996. Graphical Models. Vol. 17. Clarendon Press.
Wackernagel, Hans. 2013. Multivariate Geostatistics: An Introduction with Applications. Springer Science & Business Media.
To leave a comment for the author, please follow the link and comment on their blog: YoungStatS.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)