Part 3: Spatial analysis of geotagged data
See the other parts in this series of blog posts.
In parts 1 and 2 we extracted spatial coordinates from our photos and then made an interactive web map that included data associated with those photos. Here I describe how we can build a spatial statistical model to interpolate to unmeasured locations.
Here we will build an interactive map showing model results from an interpolation of oyster presence/absence.
First up we need to load some packages:
library(readr)
library(INLA)
library(sp)
library(raster)
library(rgeos)
The INLA package is crucial here. We will be using INLA to perform Bayesian kriging. In non-technical terms, Bayesian kriging lets us fill in the gaps between observations of oyster counts by interpolating from nearby points.
Getting the data in the right form
Before we begin, we need to load in the data, and transform it into a spatial data frame. In particular, it is important to transform the lon-lats into UTM coordinates. UTM coordinates allow us to measure distances between sample sites in metres:
dat <- read_csv('Oysters_merged.csv')
n <- nrow(dat)

# UTM zone 10N, with units in metres
utmproj <- "+proj=utm +zone=10 +north +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"

# Promote to a spatial object, declare the original lon-lat CRS, then reproject
spdat <- dat
coordinates(spdat) <- ~ GPSLongitude + GPSLatitude
proj4string(spdat) <- "+init=epsg:4326"
spdf <- spTransform(spdat, CRS(utmproj))
plot(spdf)
You should get a plot of the sample sites, like the one pictured.
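If you want to convince yourself that the reprojected coordinates really are in metres, a quick optional check is to look at a few pairwise distances between sample sites. This is just a minimal sketch using sp's spDists on the first five points:
# Pairwise distances (in metres) between the first five sample sites,
# computed from the projected UTM coordinates
round(spDists(coordinates(spdf)[1:5, ]), 1)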
Creating an INLA “Mesh”
Now we can start building the specialised data structures we need for the INLA package. First up, we have to build what's called a mesh. This is a Delaunay triangulation built around our data points. It accounts for the non-regular spatial structure of the sampling (I wandered around on the rocky shore throwing the quadrat at intervals).
Below we provide two values to max.edge; this lets us buffer the edges to obtain a slightly larger spatial domain.
max.edge.length <- c(25, 40)
loc1 <- as.matrix(coordinates(spdf))
mesh <- inla.mesh.2d(loc=loc1, max.edge = max.edge.length, offset=1, cutoff = 5)
plot(mesh)
plot(spdf, add = T, col = 'red')
You can play with the edge length, offset and cutoff parameters to vary how the triangulation turns out. You can get more guidance at the INLA page and in this tutorial.
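If you want to see how these settings change the triangulation, a minimal sketch of the kind of comparison you might run is below; the particular max.edge, offset and cutoff values are only illustrative:
# Two alternative meshes, one finer and one coarser than the original
mesh_fine <- inla.mesh.2d(loc = loc1, max.edge = c(10, 20), offset = 1, cutoff = 2)
mesh_coarse <- inla.mesh.2d(loc = loc1, max.edge = c(50, 100), offset = 1, cutoff = 10)
plot(mesh_fine)    # finer triangulation, many more nodes
plot(mesh_coarse)  # coarser triangulation, fewer nodes
c(original = mesh$n, fine = mesh_fine$n, coarse = mesh_coarse$n)  # node counts
Denser meshes give a smoother field but are slower to fit, so the node counts are a useful quick check on how much work you are asking INLA to do.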
An important step is to check that our maximum edge lengths are less than the estimated range (otherwise it is pointless including a spatial effect!). We will check the range size below, once we have fitted the model.
You might like to think of a priori reasons for modelling a certain range. For instance, what is a reasonable spatial scale for 'clumping' of oyster patches?
Build INLA stack
Now we need to implement a few more steps to build up an appropriate 'stack' of data for INLA. Note that I have transformed oyster count to presence-absence. We will build a binomial model and just predict whether any given quadrat had oysters or not.
# Projector matrix linking the observation locations to the mesh nodes
A.data <- inla.spde.make.A(mesh, loc1)
# Matern SPDE model for the spatial random field
spde <- inla.spde2.matern(mesh, alpha = 2)
# Convert counts of live oysters to presence/absence
spdf$presabs <- ifelse(spdf$oysters_live > 0, 1, 0)
stk <- inla.stack(data = list(y = spdf$presabs), A = list(A.data, 1),
                  effects = list(s = 1:mesh$n, intercept = rep(1, n)),
                  remove.unused = FALSE, tag = "est")
Note that here we choose alpha = 2. alpha is a smoothness parameter that must lie between 0 and 2; higher values give smoother spatial interpolation. Also note that we have to specify 'data' for the intercept (just a vector of 1s).
The tag = "est"
argument just gives a name tag to the fitted model. In
other applications you might want to join the estimation stack with a
prediction stack (just build two stacks, one where the response is NA
and then join them by applying inla.stack
to both).
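Here is a sketch of that joining step (we don't need it for this post). The prediction locations loc_pred are purely illustrative; I just reuse the mesh node coordinates:
# Illustrative prediction locations: the mesh nodes themselves
loc_pred <- mesh$loc[, 1:2]
A.pred <- inla.spde.make.A(mesh, loc_pred)

# Prediction stack: same structure as 'stk' but with an NA response
stk.pred <- inla.stack(data = list(y = NA), A = list(A.pred, 1),
                       effects = list(s = 1:mesh$n, intercept = rep(1, nrow(loc_pred))),
                       tag = "pred")

# Join the two stacks; the prediction rows can later be located with
# inla.stack.index(stk.full, "pred")$data
stk.full <- inla.stack(stk, stk.pred)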
Model
Now, after that considerable amount of preparation, we get to fitting the model. We use standard R syntax for the formula, with the addition of the f() term to specify the spatial random effect.
formula <- y ~ 0 + intercept + f(s, model=spde)
mod <- inla(formula, data=inla.stack.data(stk),
control.predictor=list(A = inla.stack.A(stk)),
family = 'binomial')
We can extract summary parameters using summary. Here we are interested in the random effects; it is good practice to call inla.hyperpar before we inspect them:
hyper <- inla.hyperpar(mod)
summary(hyper)
Checking the range parameter
We should also check that our triangle edge length is less than the estimated spatial range. We can do that by making a call to the model object, asking for the estimated parameters of the random field:
rf <- inla.spde.result(inla = mod, name = "s",
spde = spde, do.transf = TRUE)
plot(rf$marginals.range[[1]], type = "l",
xlab = "Range (m)", ylab = "Density",
xlim = c(0, 1000))
abline(v = max.edge.length[2], col = 'red')
The red line shows our edge length parameter, which is well below the model's estimate of the spatial range.
Model predictions
Now we have a fitted model, we can extract the predicted probabilities of oyster presence and map them. First, transform the predictions from the logit scale back to probabilities:
ypred <- exp(mod$summary.random$s$mean)/(1 + exp(mod$summary.random$s$mean))
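As an aside, base R's plogis() is the same inverse-logit transform, so the line above could equivalently be written as:
ypred <- plogis(mod$summary.random$s$mean)  # identical inverse-logit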
Then we can turn our predictions into spatial points that can be mapped onto a raster for plotting:
# Bounding box of the sample sites (in UTM) and the prediction grid dimensions
ext <- extent(spdf)
xdims <- 100; ydims <- 200
xlim <- c(ext[1], ext[2]); ylim <- c(ext[3], ext[4])

# Project the estimated spatial field onto a regular grid
proj <- inla.mesh.projector(mesh, xlim = xlim, ylim = ylim, dims = c(xdims, ydims))
field.proj <- inla.mesh.project(proj, ypred)

# Turn the gridded predictions into spatial points and transform back to lon-lat
datpred <- data.frame(x = rep(proj$x, ydims), y = rep(proj$y, each = xdims),
                      pred = as.numeric(field.proj))
coordinates(datpred) <- ~ x + y
proj4string(datpred) <- utmproj
dat3 <- spTransform(datpred, CRS("+init=epsg:4326"))

# Fill a lon-lat raster with the predicted probabilities
r <- raster(extent(dat3), ncols = xdims, nrows = ydims, crs = crs(spdat))
icell <- cellFromXY(r, dat3)
r[icell] <- as.numeric(dat3$pred)
And finally the plot:
cols <- RColorBrewer::brewer.pal(9, "Purples")
plot(r, col = cols)
# The raster is in lon-lat, so overlay the unprojected sample points
plot(spdat, add = TRUE)
So it looks like oysters are more common in the southern portion of the survey area. But note that our predictions extend a fair way from the points. We might want to redo these with a more constrained spatial region.
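One way to constrain the predictions is sketched below: buffer the sample sites in UTM (so the buffer width is in metres), reproject the buffer to lon-lat and mask the prediction raster with it. The 20 m buffer width is just an illustrative choice; gBuffer comes from the rgeos package we loaded at the top:
# Buffer the UTM points (width in metres), reproject and mask the raster
buff_utm <- gBuffer(spdf, width = 20)
buff_ll <- spTransform(buff_utm, CRS("+init=epsg:4326"))
r_masked <- mask(r, buff_ll)
plot(r_masked, col = cols)
plot(spdat, add = TRUE)
You could then pass r_masked, rather than r, to the leaflet map below.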
Finally, we could map this onto a satellite image to get a better view of what is going on. I am just using the same leaflet code as before:
library(leaflet)
brks <- seq(0,1, by = 0.1)
ncol <- length(brks)
oystercols <- RColorBrewer::brewer.pal(min(ncol, 9), 'Reds')
pal <- colorBin(oystercols, dat$oysters_live, bins = brks)
x1 <- -124.5997
y1 <- 49.52848
leaflet() %>%
  addProviderTiles("Esri.WorldImagery") %>%
  setView(lng = x1, lat = y1, zoom = 16) %>%
  addRasterImage(r, opacity = 0.8, colors = pal) %>%
  addCircleMarkers(lng = dat$GPSLongitude, lat = dat$GPSLatitude,
                   radius = 4) %>%
  addLegend("topright", pal = pal,
            values = brks,
            title = "Chance of live oysters <a href = 'https://en.wikipedia.org/wiki/Pacific_oyster' target = '_blank'> (Crassostrea gigas)</a>",
            opacity = 1)