Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This post is intended to provide a simple example of how to construct and make inferences on a multi-species multi-year occupancy model using R, JAGS, and the ‘rjags’ package. This is not intended to be a standalone tutorial on dynamic community occupancy modeling. Useful primary literature references include MacKenzie et al. (2002), Kery and Royle (2007), Royle and Kery (2007), Russell et al. (2009), and Dorazio et al. (2010). Royle and Dorazio’s Heirarchichal Modeling and Inference in Ecology also provides a clear explanation of simple one species occupancy models, multispecies occupancy models, and dynamic (multiyear) occupancy models, among other things. There’s also a wealth of code provided here by Elise Zipkin, J. Andrew Royle, and others.
Before getting started, we can define two convenience functions:
1 2 3 4 5 6 7 | logit <- function(x) { log(x/(1 - x)) } antilogit <- function(x) { exp(x)/(1 + exp(x)) } |
Then, initializing the number of sites, species, years, and repeat surveys (i.e. surveys within years, where the occupancy status of a site is assumed to be constant),
1 2 3 4 | nsite <- 150 nspec <- 6 nyear <- 4 nrep <- 3 |
we can begin to consider occupancy. We’re interested in making inferences about the rates of colonization and population persistence for each species in a community, while estimating and accounting for imperfect detection.
Occupancy status at site , by species , in year is represented by . For occupied sites ; for unoccupied sites . However, is incompletely observed: it is possible that a species is present at a site in some year () but species was never seen at at site in year across all repeat surveys because of imperfect detection. These observations are represented by . Here we assume that there are no “false positive” observations. In other words, if , then . If a site is occupied, the probability that is represented as a Bernoulli trial with probability of detection , such that
The occupancy status of species at site in year is modeled as a Markov Bernoulli trial. In other words whether a species is present at a site in year is influenced by whether it was present at year .
where for
and in year one
where the occupancy status in year 0, , and . and are parameters that control the probabilities of colonization and persistence. If a site was unoccupied by species in a previous year , then the probability of colonization is given by the antilogit of . If a site was previously occupied , the probability of population persistence is given by the anitlogit of . We assume that the distributions of species specific parameters are defined by community level hyperparameters such that and . We can generate occupancy data as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | # community level hyperparameters p_beta = 0.7 mubeta <- logit(p_beta) sdbeta <- 2 p_rho <- 0.8 murho <- logit(p_rho) sdrho <- 1 # species specific random effects set.seed(1) # for reproducibility beta <- rnorm(nspec, mubeta, sdbeta) set.seed(1008) rho <- rnorm(nspec, murho, sdrho) # initial occupancy states set.seed(237) rho0 <- runif(nspec, 0, 1) z0 <- array(dim = c(nsite, nspec)) for (i in 1:nspec) { z0[, i] <- rbinom(nsite, 1, rho0[i]) } # subsequent occupancy z <- array(dim = c(nsite, nspec, nyear)) lpsi <- array(dim = c(nsite, nspec, nyear)) psi <- array(dim = c(nsite, nspec, nyear)) for (j in 1:nsite) { for (i in 1:nspec) { for (t in 1:nyear) { if (t == 1) { lpsi[j, i, t] <- beta[i] + rho[i] * z0[j, i] psi[j, i, t] <- antilogit(lpsi[j, i, t]) z[j, i, t] <- rbinom(1, 1, psi[j, i, t]) } else { lpsi[j, i, t] <- beta[i] + rho[i] * z[j, i, t - 1] psi[j, i, t] <- antilogit(lpsi[j, i, t]) z[j, i, t] <- rbinom(1, 1, psi[j, i, t]) } } } } |
For simplicity, we’ll assume that there are no differences in species detectability among sites, years, or repeat surveys, but that detectability varies among species. We’ll again use hyperparameters to specify a distribution of detection probabilities in our community, such that $logit(p_i) \sim Normal(\mu_p, \sigma_p^2)$.
1 2 3 4 5 6 | p_p <- 0.7 mup <- logit(p_p) sdp <- 1.5 set.seed(222) lp <- rnorm(nspec, mup, sdp) p <- antilogit(lp) |
We can now generate our observations based on occupancy states and detection probabilities. Although this could be vectorized for speed, let’s stick with nested for loops in the interest of clarity.
1 2 3 4 5 6 7 8 9 10 | x <- array(dim = c(nsite, nspec, nyear, nrep)) for (j in 1:nsite) { for (i in 1:nspec) { for (t in 1:nyear) { for (k in 1:nrep) { x[j, i, t, k] <- rbinom(1, 1, p[i] * z[j, i, t]) } } } } |
Now that we’ve collected some data, we can specify our model:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | cat(" model{ #### priors # beta hyperparameters p_beta ~ dbeta(1, 1) mubeta <- log(p_beta / (1 - p_beta)) sigmabeta ~ dunif(0, 10) taubeta <- (1 / (sigmabeta * sigmabeta)) # rho hyperparameters p_rho ~ dbeta(1, 1) murho <- log(p_rho / (1 - p_rho)) sigmarho~dunif(0,10) taurho<-1/(sigmarho*sigmarho) # p hyperparameters p_p ~ dbeta(1, 1) mup <- log(p_p / (1 - p_p)) sigmap ~ dunif(0,10) taup <- (1 / (sigmap * sigmap)) #### occupancy model # species specific random effects for (i in 1:(nspec)) { rho0[i] ~ dbeta(1, 1) beta[i] ~ dnorm(mubeta, taubeta) rho[i] ~ dnorm(murho, taurho) } # occupancy states for (j in 1:nsite) { for (i in 1:nspec) { z0[j, i] ~ dbern(rho0[i]) logit(psi[j, i, 1]) <- beta[i] + rho[i] * z0[j, i] z[j, i, 1] ~ dbern(psi[j, i, 1]) for (t in 2:nyear) { logit(psi[j, i, t]) <- beta[i] + rho[i] * z[j, i, t-1] z[j, i, t] ~ dbern(psi[j, i, t]) } } } #### detection model for(i in 1:nspec){ lp[i] ~ dnorm(mup, taup) p[i] <- (exp(lp[i])) / (1 + exp(lp[i])) } #### observation model for (j in 1:nsite){ for (i in 1:nspec){ for (t in 1:nyear){ mu[j, i, t] <- z[j, i, t] * p[i] for (k in 1:nrep){ x[j, i, t, k] ~ dbern(mu[j, i, t]) } } } } } ", fill=TRUE, file="com_occ.txt") |
Next, bundle up the data.
1 | data <- list(x = x, nrep = nrep, nsite = nsite, nspec = nspec, nyear = nyear) |
Provide initial values.
1 2 3 4 5 6 7 8 9 10 11 12 13 | zinit <- array(dim = c(nsite, nspec, nyear)) for (j in 1:nsite) { for (i in 1:nspec) { for (t in 1:nyear) { zinit[j, i, t] <- max(x[j, i, t, ]) } } } inits <- function() { list(p_beta = runif(1, 0, 1), p_rho = runif(1, 0, 1), sigmarho = runif(1, 0, 1), sigmap = runif(1, 0, 10), sigmabeta = runif(1, 0, 10), z = zinit) } |
As a side note, it is helpful in JAGS to provide initial values for the incompletely observed occupancy state $z$ that are consistent with observed presences, as provided in this example with zinit
. In other words if $x(j, i, t, k)=1$, provide an intial value of $1$ for $z(j, i, t)$. Unlike WinBUGS and OpenBUGS, if you do not do this, you’ll often (but not always) encounter an error message such as:
1 2 3 | # Error in jags.model(file = 'com_occ.txt', data = data, n.chains = 3) : # Error in node x[1,1,2,3] Observed node inconsistent with unobserved # parents at initialization |
Now we’re ready to monitor and make inferences about some parameters of interest using JAGS.
1 2 3 4 | params <- c("lp", "beta", "rho") require(rjags) ocmod <- jags.model(file = "com_occ.txt", inits = inits, data = data, n.chains = 3) |
1 2 3 4 | nburn <- 2000 update(ocmod, n.iter = nburn) out <- coda.samples(ocmod, n.iter = 17000, variable.names = params) summary(out) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | ## ## Iterations = 3001:20000 ## Thinning interval = 1 ## Number of chains = 3 ## Sample size per chain = 17000 ## ## 1. Empirical mean and standard deviation for each variable, ## plus standard error of the mean: ## ## Mean SD Naive SE Time-series SE ## beta[1] -0.4539 0.1629 0.000721 0.002140 ## beta[2] 0.9986 0.4274 0.001893 0.011038 ## beta[3] -0.9036 0.1157 0.000513 0.001006 ## beta[4] 4.7439 2.5492 0.011288 0.050756 ## beta[5] 1.5512 0.3251 0.001440 0.007095 ## beta[6] -0.8833 0.3021 0.001338 0.005858 ## lp[1] 3.0660 0.1400 0.000620 0.000782 ## lp[2] 0.8533 0.0580 0.000257 0.000409 ## lp[3] 2.9045 0.2007 0.000889 0.001160 ## lp[4] 0.3773 0.0491 0.000217 0.000315 ## lp[5] 1.2224 0.0635 0.000281 0.000407 ## lp[6] 0.4881 0.0615 0.000272 0.000551 ## rho[1] 1.8640 0.2200 0.000974 0.002691 ## rho[2] 2.6201 0.6009 0.002661 0.011576 ## rho[3] -0.0766 0.2229 0.000987 0.002010 ## rho[4] 2.6567 2.2459 0.009945 0.035121 ## rho[5] 1.1056 0.4091 0.001812 0.007679 ## rho[6] 3.4425 0.4591 0.002033 0.009132 ## ## 2. Quantiles for each variable: ## ## 2.5% 25% 50% 75% 97.5% ## beta[1] -0.777 -0.564 -0.4515 -0.343 -0.139 ## beta[2] 0.168 0.707 0.9975 1.288 1.833 ## beta[3] -1.132 -0.981 -0.9028 -0.826 -0.679 ## beta[4] 1.117 3.206 4.3157 5.732 11.010 ## beta[5] 0.896 1.338 1.5590 1.770 2.174 ## beta[6] -1.509 -1.077 -0.8699 -0.675 -0.327 ## lp[1] 2.803 2.970 3.0630 3.159 3.348 ## lp[2] 0.740 0.814 0.8535 0.892 0.967 ## lp[3] 2.528 2.766 2.8992 3.035 3.316 ## lp[4] 0.282 0.344 0.3771 0.410 0.474 ## lp[5] 1.100 1.179 1.2216 1.265 1.348 ## lp[6] 0.369 0.446 0.4877 0.529 0.610 ## rho[1] 1.439 1.714 1.8611 2.011 2.301 ## rho[2] 1.504 2.205 2.5972 3.013 3.843 ## rho[3] -0.516 -0.226 -0.0747 0.074 0.359 ## rho[4] -0.928 1.270 2.3693 3.688 8.060 ## rho[5] 0.319 0.831 1.1003 1.380 1.920 ## rho[6] 2.601 3.123 3.4209 3.741 4.390 |
1 | plot(out) |
At this point, you’ll want to run through the usual MCMC diagnostics to check for convergence and adjust the burn-in or number of iterations accordingly. Once satisfied, we can check to see how well our model performed based on our known parameter values.
1 2 3 | require(mcmcplots) caterplot(out, "beta", style = "plain") caterpoints(beta) |
1 2 | caterplot(out, "lp", style = "plain") caterpoints(lp) |
1 2 | caterplot(out, "rho", style = "plain") caterpoints(rho) |
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.