I'm trying to integrate multiple types of abundance data in a single model using JAGS in R. Specifically, I'm struggling to fit a binomial N-mixture model to count data together with abundance-class data via interval censoring.
Data set 1 consists of aggregated abundance-class data from a single year (year 1, which is nested within years 1-5 of data set 2). Data set 2 contains full-count abundance data from several sites, with multiple survey rounds per year, across five years.
Simplified data sets can be simulated roughly following Ch. 10.3-10.5 of Marc Kéry & J. Andy Royle (2020): Applied Hierarchical Modeling in Ecology: Analysis of Distribution, Abundance and Species Richness in R and BUGS. Volume 2: Dynamic and Advanced Models:
library(AHMbook)
library(jagsUI)
library(berryFunctions)
## Simulate two data sets
# Constants for simulation function
M1 <- 500 # Number of sites with abundance-class surveys
J1 <- 1 # Number of surveys in abundance-class survey
M2 <- 100 # Number of sites with full-count surveys
J2 <- 4 # Number of surveys in full-count surveys
mean.lam1 <- 6.5 # Abundance intercept, year 1
mean.lam2 <- 6 # Abundance intercept, year 2
mean.lam3 <- 5.5 # Abundance intercept, year 3
mean.lam4 <- 5 # Abundance intercept, year 4
mean.lam5 <- 4.5 # Abundance intercept, year 5
beta.lam <- 0.5 # Coefficients of a site covariate in abundance
mean.p1tot <- 1-(1-0.4)^4 # Detection intercept in abundance-class survey: Total detection probability after 4 surveys with each p=0.4
mean.p2 <- 0.4 # Detection intercept in full-count survey
# Simulate data set 1 for abundance-class surveys (surveyed in year 1)
set.seed(1)
str(data1 <- simNmix(nsite = M1, nvisit = J1, mean.lam = mean.lam1,
mean.p = mean.p1tot, beta2.lam = beta.lam,
show.plot = FALSE))
# Simulate data sets 2a-e for full-count surveys, 5 data sets for 5 years (year1-5) with varying mean.lam (decreasing mean abundance over time)
str(data2a <- simNmix(nsite = M2, nvisit = J2, mean.lam = mean.lam1,
mean.p = mean.p2, beta2.lam = beta.lam,
show.plot = FALSE))
str(data2b <- simNmix(nsite = M2, nvisit = J2, mean.lam = mean.lam2,
mean.p = mean.p2, beta2.lam = beta.lam,
show.plot = FALSE))
str(data2c <- simNmix(nsite = M2, nvisit = J2, mean.lam = mean.lam3,
mean.p = mean.p2, beta2.lam = beta.lam,
show.plot = FALSE))
str(data2d <- simNmix(nsite = M2, nvisit = J2, mean.lam = mean.lam4,
mean.p = mean.p2, beta2.lam = beta.lam,
show.plot = FALSE))
str(data2e <- simNmix(nsite = M2, nvisit = J2, mean.lam = mean.lam5,
mean.p = mean.p2, beta2.lam = beta.lam,
show.plot = FALSE))
# Get lower and upper class boundary for every binned count
breaks <- c(0, 1, 2, 3, 7, 20, 50, 150, 400) # Up to and including
Cclass <- classify(c(data1$C), method = 'custom', breaks = breaks)$index
Aclass <- matrix(Cclass, byrow = FALSE, ncol = data1$nvisit)
n <- length(Cclass)
limits <- array(NA, c(n, 2), list(1:n, c('Lower', 'Upper')))
for(i in 1:n){
limits[i, 1:2] <- c(breaks[Cclass[i]], breaks[Cclass[i]+1])
}
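# Quick look at the binning, just to illustrate what 'limits' contains
# (raw simulated counts next to their class index and interval boundaries):
head(cbind(count = c(data1$C), class = Cclass, limits))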
# Add NAs for unsurveyed years (year2-5) to data of surveyed year (year1)
limitsNA <- array(NA, c(500, 5, 2), list(1:500,1:5,c('Lower', 'Upper')))
for(i in 1:500){
limitsNA[i, 1, 1:2] <- c(limits[i,1], limits[i,2])
}
# Vectorize the environmental covariate (using the mean, as my original data set
# includes a site covariate that is constant across the survey years)
X22 <- rowMeans(array(c(data2a$site.cov[,2], data2b$site.cov[,2], data2c$site.cov[,2],
                        data2d$site.cov[,2], data2e$site.cov[,2]), dim = c(100, 5)))
# Response for interval censoring = simply a vector of ones !
y <- rep(1, 500)
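# As I understand dinterval from the JAGS manual: with two cutpoints,
# y ~ dinterval(t, lim) equals 0 if t <= lim[1], 1 if lim[1] < t <= lim[2],
# and 2 if t > lim[2], so a response of 1 states that the latent count lies
# inside the class interval. A minimal standalone toy sketch of that mechanism
# (toy names and values only, not part of the model below):
toydat <- list(y = 1, lim = c(3, 7)) # "the count was somewhere in (3, 7]"
cat(file = "toy.txt", "
model {
  y ~ dinterval(C, lim) # y = 1 <=> lim[1] < C <= lim[2]
  C ~ dpois(lambda) # latent count
  lambda ~ dunif(0.01, 50)
}
")
toy.out <- jags(toydat, function() list(C = 5), c("C", "lambda"), "toy.txt",
  n.chains = 1, n.adapt = 100, n.iter = 2000, n.burnin = 1000)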
# Bundle data
str(bdata <- list(y = cbind(y, y, y, y, y), M1 = nrow(Aclass), X21 = data1$site.cov[,2],
                  limits = limitsNA,
                  C2 = array(c(data2a$C, data2b$C, data2c$C, data2d$C, data2e$C), dim = c(100, 4, 5)),
                  M2 = nrow(data2a$C), J2 = ncol(data2a$C), T2 = 5,
                  X22 = X22, year2 = as.vector(scale(c(2005:2009)))))
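# A few quick consistency checks before running JAGS (illustrative only):
dim(bdata$limits) # 500 x 5 x 2; years 2-5 are all NA
all(bdata$limits[, 1, 1] < bdata$limits[, 1, 2]) # lower < upper cutpoint in the surveyed year
dim(bdata$C2) # 100 x 4 x 5 (sites x surveys x years)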
My goal is to use data set 2 (the full-count data) to estimate a detection probability and the shared covariates in the likelihood part of the model, and to refine those shared covariates with the larger but coarser data set 1. I managed to do this. However, I would also like to use the trend information (included via a year covariate) provided by data set 2 to extrapolate the abundances of data set 1. In the end I would like to obtain N estimates for the sites included in data set 1 for the missing years 2-5, which are covered only by data set 2.
cat(file = "model2.txt", "
model {
# Priors for the parameters shared in both data sets
alpha.lam ~ dnorm(0, 0.01) # Abundance intercept
beta.lam ~ dnorm(0, 0.1) # Abundance slope on site cov X2
beta.lam1 ~ dnorm(0, 0.1) # Abundance slope on year
for (i in 1:M2){
for (j in 1:J2){
p2[i,j] ~ dunif(0,1)
}}
# Likelihood
# Model for latent abundance in both data sets
# Note: using the same parameter names means the parameters are shared across both data sets
for (i in 1:M1){ # Data set 1
for(t in 1:T2){
N1[i,t] ~ dpois(lambda1[i,t])
log(lambda1[i,t]) <- alpha.lam + beta.lam * X21[i] + beta.lam1 * year2[t] #more complex covariates included in original model...
}}
for (i in 1:M2){ # Data set 2
for (t in 1:T2){
N2[i,t] ~ dpois(lambda2[i,t])
log(lambda2[i,t]) <- alpha.lam + beta.lam * X22[i] + beta.lam1 * year2[t]
}}
# Observation model for observed counts and for detection
# Observation model for data set 2 (full counts)
for (i in 1:M2){ # Data set 2
for(j in 1:J2){
for(t in 1:T2){
C2[i,j,t] ~ dbin(p2[i,j], N2[i,t])
}}}
for (i in 1:M2){
ptot[i] <- 1-(1-p2[i,1])*(1-p2[i,2])*(1-p2[i,3])*(1-p2[i,4]) # Cumulative detection probability over the 4 survey dates per site
}
ptotmean <- mean(ptot) #Mean total detection probability after four surveys
# Observation model for data set 1 (binned counts)
for (i in 1:M1){
for(t in 1:T2){
y[i,t] ~ dinterval(C1[i,t], limits[i,t,]) # interval censoring
C1[i,t] ~ dbin(ptotmean, N1[i,t]) # using the average detection probability from the data set 2 observation model
}}
# Derived quantities
for (t in 1:T2){
Ntotal1[t] <- sum(N1[,t]) # Total abundance in data set 1
Ntotal2[t] <- sum(N2[,t]) # Total abundance in data set 2
Nmean1[t] <- mean(N1[,t])
Nmean2[t] <- mean(N2[,t])
}
for (j in 1:J2){
pmean[j] <- mean(p2[,j])
}
}
")
# Initial values
Nst1 <- limitsNA[,,2]+5
Nst1[is.na(Nst1)] <- round(mean(Nst1, na.rm=T))+5
Cst <- limitsNA[,,1]+1
Cst[is.na(Cst)] <- round(mean(Cst, na.rm=T))+1
Nst2 <- apply(bdata$C2, c(1,3), max)+5
inits <- function() list(N1 = Nst1, C1 = Cst, N2 = Nst2)
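# As far as I know, the initial value of the interval-censored node has to be
# consistent with the observed interval (strictly above the lower and at most
# the upper cutpoint), hence C1 starting at the lower limit + 1; a quick check
# for the surveyed year (illustrative only):
all(Cst[, 1] > limitsNA[, 1, 1] & Cst[, 1] <= limitsNA[, 1, 2])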
# Parameters monitored
params <- c("mean.lam", "beta.lam", "beta.lam1",
"Ntotal1", "Ntotal2", "GTotalN", "N1", "C1", "N2", "Ntotal1", "Ntotal2", "Nmean1", "Nmean2", "pmean", "ptot", "ptotmean")
# MCMC settings
# na <- 1000 ; nc <- 3 ; ni <- 50000 ; nb <- 10000 ; nt <- 40
na <- 1000 ; nc <- 3 ; ni <- 5000 ; nb <- 1000 ; nt <- 4 # ~~~~ for testing, 2 mins
# Call JAGS (ART 22 min), gauge convergence and summarize posteriors
out2 <- jags(bdata, inits, params, "model2.txt", n.adapt = na, n.chains = nc,
n.thin = nt, n.iter = ni, n.burnin = nb, parallel = TRUE)
R returns the following error:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: RUNTIME ERROR:
Unable to resolve the following parameters:
limits[1,2,1:2] (line 40)
limits[1,3,1:2] (line 40)
limits[1,4,1:2] (line 40)
limits[1,5,1:2] (line 40)
[...]
dinterval seems to be missing its cutpoints (limits). So far I have not found a solution to this problem, as this application of dinterval is not very common. Any thoughts on how I could extrapolate the abundance-class estimates to the time span of data set 2, while sharing covariates between the model parts?
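One idea I considered (I am not at all sure it is valid, hence this question) is to apply the dinterval likelihood only to the surveyed year, so that no NA cutpoints are passed to dinterval, and to let the state model alone carry the prediction for years 2-5. Roughly (untested sketch; it would also require reshaping y and limits back to year-1-only objects):
for (i in 1:M1){
  y[i] ~ dinterval(C1[i], limits[i, ]) # interval censoring, year 1 only
  C1[i] ~ dbin(ptotmean, N1[i, 1])
}
# N1[i, 2:5] would then be predicted from lambda1[i, t] alone
Does an approach like this make sense, or is there a better way?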
Another, minor problem I encountered: unlike the simulated data, my real abundance-class data also include absences. I failed to include these absences and have therefore excluded those sites from my data set for now. I suppose dinterval cannot handle an absence class with the limits (0, 0). I would appreciate any ideas on how to include these sites in my model.
I'd be happy about any helpful suggestions! Thanks, Jo
(This post was also posted on sourceforge: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/224919442a/)