0

I am working on modeling how wind speed affects mosquito traps efectiveness. I've been struggling with assessment of model's goodnes of fit, and a collegue recommended to do bootstrapping dividing the data on training and test. I count with a dataset of 204 rows. I used the 30% of my data for the test. My problem comes whith the test model. When I try to run it i constantly get this error:

Error in jags.model(file = model.file, data = data, inits = inits, n.chains = n.chains,  : 
  RUNTIME ERROR:
Compilation error on line 7.
Index out of range taking subset of  alpha

My actual model and settings look like this:


cat(file = "w2_model_nb.txt","
model{
#likelihood 
for (i in 1:N){
  captures[i] ~ dnegbin(p[i], r)
  p[i] <- r/(r+lambda[i])
  log(lambda[i]) <-  alpha[week_num[i]] + beta1*wind2[i] 
  captures.new[i] ~ dnegbin(p[i], r)
  
} 
#random effect
for(j in 1:M){ # Parameter (random effects) model, loop over populations
  alpha[j] ~ dnorm(mu.alpha, tau.alpha) # Mean and precision = 1/variance
}


#priors
beta1 ~ dnorm(0, 0.001)
mu.alpha~dnorm(0, 0.001)
tau.alpha ~ dgamma(1.0E-3, 1.0E-3) 
sigma.alpha <- sqrt(1/tau.alpha)
r ~ dunif(0,50)
}
")

params <- c("beta1", "alpha", "sigma.alpha", "lambda", "r", "p", "captures.new")

ni <- 10000; nt <- 1; nb <- 1000; nc <- 3

inits <- function()list(beta1=rnorm(1,0,10), tau.alpha=1, mu.alpha=rnorm(1,0,10), r = runif(1, 0, 1))

Then I did the training and test subsets:

#this formula calculate the number of test data based on the 30% use for train.
pr = round(nrow(data2)*30/100)

#This is sampling with replacement. If you don't want replacement change "replace = FALSE"
d <- as.numeric(rownames(data2)) # We choose samples based on rownames
sam <- sample (d , size = pr, replace = FALSE)

s.test <- data2[sam,] # select data for test

s.train <- data2[-sam,]# select data for train

And run the models:


win.data.train <- list(captures=s.train$captures,#number of mosquitoes per trap (lots of 0)
                      N = length(s.train$captures),#totalof days we set the traps
                      M= length(unique(s.train$week)), #number of weeks (captures are organised in 
                                                       #groups of #4 consecutives days in the same 
                                                       #week,we assume this has an effect)
                      wind2 = s.train$windvel2,        #daily measure of wind speed
                      week_num = s.train$week_num)     #unique id for each group of 4 days (week)

outw2nb.train <- jags(win.data.train, inits, params, "w2_model_nb.txt", n.chains = nc, n.thin = nt,
                     n.iter = ni, n.burnin = nb) 


win.data.test <- list(captures=s.test$captures,
                 N = length(s.test$captures),
                 M= length(unique(s.test$week)), 
                 wind2 = s.test$windvel2,
                 week_num = s.test$week_num)


outw2nb.test <- jags(win.data.test, inits, params, "w2_model_nb.txt", n.chains = nc, n.thin = nt,
                n.iter = ni, n.burnin = nb) 

The model with the whole database and the train model works (are not really good, but theey work). But the test model keeps giving the error.

At first I thought the error was caused by the sampling, given that a sample of 61 rows is too small that sometimes it does not include all the 51 weeks that are on the original dataset. However, I tried with samples that include all of them and the same error showed up.

Any advice regarding this would be welcome! Also, I am open to suggestions regarding goodness of fit asessment, too (at first what I was trying to do is sampling parameters from the posterior distributon to simulate a new dataset and compare t with the original).

Thanks!

scc27
  • 11
  • 2

0 Answers0