I am working on modeling how wind speed affects mosquito traps efectiveness. I've been struggling with assessment of model's goodnes of fit, and a collegue recommended to do bootstrapping dividing the data on training and test. I count with a dataset of 204 rows. I used the 30% of my data for the test. My problem comes whith the test model. When I try to run it i constantly get this error:
Error in jags.model(file = model.file, data = data, inits = inits, n.chains = n.chains, :
RUNTIME ERROR:
Compilation error on line 7.
Index out of range taking subset of alpha
My actual model and settings look like this:
cat(file = "w2_model_nb.txt","
model{
#likelihood
for (i in 1:N){
captures[i] ~ dnegbin(p[i], r)
p[i] <- r/(r+lambda[i])
log(lambda[i]) <- alpha[week_num[i]] + beta1*wind2[i]
captures.new[i] ~ dnegbin(p[i], r)
}
#random effect
for(j in 1:M){ # Parameter (random effects) model, loop over populations
alpha[j] ~ dnorm(mu.alpha, tau.alpha) # Mean and precision = 1/variance
}
#priors
beta1 ~ dnorm(0, 0.001)
mu.alpha~dnorm(0, 0.001)
tau.alpha ~ dgamma(1.0E-3, 1.0E-3)
sigma.alpha <- sqrt(1/tau.alpha)
r ~ dunif(0,50)
}
")
params <- c("beta1", "alpha", "sigma.alpha", "lambda", "r", "p", "captures.new")
ni <- 10000; nt <- 1; nb <- 1000; nc <- 3
inits <- function()list(beta1=rnorm(1,0,10), tau.alpha=1, mu.alpha=rnorm(1,0,10), r = runif(1, 0, 1))
Then I did the training and test subsets:
#this formula calculate the number of test data based on the 30% use for train.
pr = round(nrow(data2)*30/100)
#This is sampling with replacement. If you don't want replacement change "replace = FALSE"
d <- as.numeric(rownames(data2)) # We choose samples based on rownames
sam <- sample (d , size = pr, replace = FALSE)
s.test <- data2[sam,] # select data for test
s.train <- data2[-sam,]# select data for train
And run the models:
win.data.train <- list(captures=s.train$captures,#number of mosquitoes per trap (lots of 0)
N = length(s.train$captures),#totalof days we set the traps
M= length(unique(s.train$week)), #number of weeks (captures are organised in
#groups of #4 consecutives days in the same
#week,we assume this has an effect)
wind2 = s.train$windvel2, #daily measure of wind speed
week_num = s.train$week_num) #unique id for each group of 4 days (week)
outw2nb.train <- jags(win.data.train, inits, params, "w2_model_nb.txt", n.chains = nc, n.thin = nt,
n.iter = ni, n.burnin = nb)
win.data.test <- list(captures=s.test$captures,
N = length(s.test$captures),
M= length(unique(s.test$week)),
wind2 = s.test$windvel2,
week_num = s.test$week_num)
outw2nb.test <- jags(win.data.test, inits, params, "w2_model_nb.txt", n.chains = nc, n.thin = nt,
n.iter = ni, n.burnin = nb)
The model with the whole database and the train model works (are not really good, but theey work). But the test model keeps giving the error.
At first I thought the error was caused by the sampling, given that a sample of 61 rows is too small that sometimes it does not include all the 51 weeks that are on the original dataset. However, I tried with samples that include all of them and the same error showed up.
Any advice regarding this would be welcome! Also, I am open to suggestions regarding goodness of fit asessment, too (at first what I was trying to do is sampling parameters from the posterior distributon to simulate a new dataset and compare t with the original).
Thanks!