
This is my first time using rjags, and I'm trying to fit some count data Y. I'm using a hierarchical mixture model as follows:

Y ~ p*Poisson(N*lambda1) + (1-p)*Poisson(N*lambda2)
lambda1 ~ Gamma(a,b)
lambda2 ~ Lognormal(c,d)
a ~ Gamma(1,1)
b ~ Gamma(1,1)
c ~ Normal(0,1)
d ~ Gamma(1,1)

Here, Y is the count data that I observe, and N is known.

I wrote up a simple rjags model that I've been playing around with. However, when testing on simple simulated data, I'm getting really bad results. Here is my code to generate the simulated data and run the model:

# True hyperparameter values used to simulate the data
a <- 0.5
b <- 0.5
c <- -10   # reusing c as a variable still works; R looks up the function c() separately
d <- 1

# 30 rates from the gamma component, 70 from the lognormal component.
# Note: rlnorm() takes a standard deviation, while JAGS's dlnorm() below
# takes a precision (1/sd^2); with d = 1 the two happen to coincide.
lambda1 <- rgamma(30, a, b)
lambda2 <- rlnorm(70, c, d)
counts <- rpois(100, 1000 * c(lambda1, lambda2))

model_string <- "model{
  # Likelihood
  for (i in 1:n) {
    mu1[i] <- N*lambda1[i]
    mu2[i] <- N*lambda2[i]
    lambda1[i] ~ dgamma(a,b)   # a separate rate for every observation
    lambda2[i] ~ dlnorm(c,d)   # dlnorm/dnorm in JAGS take a precision, not an sd
    m[i] ~ dcat(mprior[])      # latent component indicator
    mu[i] <- equals(m[i],1)*mu1[i] + equals(m[i],2)*mu2[i]
    Y[i] ~ dpois(mu[i])
  }
  # Priors
  mprior[1] <- 0.5
  mprior[2] <- 0.5
  a ~ dgamma(1,1)
  b ~ dgamma(1,1)
  c ~ dnorm(0,1)
  d ~ dgamma(1,1)
}"

library(rjags)

model <- jags.model(textConnection(model_string),
                    data = list(Y = counts, N = 1000, n = 100))
update(model, 10000)  # burn-in
samp <- coda.samples(model,
                     variable.names = c("a", "b", "c", "d", "m"),
                     n.iter = 20000)

print(colMeans(samp[[1]])[1:4])  # posterior means of a, b, c, d

After running this, the posterior estimates of a, b, c, and d are quite poor, and the component assignments m don't match the true assignments well either. The trace plots also don't look great, even when I increase the number of iterations.
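
In case it helps, here is roughly how I'm checking convergence and the assignments. This is a minimal sketch: m_hat and truth are just names I made up, and rounding the posterior mean of each m[i] is only a crude point estimate of the labels.

# Trace plots and effective sample sizes for the hyperparameters
plot(samp[, c("a", "b", "c", "d")])
effectiveSize(samp[, c("a", "b", "c", "d")])

# Crude point estimates of the assignments, compared with the truth
# (the first 30 counts came from the gamma component, the last 70 from the lognormal)
m_cols <- grep("^m\\[", colnames(samp[[1]]))
m_hat  <- ifelse(colMeans(samp[[1]][, m_cols]) > 1.5, 2, 1)
truth  <- rep(c(1, 2), times = c(30, 70))
table(m_hat, truth)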

Any advice? I'm not sure whether I'm fitting the mixture in the best way. I'm also definitely open to changing my priors on a, b, c, d if there are other distributions that would be easier to work with.

1 Answer


In your JAGS model, lambda depends on i, which was probably not your intention. The model cannot estimate the parameters with such a definition: there are simply too many lambdas, or, put another way, each draw follows its own distribution.

Maybe the model should look more like

lambda1 ~ dgamma(a,b)
lambda2 ~ dlnorm(c,d)

Or, if you really want separate lambdas, then you would need many more draws per lambda for the posterior distribution to make sense. Right now you are estimating each lambda from a single data point.

The same applies to the categorical distribution you use for the mixture.
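
To make the suggestion concrete, a version with shared lambdas might look like the sketch below. This is the idea rather than tested code; I have kept your per-observation indicators m[i], since each count can still come from either component, and left your priors unchanged.

model{
  # Likelihood
  for (i in 1:n) {
    m[i] ~ dcat(mprior[])  # component indicator for observation i
    mu[i] <- N*(equals(m[i],1)*lambda1 + equals(m[i],2)*lambda2)
    Y[i] ~ dpois(mu[i])
  }
  # One rate per component, shared across all observations
  lambda1 ~ dgamma(a,b)
  lambda2 ~ dlnorm(c,d)
  # Priors
  mprior[1] <- 0.5
  mprior[2] <- 0.5
  a ~ dgamma(1,1)
  b ~ dgamma(1,1)
  c ~ dnorm(0,1)
  d ~ dgamma(1,1)
}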

hgrey