1

I'm learning Stan now and wanted to implement a simple mixture model.

In the reference manual (stan-reference-2.14.0) there is a solution already:

data {
  int<lower=1> K; // number of mixture components
  int<lower=1> N; // number of data points
  real y[N]; // observations
}
parameters {
  simplex[K] theta; // mixing proportions
  real mu[K]; // locations of mixture components
  real<lower=0> sigma[K]; // scales of mixture components
}
model {
  real ps[K]; // temp for log component densities
  sigma ~ cauchy(0, 2.5);
  mu ~ normal(0, 10);
  for (n in 1:N) {
    for (k in 1:K) {
      ps[k] = log(theta[k])
      + normal_lpdf(y[n] | mu[k], sigma[k]);
    }
    target += log_sum_exp(ps);
  }
}

The next page describes that vectorization of the outer loop is not possible. However, I was wondering if the parallization of the inner loop still is.

And so I tried the following model:

data {
  int<lower=1> K; // number of mixture components
  int<lower=1> N; // number of data points
  real y[N]; // observations
}

parameters {
  simplex[K] theta; // mixing proportions
  vector[K] mu; // locations of mixture components
  vector<lower=0>[K] sigma; // scales of mixture components
}

model {
  vector[K] ps;//[K]; // temp for log component densities
  vector[K] ppt;
  sigma ~ cauchy(0, 2.5);
  mu ~ normal(0, 10);
  for (n in 1:N) {
    ppt = log(theta);
    /*
    for (k in 1:K) {
      ps[k] = ppt[k] + //log(theta[k])
        normal_lpdf(y[n] | mu[k], sigma[k]);
    }
    */
    ps = ppt + normal_lpdf(y[n] | mu, sigma);
    target += log_sum_exp(ps);
  }
}

... and this model make wrong estimates (as opposed to the original model).

data("faithful")
erupdata <- list(
  K = 2,
  N = length(faithful$eruptions),
  y = faithful$eruptions
)

fiteruptions <- stan(file = 'mixturemodel.stan', data = erupdata, iter = 1000, chains = 1)

I am wondering, what I understand wrong about the model specification. I would like to understand the difference that the syntax provides (among others the difference between vector[K] and real[K]) and maybe get some deeper insights into Stan.

Drey
  • 3,314
  • 2
  • 21
  • 26

2 Answers2

2

The second program defines a different density. normal_lpdf returns a single scalar value that is the sum of log pdfs over the containers of data/parameters.

There's a chapter on matrices/vectors vs. arrays in the manual.

You want to pull the definition of ppt higher up for efficiency.

Bob Carpenter
  • 3,613
  • 1
  • 20
  • 13
0

I am not a Stan user and I do not explain the difference between vector[K] and real[K]. But according to some resources (here and here), the prior distributions of the mixture locations should be defined as ordered[K] instead of vector[K] to avoid the problem of non-identifiability / exchangeable priors.

For deeper understanding the above problem, the Identifying Bayesian Mixture Models blog is highly recommended. A working model copied from this blog:

data {
 int<lower = 0> N;
 vector[N] y;
}

parameters {
  ordered[2] mu;
  real<lower=0> sigma[2];
  real<lower=0, upper=1> theta;
}

model {
 sigma ~ normal(0, 2);
 mu ~ normal(0, 2);
 theta ~ beta(5, 5);
 for (n in 1:N)
   target += log_mix(theta,
                     normal_lpdf(y[n] | mu[1], sigma[1]),
                     normal_lpdf(y[n] | mu[2], sigma[2]));
}
Minh Vu
  • 483
  • 5
  • 8