Constructing a non-linear exponential model -- use a vector or real?

Question

I'm new to Stan and probabilistic programming. I'm trying to construct a non-linear growth model. I've been able to construct the model in NLS.

The NLS formula I used is: Trump_Pct ~ alpha - beta * lambda^Population

My NLS summary is:

Parameters:
     Estimate Std. Error t value Pr(>|t|)    
alpha  5.627e+01  2.053e+00   27.41   <2e-16 ***
beta  3.018e+01  1.974e+00   15.29   <2e-16 ***
lambda 9.981e-01  2.486e-04 4014.47   <2e-16 ***

In other words, a basic exponential decay curve. I'm trying to replicate it with Stan.

My data is as follows:

I have N observations in the dataset. The predictor is the population of a county ("Population"), and the response Y is the percentage of the vote that went to Trump ("Trump_Pct").

I have tried two ways of constructing this model.

In one, I pass each component of the data to the model as a vector.
In the other, I leave each data component as a list and work with each data point individually.

I'm not able in either case to get the model to run successfully.

Here are my models:

Case 1:

This is an adaptation of this model.

Here I've created vectorized versions of the columns Trump_Pct and Population.

data {
    int N;
    vector[N] PopulationV;
    vector[N] Trump_PctV;
}
parameters {
    vector [1] alpha;
    vector [1] beta;
    vector [1] lambda;
    real<lower=0> sigma;
}
model {
    vector[N] ypred;
    ypred = alpha[1] - beta[1] * (lambda[1]^PopulationV);
    Trump_PctV ~ ypred + sigma;
}

This model fails at the line with the exponent for the following reason:

`SYNTAX ERROR, MESSAGE(S) FROM PARSER:

arguments to ^ must be primitive (real or int); cannot exponentiate real by vector in block=local`

I've tried using pow() but can't find a way forward. Any tips?

Case 2:

data {
  int<lower=0> N;
  real <lower=0> Population[N];
  real <lower=0> Trump_Pct[N];
}
parameters {
  real alpha;
  real beta;
  real<lower=3,upper= 4> lambda;
  real<lower=0> tau;
}
transformed parameters {
  real sigma;
  sigma = 1 / sqrt(tau);
}
model {
  real m[N];
  for (i in 1:N)
    m[i] = alpha - beta * pow(lambda, Population[i]);

  Trump_Pct ~ normal(m, sigma);

  alpha ~ normal(10, 20);
  beta ~ normal(5, 10);
  lambda ~ uniform(3, 4);
  tau ~ gamma(.0001, .0001);
}

In case 2, I am not able to keep the parameter estimates within bounds:

"Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:" [2] "Exception thrown at line 21: normal_log: Location parameter[2873] is -inf, but must be finite!"

Can anyone offer any advice on a simple non-linear model for my formula?

Putting `lambda ~ uniform(3, 4);` in the model block is not necessary because it is implied by `real<lower=3,upper=4> lambda;`. That said, uniform priors are not recommended in Stan. The same could be said for `tau ~ gamma(.0001, .0001);` which is almost uniform over a wide range of the positive real numbers but has very sharp curvature near zero. There are recommendations for priors in Stan at https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations . Finally, the `brm` function in the brms package allows you to specify a model in a similar way to `nls`. — Ben Goodrich, Jan 10 '17 at 03:38
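
As a rough sketch of what this comment points toward (not code from the commenter): the redundant uniform statement is dropped, sigma is declared directly instead of being derived from tau, and weakly informative priors replace the near-flat gamma. The prior locations and scales are illustrative assumptions for an outcome measured in percent, and the (0, 1) bound on lambda is an assumption suggested by the NLS estimate of about 0.998.

```stan
data {
  int<lower=0> N;
  vector<lower=0>[N] Population;
  vector[N] Trump_Pct;
}
parameters {
  real alpha;
  real beta;
  // Assumption: decay base in (0, 1), as the NLS estimate (about 0.998) suggests.
  // The bounds already imply a uniform prior, so no sampling statement is needed.
  real<lower=0, upper=1> lambda;
  real<lower=0> sigma;  // declared directly, replacing sigma = 1 / sqrt(tau)
}
model {
  vector[N] mu;
  for (i in 1:N)
    mu[i] = alpha - beta * pow(lambda, Population[i]);

  // Illustrative prior scales (assumptions, chosen for a 0-100 percentage outcome)
  alpha ~ normal(50, 25);
  beta ~ normal(0, 50);
  sigma ~ normal(0, 20);  // acts as a half-normal because sigma is constrained positive

  Trump_Pct ~ normal(mu, sigma);
}
```

With lambda restricted to (0, 1) and a non-negative predictor, pow(lambda, Population[i]) stays between 0 and 1, so the location parameter cannot overflow to -inf.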

score 2 · Accepted Answer · answered Jan 10 '17 at 03:25

Your case 2 uses the correct syntax. As you discovered, neither ^ nor pow accepts vector arguments, so you have to loop over the elements.

The informational message you see is due to numerical overflow, and should not cause the sampler to stop. There is more detail about that message here.

It is possible that the sampler cannot get going, in which case you can pass the init_r value to stan or sampling and set init_r to a value less than its default of 2. This affects the width of the uniform interval from which initial values are drawn in the unconstrained space.

If there are many overflow messages, it is quite possible that you have other problems as well, such as divergent transitions that are also covered at the above link. The ultimate solution probably involves rescaling the data, reparameterizing the model, and / or tightening the priors.
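
A minimal sketch of the rescaling and reparameterizing idea, under stated assumptions. Here `pop_scale` and `rate` are hypothetical names that are not in the original model: `pop_scale` is a positive constant supplied by the user, and `rate` replaces lambda through lambda = exp(-rate / pop_scale). The NLS estimate of lambda near 0.998 corresponds to -log(lambda) of roughly 0.0019 per unit of Population, so a `pop_scale` of a few hundred units would put `rate` near 1. All prior scales are illustrative guesses.

```stan
data {
  int<lower=0> N;
  vector<lower=0>[N] Population;
  vector[N] Trump_Pct;
  real<lower=0> pop_scale;  // hypothetical: positive constant used to rescale Population
}
transformed data {
  vector[N] pop_scaled;
  pop_scaled = Population / pop_scale;  // predictor in units of pop_scale
}
parameters {
  real alpha;
  real beta;
  real<lower=0> rate;   // hypothetical: lambda = exp(-rate / pop_scale)
  real<lower=0> sigma;
}
model {
  // Illustrative priors (assumptions), tighter than the originals
  alpha ~ normal(50, 25);
  beta ~ normal(0, 50);
  rate ~ normal(0, 5);
  sigma ~ normal(0, 20);

  // exp(-rate * pop_scaled) lies in (0, 1], so the mean stays finite
  Trump_Pct ~ normal(alpha - beta * exp(-rate * pop_scaled), sigma);
}
```

Because `rate` is non-negative and the predictor is non-negative, the exponential term is bounded, which removes the source of the overflow messages in this particular parameterization.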

Can I ask why case 2 is correct? You're correct that I cannot get the sampler going. I will try your solution. — Union find, Jan 10 '17 at 04:00
Case 2 is valid syntax, whereas case 1 is not. All unary functions can be called on vector input, but not functions with two or more arguments. — Ben Goodrich, Jan 11 '17 at 18:19
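
One way to use this fact, offered as a sketch and not as part of the comment: for lambda > 0, lambda^x equals exp(x * log(lambda)), which involves only unary functions and scalar-vector arithmetic. That lets case 1 stay vectorized without calling ^ or pow on a vector. The (0, 1) bound on lambda is an assumption based on the NLS estimate, and alpha and beta are left with Stan's implicit flat priors for brevity.

```stan
data {
  int<lower=0> N;
  vector<lower=0>[N] PopulationV;
  vector[N] Trump_PctV;
}
parameters {
  real alpha;
  real beta;
  real<lower=0, upper=1> lambda;  // assumption: decay base in (0, 1)
  real<lower=0> sigma;
}
model {
  vector[N] ypred;
  // lambda^x == exp(x * log(lambda)); exp() is unary, so it accepts a vector
  ypred = alpha - beta * exp(PopulationV * log(lambda));
  Trump_PctV ~ normal(ypred, sigma);
}
```

Compared with case 1, the parameters are plain reals instead of length-1 vectors, and the likelihood is written as a normal distribution with mean ypred and standard deviation sigma.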
