I am getting familiar with PyStan. I have run several models including a Bayesian linear regression model without issue. However, when I try to run the following code I get a seg fault error:

Segmentation fault (core dumped)

Interestingly, this only occurs after the burn-in iterations have completed. My code is below. I am generating some synthetic data and trying to infer the parameters beta_tru_1 and alpha. The model code comes from the Stan User's Guide.

import pystan
import numpy as np
from scipy.stats import norm

# the params
beta_tru_1 = 3.7
alpha = 2.3

# make some data
n = 1000
np.random.seed(1)
x1 = norm(0, 1).rvs(n)
z = alpha + x1 * beta_tru_1
y = [1 if i > 0.7 else 0 for i in norm.cdf(z)]

# train test split
y_train, y_test = y[:750], y[750:]
x_train, x_test = x1[:750], x1[750:]

# stan code
probit_code = """
data {
    int<lower=0> n; // number of data vectors
    real x[n]; // data matrix
    int<lower=0,upper=1> y[n]; // response vector
}
parameters {
    real beta; // regression coefs
    real alpha;
}
model {
    for (i in 1:n)
      y[i] ~ bernoulli(Phi(alpha + beta * x[i]));
}
"""

# compile the model
probit_model = pystan.StanModel(model_code=probit_code)

# the data
probit_dat = {
    "n": len(y_train),
    "y": y_train,
    "x": x_train
}

# fit the model (small number of iterations for debug)
# this is where the error is
probit_fit = probit_model.sampling(data=probit_dat, iter=500, warmup=500, chains=4, init="0")

I am using PyStan v. 2.19.1.1 and python 3.7.6 on Linux Pop OS 20.10. I have run this code on multiple machines including an Ubuntu container with no luck. Any help is appreciated.

samvoit4
    Try running your program with `strace` to get more information about the crash; ideally you would be able to see what the program was trying to do when it crashed. It might also be possible to execute Python from `gdb` and get more information that way (just guessing here, I haven't tried it). A Stan-specific forum might be able to give you more focused advice. – Robert Dodier Feb 16 '21 at 19:18

1 Answer

I was able to determine the cause of this error. The arguments I was passing to probit_model.sampling were the problem: the iter and warmup parameters cannot be equal. iter is the total number of iterations per chain, including the "burn-in" iterations specified through the warmup parameter. With the values I was using, probit_model.sampling performed the 500 burn-in iterations and then stopped, rather than running another 500 sampling iterations for 1000 total HMC iterations.

The correct parameter setting for this behavior would be:

probit_model.sampling(data=probit_dat, iter=1000, warmup=500)

I have tested this solution, and it performs as intended. (Though this number of samples is not sufficient for valid inference in this problem.)
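The relationship between iter and warmup can be sketched as a small helper. Note that post_warmup_draws is a hypothetical function written for illustration, not part of PyStan's API; it just encodes the rule that, in PyStan 2.x, the number of retained draws per chain is iter minus warmup.

```python
def post_warmup_draws(iterations, warmup, chains=4):
    """Illustrative helper (not a PyStan function): number of retained
    HMC draws for given sampler settings, assuming PyStan 2.x semantics
    where `iter` is the TOTAL per-chain iteration count including warmup."""
    if iterations <= warmup:
        # This is the situation that triggered the crash above:
        # iter == warmup leaves zero post-warmup samples.
        raise ValueError("iter must be greater than warmup; "
                         "otherwise no draws are retained")
    return (iterations - warmup) * chains

print(post_warmup_draws(1000, 500, chains=4))  # 2000 retained draws
```

Calling it with the original settings, post_warmup_draws(500, 500), raises the ValueError, which makes the misconfiguration explicit instead of surfacing as a segfault.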
