I am getting familiar with PyStan. I have run several models including a Bayesian linear regression model without issue. However, when I try to run the following code I get a seg fault error:

Segmentation fault (core dumped)

Interestingly, this only occurs after the burn-in iterations have completed. My code is below. I am generating some synthetic data and trying to infer the parameters beta_tru_1 and alpha. The model code comes from the Stan User's Guide.

import pystan
import numpy as np
from scipy.stats import norm

# the params
beta_tru_1 = 3.7
alpha = 2.3

# make some data
n = 1000
np.random.seed(1)
x1 = norm(0, 1).rvs(n)
z = alpha + x1 * beta_tru_1
y = [1 if i > 0.7 else 0 for i in norm.cdf(z)]

# train test split
y_train, y_test = y[:750], y[750:]
x_train, x_test = x1[:750], x1[750:]

# stan code
probit_code = """
data {
    int<lower=0> n; // number of data vectors
    real x[n]; // data matrix
    int<lower=0,upper=1> y[n]; // response vector
}
parameters {
    real beta; // regression coefs
    real alpha;
}
model {
    for (i in 1:n)
      y[i] ~ bernoulli(Phi(alpha + beta * x[i]));
}
"""

# compile the model
probit_model = pystan.StanModel(model_code=probit_code)

# the data
probit_dat = {
    "n": len(y_train),
    "y": y_train,
    "x": x_train
}

# fit the model (small number of iterations for debug)
# this is where the error is
probit_fit = probit_model.sampling(data=probit_dat, iter=500, warmup=500, chains=4, init="0")

I am using PyStan v. 2.19.1.1 and python 3.7.6 on Linux Pop OS 20.10. I have run this code on multiple machines including an Ubuntu container with no luck. Any help is appreciated.

samvoit4
    Try running your program with `strace` to get more information about the crash; ideally you would be able to see what the program was trying to do when it crashed. It might also be possible to execute Python from `gdb` and get more information that way (just guessing here, I haven't tried it). A Stan-specific forum might be able to give you more focused advice. – Robert Dodier Feb 16 '21 at 19:18

1 Answer

I was able to determine the cause of this error. The arguments I was passing to probit_model.sampling were the problem: the iter and warmup parameters cannot be equal. iter is the total number of iterations per chain, including the "burn-in" iterations specified through the warmup parameter. With the values I was using, probit_model.sampling performed the 500 burn-in iterations and then stopped, rather than running another 500 sampling iterations for 1000 total HMC iterations.

The correct parameter setting for this behavior would be:

probit_model.sampling(data=probit_dat, iter=1000, warmup=500)

I have tested this solution, and it performs as intended. (Though this number of samples is not sufficient for valid inference in this problem.)
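The relationship between iter and warmup can be sketched as a small helper. Note that post_warmup_draws is a hypothetical function written for illustration, not part of PyStan's API; it just encodes the rule that, in PyStan 2.x, the number of retained draws per chain is iter minus warmup.

```python
def post_warmup_draws(iterations, warmup, chains=4):
    """Illustrative helper (not a PyStan function): number of retained
    HMC draws for given sampler settings, assuming PyStan 2.x semantics
    where `iter` is the TOTAL per-chain iteration count including warmup."""
    if iterations <= warmup:
        # This is the situation that triggered the crash above:
        # iter == warmup leaves zero post-warmup samples.
        raise ValueError("iter must be greater than warmup; "
                         "otherwise no draws are retained")
    return (iterations - warmup) * chains

print(post_warmup_draws(1000, 500, chains=4))  # 2000 retained draws
```

Calling it with the original settings, post_warmup_draws(500, 500), raises the ValueError, which makes the misconfiguration explicit instead of surfacing as a segfault.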
