
I have been tasked with converting an R Stan script into Python (PyStan 3) for a research project. The colleague who wrote the R code left some time ago and won't be much help, so I was left to my own devices.

Here are the Stan model code, my Python implementation of the Stan calls, and some demo data. I can run the model, but I don't know how to plot the predicted and observed variables on an x-y chart, or how to extract the predicted values for a given x. I know I can convert the fit object into a DataFrame, but the meaning of its columns is obscure to me, and I simply don't understand which are the dependent, independent, and predicted variables in the data frame. I scoured forums but could not find any answer that fitted my case (not at my level of experience, at least). Thank you in advance.

import stan
import numpy as np

model_code_2s = '''
        data {
          int<lower=0> N; // number of measurements
        
          vector[N] h;
          vector[N] Q_meas;
          vector<lower=0>[N] Q_error;
          
          real mu_a1;
          real sigma_a1;
          real mu_b1;
          real sigma_b1;
          real mu_c1;
          real sigma_c1;
          real mu_k1;
          real sigma_k1;
          real mu_a2;
          real sigma_a2;
          real mu_c2;
          real sigma_c2;
          
          int<lower=0> N_ext;
          vector[N_ext] h_ext;
        }
        
        parameters {
          real<lower=0> a1;
          real<upper=min(h)> b1;
          real<lower=0> c1;
          
          real<lower=0> a2;
          real<lower=0> c2;
          
          real<lower=b1> k1;
          
          real<lower=0> sigma;
        }
        
        transformed parameters {
        
          real b2 = k1 - pow(a1/a2,1/c2)*pow(k1-b1,c1/c2);
          
          vector[N] log_mu;
          
          for (i in 1:N) {
            if (h[i] <= k1) {
              log_mu[i] = log(a1) + c1*log(h[i]-b1);
            } else {
              log_mu[i] = log(a2) + c2*log(h[i]-b2);
            }
          }
          
          vector[N] mu = exp(log_mu);
          vector[N] sigma_mu = mu*sigma;
          
          vector[N_ext] mu_ext;
          
          for (i in 1:N_ext) {
            if (h_ext[i] <= k1) {
              mu_ext[i] = a1*(h_ext[i] - b1)^c1;
            } else {
              mu_ext[i] = a2*(h_ext[i] - b2)^c2;
            }
          }
        
        }
        
        model {
        
          // priors
          a1 ~ normal(mu_a1, sigma_a1);
          b1 ~ normal(mu_b1, sigma_b1);
          c1 ~ normal(mu_c1, sigma_c1);
          
          k1 ~ normal(mu_k1, sigma_k1);
          
          a2 ~ normal(mu_a2, sigma_a2);
          c2 ~ normal(mu_c2, sigma_c2);
          
          sigma ~ uniform(0,10000);
          
          // likelihood
          Q_meas ~ normal(mu, ((sigma_mu)^2 + Q_error^2)^0.5);
        
        }
        
        generated quantities {
          
          vector[N] Q_res; // residuals Q_res = Q_meas - Q_mu (accounting for uncertainty in Q_meas)
        
          for (i in 1:N) {
            Q_res[i] = normal_rng(Q_meas[i],Q_error[i]) - normal_rng(mu[i],sigma_mu[i]);
          }
          
          vector[N_ext] Q_ext;
          
          for (i in 1:N_ext) {
            if (h_ext[i] <= k1) {
              Q_ext[i] = normal_rng(a1*(h_ext[i]-b1)^c1,mu_ext[i]*sigma);
            } else {
              Q_ext[i] = normal_rng(a2*(h_ext[i]-b2)^c2,mu_ext[i]*sigma);
            }
          }
          
          
        }
'''
h = [3600.0, 2000.0, 1400.0, 1300.0]
Q_meas = [1110.2, 194.4, 90.6, 69.1]
Q_error = [61.3, 8.7, 4.8, 3.7]

N = len(h)
h_ext = np.arange(1300.0, 3600.0, 10) # min(h) to max(h) with a step of 10
N_ext = len(h_ext)

data_to_fit = {
        'N': N,
        'h': h,
        'Q_meas': Q_meas,
        'Q_error': Q_error,
        'mu_a1': 20, 
        'sigma_a1': 10,
        'mu_b1': 0,
        'sigma_b1': 5,
        'mu_c1': 2,
        'sigma_c1': 0.5,
        'mu_k1': 4,
        'sigma_k1': 2,
        'mu_a2': 20,
        'sigma_a2': 10,
        'mu_c2': 2,
        'sigma_c2': 0.5,      
        'N_ext': N_ext,
        'h_ext': h_ext
        }

posterior = stan.build(model_code_2s, data = data_to_fit, random_seed = 42)
fit = posterior.sample(num_chains = 4, num_samples = 5000, num_warmup = 1000)
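For what it's worth, in PyStan 3 a sampled quantity can be pulled out of the fit by name, e.g. `fit["Q_ext"]`, which returns a NumPy array with the parameter's dimensions first and the draws last. The snippet below is a sketch of how one might summarise such an array into a posterior median and credible band against `h_ext`; the draws here are simulated with NumPy (the "rating curve" is made up) so the snippet runs without Stan, but the array shape `(N_ext, n_draws)` mirrors what `fit["Q_ext"]` would give.

```python
import numpy as np

# Mocked stand-in for fit["Q_ext"]: shape (N_ext, n_draws),
# i.e. one row per h_ext value, one column per posterior draw.
rng = np.random.default_rng(42)
h_ext = np.arange(1300.0, 3600.0, 10)
n_draws = 4 * 5000  # num_chains * num_samples
fake_curve = 20.0 * ((h_ext - 1250.0) / 100.0) ** 2  # hypothetical curve, not the model's
Q_ext_draws = fake_curve[:, None] + rng.normal(0.0, 5.0, size=(len(h_ext), n_draws))

# Posterior summaries per h_ext value.
Q_median = np.median(Q_ext_draws, axis=1)
Q_lo, Q_hi = np.percentile(Q_ext_draws, [2.5, 97.5], axis=1)

# Predicted Q for a given stage h: find the nearest h_ext grid point.
idx = np.argmin(np.abs(h_ext - 2000.0))
print(f"h = {h_ext[idx]:.0f}: Q = {Q_median[idx]:.1f} [{Q_lo[idx]:.1f}, {Q_hi[idx]:.1f}]")

# Plotting sketch (matplotlib): fill_between(h_ext, Q_lo, Q_hi) for the band,
# plot(h_ext, Q_median) for the curve, errorbar(h, Q_meas, yerr=Q_error) for data.
```

With the real fit, replace the mock with `Q_ext_draws = fit["Q_ext"]`; everything downstream stays the same.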
1 Answer


First, you can omit the text between the `'''` markers if it is not something you want to see; it reads like a multi-line comment.

This is a coding example where the Python variable names are not very descriptive, so it's not easy to tell what the values represent.

To explain the dependent, independent, and predicted variables, I will give a simple example.

Suppose that the temperature of a place depends on its elevation above sea level, its latitude and longitude, and the month, among other variables.

Given the values of these variables, the system can learn to predict the temperature.

So temperature is the target (dependent) variable, while the rest are the independent variables.

For more thorough explanations of these concepts, see the Cross Validated site of Stack Exchange.

Now, regarding the visualization, you might want to check the ArviZ or seaborn libraries.
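As a rough illustration of what the fit's DataFrame columns mean: in PyStan 3, `fit.to_frame()` gives one row per posterior draw, one column per scalar parameter (e.g. `a1`), and one column per element of a vector quantity, named `mu_ext.1`, `mu_ext.2`, ..., aligned with the `h_ext` grid. The frame below is mocked with made-up numbers so the snippet runs without Stan; only the column-naming convention is taken from PyStan.

```python
import numpy as np
import pandas as pd

# Mock of fit.to_frame(): one row per draw, one column per scalar element.
rng = np.random.default_rng(0)
n_draws = 100
df = pd.DataFrame({
    "a1": rng.normal(20, 1, n_draws),         # scalar parameter
    "mu_ext.1": rng.normal(70, 3, n_draws),   # predicted Q at h_ext[0]
    "mu_ext.2": rng.normal(90, 3, n_draws),   # predicted Q at h_ext[1]
    "mu_ext.3": rng.normal(190, 5, n_draws),  # predicted Q at h_ext[2]
})

# Select the predicted-curve columns and summarise them across draws.
mu_cols = df.filter(regex=r"^mu_ext\.")
mu_mean = mu_cols.mean()  # posterior mean of the curve, one value per h_ext point
print(mu_mean.round(1))
```

Plotting `mu_mean` against `h_ext` (e.g. with seaborn or plain matplotlib) then gives the predicted curve to overlay on the observed `(h, Q_meas)` points.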

  • Thank you for the help; I think I wasn't clear in my post. I know what dependent/independent variables are in general, I just don't know how to extract them from the Stan fit. The text between `'''` is the Stan code, which I think is quite relevant given the context of the question. I know both of the libraries you mention (my mistake, I did not say so in the question), but again, the problem is not how to plot but what to plot. – DG_ Oct 28 '22 at 06:29