Validating Bayesian logistic regression model with iris dataset

Question

We have a bayesian logistic regression model in python using a pyro and sklearn environment.
The model is trained and tested with a dataset that contains categorical and continuous variables. The code starts by preprocessing the dataset (using the LabelEncoder for categorical variables).

Here is the code then defines the function:

import torch
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import MCMC, NUTS

""" Model definition """
def bayesian_logistic_regression(X_categorical, X_continuous):
    num_categories = X_categorical.shape[1]
    num_predictors = num_categories + X_continuous.shape[1]
    
    alpha = pyro.sample("alpha", dist.Normal(alpha_prior_mean, alpha_prior_scale))
    beta_categorical = pyro.sample("beta_categorical",
                                   dist.Normal(beta_categorical_prior_means,
                                               beta_categorical_prior_scales))
    beta_continuous = pyro.sample("beta_continuous",
                          dist.Normal(beta_continuous_prior_means.unsqueeze(1),
                                      beta_continuous_prior_scales.unsqueeze(1)))
    
    logits = alpha 
            + torch.matmul(X_categorical, beta_categorical) 
            + torch.matmul(X_continuous, beta_continuous)

    probs = torch.sigmoid(logits)
    beta_categorical = beta_categorical.to(torch.float)
    beta_continuous = beta_continuous.to(torch.float)
    y_obs = pyro.sample("y", dist.Bernoulli(probs=probs), obs=y) #dist.Multinomial(100, probs)

    # Inference
    nuts_kernel = NUTS(bayesian_logistic_regression)
    mcmc = MCMC(nuts_kernel, num_samples=50, warmup_steps=10, num_chains=1)
    mcmc.run(X_categorical, X_continuous)

    # Get posterior samples
    posterior_samples = mcmc.get_samples()
    alpha_samples = posterior_samples['alpha']
    beta_categorical_samples = posterior_samples['beta_categorical']
    beta_continuous_samples = posterior_samples['beta_continuous']

    return alpha_samples, beta_categorical_samples, 
            beta_continuous_samples, posterior_samples

""" Priors """
alpha_prior_mean = torch.tensor(0.0)
alpha_prior_scale = torch.tensor(1.0)
beta_categorical_prior_means = torch.tensor([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
beta_categorical_prior_scales = torch.tensor([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

# Fix the data types and dimensions of beta_continuous_prior_means and beta_continuous_prior_scales
beta_continuous_prior_means = torch.tensor([0.0, 0.0, 0.0, 0.0], 
dtype=torch.float32)
beta_continuous_prior_scales = torch.tensor([1.0, 1.0, 1.0, 1.0], 
dtype=torch.float32)

""" Run model """
alpha_samples, beta_categorical_samples, 
beta_continuous_samples, posterior_samples  = bayesian_logistic_regression(y_train, X_categorical, X_continuous)

The objective of this model is to predict the response variable, but also to serve as a tool that can be easily modified for additional or changes in the input parameters.

The results of this model returns predictions all = 0 (for the binary response variable) which is incorrect based on the samples. My current assumption is that there is an issue with the way I have set up the model definition and using the NUTS MCMC sampling methods.

My question is: have I incorrectly set up this model or is there a better methodology to coding a Bayesian logistic regression?

I have attempted to use this methodology on the Iris dataset. I have updated the code to work with the input parameters used in previous papers.
However, my code still predicts 0 for the response variable on this dataset as well. We once again know this to be false since we have results for the input parameters used in the model.

Validating Bayesian logistic regression model with iris dataset

My question is: have I incorrectly set up this model or is there a better methodology to coding a Bayesian logistic regression?

0 Answers0