I am using Bayesian probit regression (stan_glm from the rstanarm package) to model default events. The inputs are some financial ratios and some qualitative variables. Is there a way to constrain the coefficients of the qualitative variables only, so that they are always positive?
For example, when I use a single prior for all coefficients I get the results below (the model is fitted by MCMC, after calling set.seed(12345)):
# One normal prior shared by all regression coefficients
prior <- rstanarm::normal(location = 0, scale = NULL, autoscale = TRUE)

# Eight financial ratios plus four qualitative variables
model.formula <- formula(paste0(
  'default_events ~ fin_ratio_1 + ',
  'fin_ratio_2 + fin_ratio_3 + ',
  'fin_ratio_4 + fin_ratio_5 + ',
  'fin_ratio_6 + fin_ratio_7 + ',
  'fin_ratio_8 + Qual_1 + Qual_2 + ',
  'Qual_3 + Qual_4'))

# Probit model; prior_intercept = NULL means a flat prior on the intercept
bayesian.model <- rstanarm::stan_glm(
  model.formula,
  family = binomial(link = "probit"),
  data = as.data.frame(ds),
  prior = prior,
  prior_intercept = NULL,
  init_r = .1, iter = 600, warmup = 200)
The coefficients are the following:
summary(bayesian.model)

Estimates:
              mean    sd  2.5%   25%   50%   75% 97.5%
(Intercept)   -2.0   0.4  -2.7  -2.3  -2.0  -1.7  -1.3
fin_ratio_1   -0.7   0.1  -0.9  -0.8  -0.7  -0.6  -0.4
fin_ratio_2   -0.3   0.1  -0.5  -0.4  -0.3  -0.2  -0.1
fin_ratio_3    0.4   0.1   0.2   0.4   0.4   0.5   0.6
fin_ratio_4    0.3   0.1   0.1   0.2   0.3   0.3   0.4
fin_ratio_5    0.2   0.1   0.1   0.2   0.2   0.3   0.4
fin_ratio_6   -0.2   0.1  -0.4  -0.2  -0.2  -0.1   0.0
fin_ratio_7   -0.3   0.1  -0.5  -0.3  -0.3  -0.2  -0.1
fin_ratio_8   -0.2   0.1  -0.5  -0.3  -0.2  -0.1   0.0
Qual_1        -0.2   0.1  -0.3  -0.2  -0.2  -0.1  -0.1
Qual_2         0.0   0.1  -0.1  -0.1   0.0   0.0   0.1
Qual_3         0.2   0.0   0.1   0.1   0.2   0.2   0.3
Qual_4         0.0   0.2  -0.3  -0.1   0.0   0.1   0.3
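
To show where the fit conflicts with the sign I expect, here is a small base-R sketch that classifies the four qualitative coefficients by their 95% interval. The numbers are transcribed by hand from the summary output above; the data frame and its column names (`qual_summary`, `lower`, `upper`, `sign`) are only illustrative and are not something rstanarm returns.

```r
# Qualitative-variable rows transcribed from the summary output above.
qual_summary <- data.frame(
  term  = c("Qual_1", "Qual_2", "Qual_3", "Qual_4"),
  mean  = c(-0.2, 0.0, 0.2, 0.0),
  lower = c(-0.3, -0.1, 0.1, -0.3),  # 2.5% quantile
  upper = c(-0.1, 0.1, 0.3, 0.3)     # 97.5% quantile
)

# Classify each coefficient by where its 95% interval sits relative to zero.
qual_summary$sign <- with(
  qual_summary,
  ifelse(upper < 0, "negative",
         ifelse(lower > 0, "positive", "unclear"))
)

print(qual_summary[, c("term", "sign")])
```

With the rounded values shown, Qual_1 comes out as negative, Qual_3 as positive, and the other two intervals straddle zero.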
My question is: can I use two different prior distributions in the same model? For example, a normal prior for the fin_ratio_x variables and an exponential or Dirichlet prior for the Qual_x variables?
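
For context, the closest option I am aware of within a single prior family is a rough sketch like the one below: as far as I can tell from the rstanarm documentation, `rstanarm::normal()` accepts vectors for `location` and `scale`, one entry per coefficient, so the two groups can at least get different normal priors. The locations and scales used here (0.5 and 0.25 for the qualitative variables, 0 and 1 for the ratios) are assumptions picked purely for illustration, and a shifted normal only moves prior mass toward positive values; it does not enforce a sign constraint, which is what I am really after.

```r
# Predictor names in the same order as they appear in the model formula.
fin_vars   <- paste0("fin_ratio_", 1:8)
qual_vars  <- paste0("Qual_", 1:4)
predictors <- c(fin_vars, qual_vars)

# Illustrative (assumed) settings: zero-centred priors for the financial
# ratios, positive and tighter priors for the qualitative variables.
is_qual        <- predictors %in% qual_vars
prior_location <- setNames(ifelse(is_qual, 0.5, 0), predictors)
prior_scale    <- setNames(ifelse(is_qual, 0.25, 1), predictors)

# Prior probability that each coefficient is negative under these settings.
prior_p_negative <- setNames(
  pnorm(0, mean = prior_location, sd = prior_scale),
  predictors
)
print(round(prior_p_negative, 3))

# Build the prior object only if rstanarm is installed; it would then be
# passed to stan_glm() through the `prior` argument.
if (requireNamespace("rstanarm", quietly = TRUE)) {
  prior_by_coef <- rstanarm::normal(
    location  = unname(prior_location),
    scale     = unname(prior_scale),
    autoscale = FALSE
  )
}
```

Even with these settings the qualitative coefficients keep a small prior probability of being negative, so this is a soft preference rather than the hard positivity constraint that an exponential prior would give.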