I am trying to recreate the models in John Kruschke's 'Doing Bayesian Data Analysis' and am currently trying to model ordinal data (chapter 23 in the book. This is the JAGS model that I'm trying to recreate:
total = length(y)
#Threshold 1 and nYlevels-1 are fixed; other thresholds are estimated.
#This allows all parameters to be interpretable on the response scale.
nYlevels = max(y)
thresh = rep(NA,nYlevels-1)
thresh[1] = 1 + 0.5
thresh[nYlevels-1] = nYlevels-1 + 0.5
modelString = "
model {
for ( i in 1:Ntotal ) {
y[i] ~ dcat( pr[i,1:nYlevels] )
pr[i,1] <- pnorm( thresh[1] , mu[x[i]] , 1/sigma[x[i]]^2 )
for ( k in 2:(nYlevels-1) ) {
pr[i,k] <- max( 0 , pnorm( thresh[ k ] , mu[x[i]] , 1/sigma[x[i]]^2 )
- pnorm( thresh[k-1] , mu[x[i]] , 1/sigma[x[i]]^2 ) )
}
pr[i,nYlevels] <- 1 - pnorm( thresh[nYlevels-1] , mu[x[i]] , 1/sigma[x[i]]^2 )
}
for ( j in 1:2 ) { # 2 groups
mu[j] ~ dnorm( (1+nYlevels)/2 , 1/(nYlevels)^2 )
sigma[j] ~ dunif( nYlevels/1000 , nYlevels*10 )
}
for ( k in 2:(nYlevels-2) ) { # 1 and nYlevels-1 are fixed, not stochastic
thresh[k] ~ dnorm( k+0.5 , 1/2^2 )
}
}
This model involves using the cumulative normal function (pnorm in R). I tried to do it as follows, using Scipy's norm.cdf:
nYlevels = df.Y.cat.categories.size
thresh = np.arange(1.5, nYlevels)
with pm.Model() as ordinal_model:
#priors for mu and sigma
mu = pm.Normal('mu', (1+ nYlevels)/2.0, 1.0/(nYlevels**2))
sig = pm.Uniform('sig', nYlevels/1000.0, nYlevels*10.0)
#priors for the other parameters (one for each ordinal level, except the fixed lowest and highest levels)
theta2 = pm.Normal('theta2',mu=thresh[1], sigma = (1/2)**2)
theta3 = pm.Normal('theta3',mu=thresh[2], sigma = (1/2)**2)
theta4 = pm.Normal('theta4',mu=thresh[3], sigma = (1/2)**2)
theta5 = pm.Normal('theta5',mu=thresh[4], sigma = (1/2)**2)
#probabilities for the ordinal levels
prob_theta1 = norm(loc=mu, scale = (1 / (sig ** 2))).cdf(thresh[0])
prob_theta2 = np.max([0, (norm(mu=mu, sigma = (1 / (sig ** 2))).cdf(theta2)) - (NormalDist(mu=mu, sigma=(1 / (sig ** 2))).cdf(thresh[0]))])
prob_theta3 = np.max([0, (norm(mu=mu, sigma = (1 / (sig ** 2))).cdf(theta3)) - (NormalDist(mu=mu, sigma=(1 / (sig ** 2))).cdf(theta2))])
prob_theta4 = np.max([0, (norm(mu=mu, sigma = (1 / (sig ** 2))).cdf(theta4)) - (NormalDist(mu=mu, sigma=(1 / (sig ** 2))).cdf(theta3))])
prob_theta5 = np.max([0, (norm(mu=mu, sigma = (1 / (sig ** 2))).cdf(theta5)) - (NormalDist(mu=mu, sigma=(1 / (sig ** 2))).cdf(theta4))])
prob_theta6 = 1 - (norm(mu=mu, sigma=(1 / (sig ** 2))).cdf(theta5))
pr = np.array([prob_theta1,prob_theta2,prob_theta3,prob_theta4,prob_theta5,prob_theta6])
y = pm.Categorical('y', p = pr, observed=df.Y.cat.codes.values)
However, when I run this, I get an error: 'TypeError: Unsupported dtype for TensorType: object'. It occurs at the line I use norm.cdf. Perhaps stats.scipy.norm cannot work with pymc3 objects?
So I was wondering how to use the cumulative normal function in a PyMC3 model, or how to make norm.cdf work here.