I am trying to create a conditional logit model in pymc3. I have code that produces parameters that identify the correct categories in my toy problem, but the parameters it produces are far from optimal. Below I show a version where I compute the MAP. With diffuse priors this ought to be reasonably close to the MLE, but it is easy to see that a beta vector of (0, large negative value, 0) would give a much better log likelihood (see the quick check after the output below).
What is particularly puzzling to me is that as I increase the number of cases, the estimate of beta[1] moves even closer to 0 and the model becomes less certain about the appropriate categories. I would have thought the opposite would be true.
Any clarification would be most appreciated.
import numpy as np
import pymc3 as pm
import theano.tensor as tt

NumberOfCases = 5
NumberOfOptions = 3
NumberOfFeatures = 3

# generate data
inputs = np.ones((NumberOfCases, NumberOfFeatures, NumberOfOptions), float)
choices = np.zeros(NumberOfCases, dtype="int")
for case in range(NumberOfCases):
    c = np.random.randint(NumberOfOptions)
    choices[case] = c
    inputs[case, 1, c] = 0.0  # set up feature 1 to be a better predictor than the other features

# define model and find the MAP
with pm.Model() as model_sf:
    beta = pm.Normal('beta', mu=0, sd=100, shape=(1, NumberOfFeatures))
    mu = tt.tensordot(beta, inputs, [[1], [1]])[0, :, :]
    theta = pm.Deterministic("theta", tt.nnet.softmax(mu))
    y = pm.Categorical('y', p=theta, observed=choices)
    d = pm.find_MAP()

print(choices)
print(d["beta"])
print(d["theta"])
Output:
[1 2 0 1 0]
[[-0. -0.1178 -0. ]]
[[0.32 0.36 0.32]
[0.32 0.32 0.36]
[0.36 0.32 0.32]
[0.32 0.36 0.32]
[0.36 0.32 0.32]]
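To make the "much better log likelihood" claim concrete, here is a small hand-rolled check (the loglik helper is just for illustration, and it assumes inputs and choices are the arrays generated by the code above). It computes the data log likelihood directly for the MAP beta and for a more extreme beta:

def loglik(beta, inputs, choices):
    # mu[case, option] = sum over features of beta[feature] * inputs[case, feature, option]
    mu = np.einsum('f,cfo->co', beta, inputs)
    p = np.exp(mu) / np.exp(mu).sum(axis=1, keepdims=True)  # softmax over options
    return np.log(p[np.arange(len(choices)), choices]).sum()

print(loglik(np.array([0.0, -0.1178, 0.0]), inputs, choices))  # MAP above: roughly 5 * log(0.36), about -5.1
print(loglik(np.array([0.0, -10.0, 0.0]), inputs, choices))    # nearly 0, i.e. a far better fit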