I am trying to create a conditional logit model in pymc3. I have code that produces parameters that identify the correct categories in my toy problem, but the parameters it produces are far from optimal. Below I show a version where I have generated the MAP. With diffuse priors this ought to be close to the MLE, but it is easy to see that a beta vector of (0, large negative, 0) would give a much better log likelihood.

What is particularly puzzling to me is that as I increase the number of cases, the estimate of beta[1] moves even closer to 0 and the model becomes less certain of the appropriate categories. I would have expected the opposite.

Any clarification would be most appreciated.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

NumberOfCases = 5
NumberOfOptions = 3
NumberOfFeatures = 3

# generate data

inputs = np.ones((NumberOfCases, NumberOfFeatures, NumberOfOptions), float)
choices = np.zeros(NumberOfCases, dtype="int")

for case in range(NumberOfCases):
    c = np.random.randint(NumberOfOptions)
    choices[case] = c
    inputs[case, 1, c] = 0.0  # make feature 1 a perfect predictor of the chosen option

# define model and sample

with pm.Model() as model_sf:
    beta = pm.Normal('beta', mu=0, sd=100, shape=(1, NumberOfFeatures))
    # mu[case, option] = sum over features of beta[feature] * inputs[case, feature, option]
    mu = tt.tensordot(beta, inputs, [[1], [1]])[0, :, :]
    theta = pm.Deterministic('theta', tt.nnet.softmax(mu))
    y = pm.Categorical('y', p=theta, observed=choices)
    d = pm.find_MAP()


print(choices)
print(d["beta"])
print(d["theta"])

Output:

[1 2 0 1 0]
[[-0.     -0.1178 -0.    ]]
[[0.32 0.36 0.32]
 [0.32 0.32 0.36]
 [0.36 0.32 0.32]
 [0.32 0.36 0.32]
 [0.36 0.32 0.32]]
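
The claim that a large negative beta[1] gives a much better log likelihood can be checked with plain NumPy, independently of pymc3. This sketch reconstructs the toy data from the `choices` printed above (the `softmax` and `loglik` helpers are mine, not part of the question's code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# rebuild the toy data implied by the printed output: choices = [1, 2, 0, 1, 0]
choices = np.array([1, 2, 0, 1, 0])
n_cases, n_features, n_options = 5, 3, 3
inputs = np.ones((n_cases, n_features, n_options))
for case, c in enumerate(choices):
    inputs[case, 1, c] = 0.0  # feature 1 is 0 only for the chosen option

def loglik(beta):
    # mu[case, option] = sum over features of beta[f] * inputs[case, f, option]
    mu = np.einsum('f,cfo->co', beta, inputs)
    theta = softmax(mu)
    return np.log(theta[np.arange(n_cases), choices]).sum()

print(loglik(np.array([0.0, -0.1178, 0.0])))  # about 5 * log(0.36), roughly -5.1
print(loglik(np.array([0.0, -20.0, 0.0])))    # very close to 0
```

With beta[1] = -0.1178 the chosen option only gets probability 0.36 per case, while beta[1] = -20 drives it to essentially 1, so the second beta dominates by a wide margin.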
  • Can you rerun this and verify those are the results? I get something like `d['beta'] = [0, -10, 0]` for the same `choices`. – colcarroll Sep 23 '19 at 17:29
  • It seems to be something to do with pymc3 version 3.7. When I run it with 3.6 it works: it gives a value near -10, and it becomes more certain with more data. @colcarroll can you confirm you were using a version of pymc3 other than 3.7? – Simon Dennis Sep 23 '19 at 23:25
  • That's interesting, I'm running this on master, which is 3.7. You can troubleshoot using something like `print(model_sf.logp(beta=d['beta']), model_sf.logp(beta=np.array([[0, -20, 0]])))`, and compare the log likelihood of possible betas... – colcarroll Sep 24 '19 at 00:13
  • This might be the source of the problem https://github.com/aloctavodia/BAP/issues/48. It seems it has been fixed on master. – Simon Dennis Sep 24 '19 at 00:47
  • Wow, good catch! I think you're right. – colcarroll Sep 25 '19 at 13:37
  • Thanks for your help. – Simon Dennis Sep 26 '19 at 22:01
