
I'm trying to combine the values of 2 vectors and learn the latent probability distribution for one of the vectors. However, when I try to do this combination with DensityDist via the 'obs' variable, I'm not recovering the injected distribution. I'm still trying to understand PyMC3, so it's probably a misunderstanding on my part, but I'm fairly stuck at the moment. The code below shows the data and the model I'm building. I think the problem is the lambda function I'm using to produce the log probability of the result.

%matplotlib inline
import numpy as np
import pymc3 as pm
import pandas as pd

print('pymc3 version: ' + pm.__version__)


#INJECTED DISTRIBUTION OF PROBABILITIES
true_probs = [0.3, 0.6, 0.1]
k = len(true_probs)

y = np.array([[  8.33875891e-01,   1.49760167e-02,   1.51148092e-01],
 [  1.36399585e-01,   7.30337980e-01,   1.33262435e-01],
 [  4.97603830e-05,   9.94778208e-01,   5.17203171e-03],
 [  5.00358612e-01,   4.84967429e-01,   1.46739594e-02],
 [  9.55022216e-01,   8.09821124e-04,   4.41679631e-02],
 [  8.43878730e-01,   3.70989423e-02,   1.19022328e-01],
 [  2.51328518e-02,   9.09905536e-01,   6.49616117e-02],
 [  8.98816908e-01,   9.33910096e-02,   7.79208227e-03],
 [  1.41376589e-01,   1.88822492e-02,   8.39741162e-01],
 [  4.97743086e-06,   9.42248660e-01,   5.77463628e-02],
 [  1.56530473e-02,   9.80386023e-01,   3.96092994e-03],
 [  5.08725850e-02,   8.43604796e-01,   1.05522619e-01],
 [  2.30447406e-02,   9.76817732e-01,   1.37527223e-04],
 [  2.61205153e-01,   7.19243432e-01,   1.95514147e-02],
 [  1.93474892e-03,   8.43538791e-01,   1.54526460e-01],
 [  9.95086044e-01,   5.04045287e-08,   4.91390547e-03],
 [  7.20595493e-04,   9.97778427e-01,   1.50097771e-03],
 [  8.42073425e-01,   2.01386086e-02,   1.37787966e-01],
 [  9.97577668e-01,   2.06724434e-03,   3.55087319e-04],
 [  4.95181598e-01,   4.42076834e-01,   6.27415684e-02]])
sample_size = y.shape[0]

#COLUMN SUMS OF THE ORIGINAL DATA, FOR COMPARISON WITH THE RECOVERED PROBABILITIES
print(y.sum(axis=0))

with pm.Model() as multinom_test:

    #DISTRIBUTION OF PARAMS I WANT TO LEARN
    params = pm.Dirichlet('a', a=np.ones(k))

    #DATA WITH A WEIGHTING APPLIED TO EACH DATAPOINT
    data_weighting = abs(np.ones(sample_size)+np.random.normal(0, 1, sample_size))

    #PROBLEM IS HERE: TRYING TO COMBINE THESE PROBABILITIES BY MULTIPLYING THE VECTORS AND SUMMING THE JOINT PROB
    obs = pm.DensityDist('observations', lambda value: -1*np.log(np.outer(params, data_weighting).sum()), observed=y)

    trace = pm.sample(5000, step=pm.Metropolis())

print(trace['a'].mean(axis=0))

The results consistently place all of the probability in the last term rather than recovering the injected params. A sketch of the kind of logp I think I need follows the output below.

pymc3 version: 3.0.rc2

y.sum(axis=0):
[  8.01826573  10.05304779   1.92868648]

trace['a'].mean(axis=0):
[ 0.03199909  0.04873978  0.91926114]
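For reference, here is a rough sketch of what I think the logp should look like. The assumptions are mine: I'm treating each row of y as soft counts, weighting each row's contribution by data_weighting (i.e. logp = sum_i w_i * sum_j y[i, j] * log(params[j])), and using theano.tensor operations instead of numpy so the expression involving the symbolic params can be evaluated by PyMC3 (my original lambda never uses value at all and mixes numpy calls with the symbolic variable).

import theano.tensor as tt

with pm.Model() as multinom_test_sketch:

    #DISTRIBUTION OF PARAMS I WANT TO LEARN
    params = pm.Dirichlet('a', a=np.ones(k))

    #PER-DATAPOINT WEIGHTS (same construction as above)
    data_weighting = abs(np.ones(sample_size) + np.random.normal(0, 1, sample_size))

    #WEIGHTED MULTINOMIAL-STYLE LOG PROBABILITY (my assumption about the intended model):
    #sum over rows i of w_i * sum over columns j of value[i, j] * log(params[j])
    def logp(value):
        return tt.sum(value * tt.log(params) * data_weighting[:, None])

    obs = pm.DensityDist('observations', logp, observed=y)

    trace = pm.sample(5000, step=pm.Metropolis())

print(trace['a'].mean(axis=0))

If I've understood the weighting correctly, this should concentrate the posterior on the columns with the largest weighted sums rather than piling everything on the last term, but I'd appreciate a check on whether this is the right way to express the likelihood.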
