I've closely followed this book (http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/MorePyMC.ipynb) but have run into problems when trying to use PyMC for my own problem.

I've got a bunch of order values from customers who have placed an order, and they look reasonably like a Gamma distribution. I'm running an AB test and want to see how the distribution of order values changes - enter PyMC. I followed the example in the book, but it didn't really work for me. My first attempt was this:

import pymc as pm
import numpy as np
from matplotlib import pyplot as plt
from pylab import savefig

## Replace these with the actual order values in the test set
## Have made slightly different to be able to see differing distributions
observations_A = pm.rgamma(3.5, 0.013, size=1000)
observations_B = pm.rgamma(3.45, 0.016, size=2000)

## Identical prior assumptions
prior_a = pm.Gamma('prior_a', 3.5, 0.015)
prior_b = pm.Gamma('prior_b', 3.5, 0.015)

## The difference in the test groups is the most important bit
@pm.deterministic
def delta(p_A = prior_a, p_B = prior_b):
    return p_A - p_B

## Add observations
observation_a = pm.Gamma('observation_a', prior_a, value=observations_A, observed=True)
observation_b = pm.Gamma('observation_b', prior_b, value=observations_B, observed=True)

mcmc = pm.MCMC([prior_a, prior_b, delta, observation_a, observation_b])
mcmc.sample(20000,1000)

Looking at the mean of the trace for prior_a and prior_b I see values of around 3.97/3.98, and the stats of these priors tell a similar story. However, after defining the priors, calling the rand() method on them gives me the kind of values I would expect (between 100 and 400). Basically, one of the updating stages (I'm least certain about the observation stages) is doing something I don't expect.
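For reference, this is roughly how I'm checking this (posterior trace means versus a fresh draw from the prior); the names assume the model above:

## Posterior trace means come out around 3.97/3.98
print(mcmc.trace('prior_a')[:].mean())
print(mcmc.trace('prior_b')[:].mean())
print(prior_a.stats())

## ...whereas a draw from the prior itself is in the 100-400 range I'd expect
print(prior_a.rand())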

Having struggled with this for a bit, I found this page (http://matpalm.com/blog/2012/12/27/dead_simple_pymc/) and decided a different approach might be in order:

import pymc as pm
import numpy as np
from matplotlib import pyplot as plt
from pylab import savefig

## Replace these with the actual order values in the test set
observations_A = pm.rgamma(3.5, 0.013, size=1000)
observations_B = pm.rgamma(3.45, 0.016, size=2000)

## Initial assumptions
A_Rate = pm.Uniform('A_Rate', 2, 4)
B_Rate = pm.Uniform('B_Rate', 2, 4)
A_Shape = pm.Uniform('A_Shape', 0.005, 0.05)
B_Shape = pm.Uniform('B_Shape', 0.005, 0.05)

p_A = pm.Gamma('p_A', A_Rate, A_Shape, value=observations_A, observed=True)
p_B = pm.Gamma('p_B', B_Rate, B_Shape, value=observations_B, observed=True)

## Sample
mcmc = pm.MCMC([p_A, p_B, A_Rate, B_Rate, A_Shape, B_Shape])
mcmc.sample(20000, 1000)

## Plot the A_Rate, B_Rate, A_Shape, B_Shape
## Using those, determine the Gamma distribution
## Plot both - and draw 1000000... samples from each.
## Perform statistical tests on these.

So instead of modelling the Gamma distribution directly, we're looking to infer its parameters (I think). This seems to work a treat, in that the traces now contain values of the right order of magnitude. However, all I can plot from this is a histogram of samples for each group's alpha and beta, which isn't really what I'm after. I want to plot each test group's 'gamma-like' distribution, calculated from a prior and the values I supply, and I also want to plot a 'delta' as the AB testing example does. I feel a deterministic variable on the second example is going to be my best bet, but I don't really know the best way to go about constructing it.
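To make that concrete, here's the sort of thing I have in mind - a rough sketch rather than code I'm confident in. The mean_A/mean_B/delta names are just ones I've made up, and I'm relying on the mean of Gamma(alpha, beta) being alpha/beta in PyMC's parameterisation (so with the variable names above, 'A_Rate' plays the role of alpha and 'A_Shape' the role of beta):

## Hypothetical deterministic nodes for the implied mean order value of each group
@pm.deterministic
def mean_A(shape=A_Rate, rate=A_Shape):
    return shape / rate

@pm.deterministic
def mean_B(shape=B_Rate, rate=B_Shape):
    return shape / rate

## The difference between the groups, as in the AB testing example
@pm.deterministic
def delta(m_A=mean_A, m_B=mean_B):
    return m_A - m_B

mcmc = pm.MCMC([p_A, p_B, A_Rate, B_Rate, A_Shape, B_Shape, mean_A, mean_B, delta])
mcmc.sample(20000, 1000)

## Histogram of the posterior difference in means
plt.hist(mcmc.trace('delta')[:], bins=50)
savefig('delta.png')

## Rough 'gamma-like' distribution for group A, drawn using the posterior parameter samples
posterior_predictive_A = pm.rgamma(mcmc.trace('A_Rate')[:], mcmc.trace('A_Shape')[:])
plt.hist(posterior_predictive_A, bins=50)
savefig('posterior_A.png')

Is something along those lines sensible, or should I be doing this differently?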

Long story short - I've got data drawn from a Gamma distribution that I'd like to AB test. I have a Gamma prior view of the data, though I could be persuaded to use a Normal prior if that's easier. I'd like to update identical priors with the data I've collected, in a sensible way, and plot the resulting distributions and the difference between them.

Cheers,

Matt

  • So you want to generate some samples that follow your data distribution? – Srivatsan Jun 21 '14 at 19:54
  • I have a prior assumption about the distribution of my data, I've got samples of data taken from each distribution and I'd like to update my prior assumptions based on the data points I've collected. I've generated some 'example data points' in my code (observations_A and observations_B) for the purposes of testing but they'll be actual data points - I'd like to use them to modify my prior assumptions (Gamma distribution) to compare each test group. – Kali_89 Jun 21 '14 at 22:57
