I've closely followed this book (http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/MorePyMC.ipynb) but have found myself running into problems when trying to use Pymc for my own problem.
I've got a bunch of order values from customers who have placed an order and they look reasonably like a Gamma distribution. I'm running an AB test and want to see how the distribution of order values changes - enter Pymc. I was following the example in the book but found it didn't really work for me - first attempt was this:
import pymc as pm
import numpy as np
from matplotlib import pyplot as plt
from pylab import savefig
## Replace these with the actual order values in the test set
## Have made slightly different to be able to see differing distributions
observations_A = pm.rgamma(3.5, 0.013, size=1000)
observations_B = pm.rgamma(3.45, 0.016, size=2000)
## Identical prior assumptions
prior_a = pm.Gamma('prior_a', 3.5, 0.015)
prior_b = pm.Gamma('prior_b', 3.5, 0.015)
## The difference in the test groups is the most important bit
@pm.deterministic
def delta(p_A = prior_a, p_B = prior_b):
return p_A - p_B
## Add observations
observation_a = pm.Gamma('observation_a', prior_a, value=observations_A, observed=True)
observation_b = pm.Gamma('observation_b', prior_b, value=observations_A, observed=True)
mcmc = pm.MCMC([prior_a, prior_b, delta, observation_a, observation_b])
mcmc.sample(20000,1000)
Looking at the mean of the trace for prior_a and prior_b I see values of around 3.97/3.98 and when I look at the stats of these priors I see a similar story. However, upon defining the priors, calling the rand()
method on the prior gives me the kind of values I would expect (between 100 and 400). Basically, one of the updating stages (I'm least certain about the observation stages) is doing something I don't expect.
Having struggled with this for a bit I found this page (http://matpalm.com/blog/2012/12/27/dead_simple_pymc/) and decided a different approach may be in order:
import pymc as pm
import numpy as np
from matplotlib import pyplot as plt
from pylab import savefig
## Replace these with the actual order values in the test set
observations_A = pm.rgamma(3.5, 0.013, size=1000)
observations_B = pm.rgamma(3.45, 0.016, size=2000)
## Initial assumptions
A_Rate = pm.Uniform('A_Rate', 2, 4)
B_Rate = pm.Uniform('B_Rate', 2, 4)
A_Shape = pm.Uniform('A_Shape', 0.005, 0.05)
B_Shape = pm.Uniform('B_Shape', 0.005, 0.05)
p_A = pm.Gamma('p_A', A_Rate, A_Shape, value=observations_A, observed=True)
p_B = pm.Gamma('p_B', A_Rate, B_Shape, value=observations_B, observed=True)
## Sample
mcmc = pm.MCMC([p_A, p_B, A_Rate, B_Rate, A_Shape, B_Shape])
mcmc.sample(20000, 1000)
## Plot the A_Rate, B_Rate, A_Shape, B_Shape
## Using those, determine the Gamma distribution
## Plot both - and draw 1000000... samples from each.
## Perform statistical tests on these.
So instead of going straight for the Gamma distribution, we're looking to find the parameters (I think). This seems to work a treat in that it gives me values in the traces of the right order of magnitude. However, now I can plot a histogram of samples for alpha for both test groups and for beta but that's not really what I'm after. I want to be able to plot each of the test group's 'gamma-like' distributions, calculated from a prior and the values I supply. I also want to be able to plot a 'delta' as the AB testing example shows. I feel a deterministic variable on the second example is going to be my best bet but I don't really know the best way to go about constructing this.
Long story short - I've got data drawn from a Gamma distribution that I'd like to AB test. I've got a gamma prior view of the data, though could be persuaded that I've got a normal prior view if that's easier. I'd like to update identical priors with the data I've collected, in a sensible way, and plot the distributions and the difference between them.
Cheers,
Matt