0

I'm trying to use pymc to solve a simple model:

  • I have N=1000 fluxes that I know are drawn from a Pareto distribution: flux ~ Pareto(alpha, 1)

  • I'm trying to work out the alpha parameter of the Pareto: alpha ~ Uniform(1, 3)

  • My flux measurements are contaminated by Gaussian noise: flux_meas ~ Gaussian(flux, tau)

Currently, I'm just trying to simulate what happens if I change the amount of noise.

The problem is that, when I run the model with a very small (i.e. negligible) amount of noise, the mean value of alpha changes radically for every run, and doesn't seem to be related to the true value of alpha at all. But, if I disregard the Gaussian noise entirely, and say I just observe the Pareto distribution directly, it works as expected.

What am I doing wrong?

A working code snippet is here: (A more complex, older one is here)

The key bits are as follows:

import pymc
N = 1000
true_alpha = 2
noise = 0.001 # This noise is much smaller than the signal

# Simulated fluxes
s_arr = pymc.rpareto(true_alpha, 1, size=N)

# the unknown alpha
alpha = pymc.Uniform('alpha', 1, 3)

# fluxes are drawn from a Pareto distribution
flux = pymc.Pareto('flux', alpha, 1, size=N)

# My observed fluxes are contaminated by Gaussian noise
flux_meas = pymc.Normal('flux_meas', mu=flux, tau=noise**-2, observed=True,
                         value=pymc.rnormal(s_arr, tau=noise**-2, size=N))

model = pymc.MCMC([alpha, flux, flux_meas])

# If I run this model several times, the mean of alpha will be somewhere between
# 1 and 3. The variance of alpha is pretty small
model.sample(5000, 1000, 5)
rcs
  • 67,191
  • 22
  • 172
  • 153
freddofrog
  • 101
  • 6

1 Answers1

0

The model above runs fine for me:

In [5]: alpha.summary()

alpha:

    Mean             SD               MC Error        95% HPD interval
    ------------------------------------------------------------------
    2.473            0.079            0.005            [ 2.317  2.612]


    Posterior quantiles:

    2.5             25              50              75             97.5
     |---------------|===============|===============|---------------|
    2.317            2.422           2.474          2.522         2.613

How do you induce the highly variable output that you are reporting? Keep in mind that the Uniform variable specified by alpha is not a model for the variation of alpha, it is the prior, and so specifies our uncertainty in alpha.

Chris Fonnesbeck
  • 4,143
  • 4
  • 29
  • 30
  • Hiya, Sorry for getting back to this so late. The alpha mean=2.473 is many standard deviations from true_alpha. If I run that model a second time, the alpha mean will be many standard deviations from the previous mean (and true_alpha). I expected that multiple runs would converge on true_alpha. This occurs even if the 'noise' is very small. Also, if I run the model where I observe 'flux' rather than 'flux_meas', multiple runs of the model always converge to alpha=true_alpha. – freddofrog Sep 18 '13 at 00:46