0

I would like to estimate the parameters of a simple linear function and a gamma-distributed noise term from data. (Note: This is a follow-up question of https://stats.stackexchange.com/questions/88676/regression-with-unidirectional-noise, but simplified and more implementation-specific). Say I have my observed data generated as follows:

import numpy as np
np.random.seed(0)

size = 200
true_intercept = 1
true_slope = 2

# Generate observed data
x_ = np.linspace(0, 1, size)
true_regression_line = true_intercept + true_slope * x_  # y = a + b*x
noise_ = np.random.gamma(shape=1.0, scale=1.0, size=size)
y_ = true_regression_line + noise_

which looks as follows: enter image description here

I tried estimating these parameters using pymc as follows:

from pymc import Normal, Gamma, Uniform, Model, MAP
# Define priors
intercept = Normal('intercept', 0, tau=0.1)
slope = Normal('slope', 0, tau=0.1)
alpha = Uniform('alpha', 0, 2)
beta = Uniform('beta', 0, 2)
noise = Gamma('noise', alpha=alpha, beta=beta, size=size)

# Give likelihood > 0 to models where the regression line becomes larger than
# any of the datapoint
y = Normal('y', mu=intercept + slope * x_ + noise, tau=100,
           observed=True, value=y_)

# Perform MAP fit of model
model = Model([alpha, beta, intercept, slope, noise])
map_ = MAP(model)
map_.fit()

However, this gives me estimates that are far off the true values:

  • Intercept: True: 1.000, Estimated: 3.281
  • Slope: True: 2.000, Estimated: -3.400

Am I doing something wrongly?

frisbee
  • 73
  • 1
  • 6

1 Answers1

0

You seem to be specifying a Normal likelihood as well as the Gamma noise, so you are adding additional Gaussian noise to the model, which does not seem to be warranted. Try expressing the likelihood as a Gamma, rather than a Normal, since this is the distribution of the residuals.

Chris Fonnesbeck
  • 4,143
  • 4
  • 29
  • 30
  • The problem with this would be that for some values of slope and intercept the likelihood would become 0 and also that some of regularity conditions for maximum likelihood estimation would be violated, see answer here: https://stats.stackexchange.com/questions/88676/regression-with-unidirectional-noise – frisbee Mar 06 '14 at 14:21
  • The residuals in the fit you have drawn above are not Gaussian, but that's what you've specified in your PyMC model. You should not need to worry about regularity conditions, since you are not doing ML estimation. All you need to worry about is making your posterior proper. You can use the priors to appropriately constrain the model. – Chris Fonnesbeck Mar 07 '14 at 03:10