I'm attempting to locate a switchpoint and getting some extremely high values for my posteriors. Specifically lambda_1 and tau don't seem to make much sense. The dataset looks like this:

Graphed Data

I've been using a method similar to the cellphone data example found here: https://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_PyMC2.ipynb

My model looks like this:

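import numpy as np
import pymc3 as pm

# pos_df (a DataFrame with a 'positiveIncrease' column) and n_date (the number of days) are defined earlier.
with pm.Model() as model:
    # Prior rate for both lambdas: the inverse of the mean observed count
    alpha = 1.0 / np.array(pos_df['positiveIncrease']).mean()

    lambda_1 = pm.Exponential("lambda_1", alpha)
    lambda_2 = pm.Exponential("lambda_2", alpha)

    # Switchpoint: the day on which the rate changes from lambda_1 to lambda_2
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_date)

    idx = np.arange(n_date)
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)
    observation = pm.Poisson("obs", lambda_, observed=pos_df['positiveIncrease'])

    step = pm.Metropolis()
    trace = pm.sample(10000, tune=5000, step=step)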

When I run model.check_test_point() I get the following:

lambda_1_log__       -1.06
lambda_2_log__       -1.06
tau                  -5.03
obs              -26857.07
Name: Log-probability of test_point, dtype: float64
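For reference, the obs entry above should be the Poisson log-likelihood of the observed counts, summed over all days and evaluated at the model's starting values. The sketch below shows that kind of sum for a switchpoint model in plain NumPy. The function name and the toy counts are made up for illustration, and this is not PyMC3's internal code.

```python
import math
import numpy as np

def switchpoint_loglik(counts, tau, lambda_1, lambda_2):
    """Summed Poisson log-likelihood with rate lambda_1 before tau and lambda_2 from tau on."""
    counts = np.asarray(counts)
    idx = np.arange(len(counts))
    # Same rule as pm.math.switch(tau > idx, lambda_1, lambda_2) in the model
    rates = np.where(tau > idx, lambda_1, lambda_2)
    # log(k!) computed as lgamma(k + 1) to avoid overflow for large counts
    log_factorials = np.array([math.lgamma(k + 1) for k in counts])
    return float(np.sum(counts * np.log(rates) - rates - log_factorials))

# Hypothetical toy counts with an obvious change after the third value
toy_counts = [3, 4, 2, 10, 12, 9]
good = switchpoint_loglik(toy_counts, tau=3, lambda_1=3.0, lambda_2=10.0)
bad = switchpoint_loglik(toy_counts, tau=3, lambda_1=6.5, lambda_2=6.5)
print(good, bad)
```

Parameters that describe the counts well give a sum closer to zero. A very large negative value for obs suggests that the starting values, or the Poisson assumption itself, describe the data poorly.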

My lambda_2_samples are [61.56487732, 61.56487732, 60.23909822, ..., 61.21167046, 61.39722331, 61.39722331]

Whereas my lambda_1_samples are [715.19559043, 715.19559043, 716.98035641, ..., 717.35203171, 717.35203171, 717.35203171]

Also, my tau_samples are: array([125, 125, 125, ..., 125, 125, 125], dtype=int64)

My expectation is that the two distributions would fall somewhere within the dataset much like the following example:

Expected results

However, my results look like this mess:

Actual results

I've been blindly tweaking settings such as the sample count, the number of tuning steps, and testvals, but none of them improve the results in any meaningful way. I would appreciate advice on how to fix the problem, as well as information to help me understand why it occurred in the first place.

blintster
  • Try testing your code with synthetic data first. Draw from distribution A 100x, then distribution B 100x, then combine. Looking at your sample data, it's not clear it fits the model at all. I think that's what your results are telling you. – Pete Sep 03 '20 at 20:02
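The synthetic-data check suggested in the comment could be sketched roughly as follows, assuming Poisson counts on both sides of the switch. The rates (20 and 60), the seed, and the function name are arbitrary choices for illustration, not values taken from the question.

```python
import numpy as np

def make_switchpoint_counts(rate_before, rate_after, n_before=100, n_after=100, seed=0):
    """Draw n_before Poisson counts at one rate, then n_after at another, and join them."""
    rng = np.random.default_rng(seed)
    before = rng.poisson(rate_before, size=n_before)
    after = rng.poisson(rate_after, size=n_after)
    return np.concatenate([before, after])

# Hypothetical rates; the true switchpoint is index 100 by construction
synthetic = make_switchpoint_counts(rate_before=20.0, rate_after=60.0)
true_tau = 100
print(synthetic[:true_tau].mean(), synthetic[true_tau:].mean())
```

Passing an array like this as the observed data, in place of pos_df['positiveIncrease'], gives a case where the correct tau and both lambdas are known in advance. If the sampler recovers them, the model code works and the real data probably does not fit a single-switchpoint Poisson model.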
