PyMC3 Switchpoint Analysis

Question

I'm trying to locate a switchpoint, but I'm getting extremely high values for my posteriors. In particular, lambda_1 and tau don't seem to make much sense. The dataset looks like this:

I've been using a method similar to the cellphone data example found here: https://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_PyMC2.ipynb

My model looks like this:

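import numpy as np
import pymc3 as pm

# pos_df and n_date are defined earlier (not shown)
with pm.Model() as model:
    # Prior rate for both lambdas: inverse of the mean daily count
    alpha = 1.0 / np.array(pos_df['positiveIncrease']).mean()

    lambda_1 = pm.Exponential("lambda_1", alpha)
    lambda_2 = pm.Exponential("lambda_2", alpha)

    # Switchpoint: uniform over the day indices
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_date)

    # Use lambda_1 for days before tau, lambda_2 from tau onward
    idx = np.arange(n_date)
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)
    observation = pm.Poisson("obs", lambda_, observed=pos_df['positiveIncrease'])

    step = pm.Metropolis()
    trace = pm.sample(10000, tune=5000, step=step)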

When I run model.check_test_point(), I get the following:

lambda_1_log__       -1.06
lambda_2_log__       -1.06
tau                  -5.03
obs              -26857.07
Name: Log-probability of test_point, dtype: float64

My lambda_2_samples are [61.56487732, 61.56487732, 60.23909822, ..., 61.21167046, 61.39722331, 61.39722331]

Whereas my lambda_1_samples are [715.19559043, 715.19559043, 716.98035641, ..., 717.35203171, 717.35203171, 717.35203171]

And my tau_samples are array([125, 125, 125, ..., 125, 125, 125], dtype=int64)
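
A rough sketch of how tightly a Poisson model would pin down lambda_1 with numbers like these, offered as a back-of-the-envelope check rather than a diagnosis. It assumes tau is fixed at 125, uses an approximate daily mean of 715 read off the samples above (not the real data), and treats the Exponential prior rate as negligible next to the number of days. Under those assumptions the conditional posterior for lambda_1 is a Gamma distribution, and its standard deviation comes out at only a couple of counts.

```python
import math

# Assumed round numbers read off the samples quoted above, not the real data
n_before = 125        # days before the switchpoint (from tau = 125)
mean_before = 715.0   # approximate daily mean before the switchpoint

# Exponential(alpha) is Gamma(shape=1, rate=alpha). With Poisson counts the
# posterior for lambda_1 given tau is Gamma(1 + total, alpha + n_before).
# Assumption: alpha is tiny compared with n_before, so it is dropped here.
total = mean_before * n_before
shape = 1.0 + total
rate = float(n_before)

posterior_mean = shape / rate
posterior_sd = math.sqrt(shape) / rate

print(f"posterior mean ~ {posterior_mean:.1f}, sd ~ {posterior_sd:.2f}")
```

If that reasoning applies here, samples that barely move from about 715 and about 61 would be what the model implies for counts this large, rather than a sign that the sampler is broken.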

My expectation is that the two distributions would fall somewhere within the dataset much like the following example:

However, my results look like this mess:

I've been blindly tweaking things like the sample size, the amount of tuning, and testvals, but none of it improves the results in any meaningful way. I would appreciate advice on how to fix the problem, as well as information to help me understand why it occurred in the first place.

Try testing your code with synthetic data first. Draw from distribution A 100x, then distribution B 100x, then combine. Looking at your sample data, it's not clear it fits the model at all. I think that's what your results are telling you. — Pete, Sep 03 '20 at 20:02
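
A minimal sketch of the synthetic-data test suggested in the comment above. The two rates (700 and 60), the segment length of 100, and the names `synthetic`, `true_lambda_1`, `true_lambda_2` and `poisson_loglik` are illustrative assumptions, not values from the question. The brute-force scan at the end is only a quick NumPy sanity check that the switchpoint is recoverable from such data; it is not part of the PyMC3 model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rates and segment length, chosen only for illustration
true_lambda_1, true_lambda_2, n_per_segment = 700, 60, 100

# Draw 100 values from distribution A, then 100 from distribution B, and combine
synthetic = np.concatenate([
    rng.poisson(true_lambda_1, size=n_per_segment),
    rng.poisson(true_lambda_2, size=n_per_segment),
])
n_date = len(synthetic)


def poisson_loglik(counts):
    """Poisson log-likelihood at the segment's own mean rate.

    The log(y!) term is dropped: summed over both segments it is the same
    for every candidate split, so it does not affect the comparison.
    """
    lam = counts.mean()
    if lam == 0:
        return 0.0
    return float(counts.sum() * np.log(lam) - len(counts) * lam)


# Score every candidate split and keep the best one
best_tau = max(
    range(1, n_date),
    key=lambda t: poisson_loglik(synthetic[:t]) + poisson_loglik(synthetic[t:]),
)
print("best switchpoint index:", best_tau)
```

The same `synthetic` array can then be passed as `observed` in the model from the question, with `n_date` set to its length. If tau concentrates near the known split on synthetic data, the model code is probably fine and the mismatch lies in how well the real data fit a two-rate Poisson model.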
