6

I'm trying to adapt the text message example from Cameron Davidson-Pilon's Bayesian Methods for Hackers, Chapter 1, "Introducing our first hammer: PyMC" to handle multiple observations. The solution below appears to be working, but I'm new to pymc and I'm not sure that this is a good way to handle multiple time series observations in pymc. Any advice would be greatly appreciated!

To re-cap the text message example from Bayesian Methods for Hackers, the observations consist of 74 days of text message counts as pictured below.

enter image description here

The book models this process using a switchpoint parameter (tau) and two exponential parameters (lambda1 and lambda2) which control the Poisson-distributed message count before and after tau, respectively. For this example pymc yields solutions of approximately: tau=45, lambda1=18 and lambda2=23 using the following code, which is almost identical to the book's code:

import numpy as np
import pymc

observation = np.loadtxt( './txtdata.csv' ) #data available at the book's GitHub site
n_days      = observation.size    #number of days
alpha       = 1./20  #assume a mean of 20 messages per day
lambda1     = pymc.Exponential("lambda1", alpha)
lambda2     = pymc.Exponential("lambda2", alpha)
tau         = pymc.DiscreteUniform("tau", lower=0, upper=n_days)

@pymc.deterministic
def lambda_(tau=tau, lambda1=lambda1, lambda2=lambda2):
    a       = np.zeros(n_days)
    a[:tau] = lambda1
    a[tau:] = lambda2
    return a
observation_model  = pymc.Poisson("observation", lambda_, value=observation, observed=True)

model   = pymc.Model([observation_model, tau, lambda1, lambda2])
mcmc    = pymc.MCMC(model)
mcmc.sample(40000, 10000)

print()
print( mcmc.trace('tau')[:].mean() )
print( mcmc.trace('lambda1')[:].mean() )
print( mcmc.trace('lambda2')[:].mean() )

My question is: how should this be adapted to handle multiple observations?

My solution appears below, and seems to be working, but I'd like to know if there is a better way to model the problem in pymc.

First I generate five random observations using tau=45, lambda1=18 and lambda2=23 as follows:

n_observations = 5
n_days  = 74
alpha   = 1./20
lambda1 = pymc.Exponential("lambda1", alpha)
lambda2 = pymc.Exponential("lambda2", alpha)
tau     = pymc.DiscreteUniform("tau", lower=0, upper=n_days)

@pymc.deterministic
def lambda_single(tau=tau, lambda1=lambda1, lambda2=lambda2):
    a       = np.zeros(n_days)
    a[:tau] = lambda1
    a[tau:] = lambda2
    return a

observation_generator  = pymc.Poisson("observation_generator", lambda_single)
tau.set_value(45)
lambda1.set_value(18)
lambda2.set_value(23)
n_observations = 5
observations = np.array(  [observation_generator.random() for i in range(n_observations)]  )

Running the code above yields a (5 x 74) "observations" array representing five different people's data, for example, over 74 days, as depicted below.

enter image description here

The next step is the part I'm not sure of: how should these five observations be modeled in pymc? Here's what I have:

@pymc.deterministic
def lambda_multiple(tau=tau, lambda1=lambda1, lambda2=lambda2):
    a = np.zeros( (n_observations, n_days) )
    a[:, :tau] = lambda1
    a[:, tau:] = lambda2
    return a

observation_model  = pymc.Poisson("observations", lambda_multiple, value=observations, observed=True)

Running this model appears to produce the expected results for tau, lambda1 and lambda2, but I wonder if this is an appropriate way to handle multiple observations?

sophros
  • 14,672
  • 11
  • 46
  • 75
ToddP
  • 652
  • 13
  • 18
  • Have you figured this out yet? I am trying to replicate this with data on a multivariate model – Kevin Pei Jun 06 '17 at 05:22
  • No, not yet! I was hoping to get some feedback but I'm afraid that no one has offered any advice yet. To me the approach seems reasonable, but I also suspect that there are alternative models to represent datasets like these. – ToddP Jun 06 '17 at 10:53
  • See this [thread](https://github.com/pymc-devs/pymc3/issues/535), do you still get the same sampled results? This worked for me while your method (for me a 3d matrix) did not. – Kevin Pei Jun 08 '17 at 18:37
  • I don't think that thread is directly relevant because the estimated parameters (tau, lambda1, lambda2) are not multivariate, and there is no need to model covariance across days. The goal is just to estimate the three univariate values (tau, lambda1, lambda2) based on N separate observations, where one observation is a single 74-day time series. When N=1 the problem is identical to the one in Bayesian Methods for Hackers, Chapter 1, but when N>1 I'm not sure whether the model should be modified. – ToddP Jun 08 '17 at 21:31

0 Answers0