I'm trying to adapt the text message example from Cameron Davidson-Pilon's Bayesian Methods for Hackers, Chapter 1, "Introducing our first hammer: PyMC" to handle multiple observations. The solution below appears to be working, but I'm new to pymc and I'm not sure that this is a good way to handle multiple time series observations in pymc. Any advice would be greatly appreciated!
To re-cap the text message example from Bayesian Methods for Hackers, the observations consist of 74 days of text message counts as pictured below.
The book models this process using a switchpoint parameter (tau) and two exponential parameters (lambda1 and lambda2) which control the Poisson-distributed message count before and after tau, respectively. For this example pymc yields solutions of approximately: tau=45, lambda1=18 and lambda2=23 using the following code, which is almost identical to the book's code:
import numpy as np
import pymc
observation = np.loadtxt( './txtdata.csv' ) #data available at the book's GitHub site
n_days = observation.size #number of days
alpha = 1./20 #assume a mean of 20 messages per day
lambda1 = pymc.Exponential("lambda1", alpha)
lambda2 = pymc.Exponential("lambda2", alpha)
tau = pymc.DiscreteUniform("tau", lower=0, upper=n_days)
@pymc.deterministic
def lambda_(tau=tau, lambda1=lambda1, lambda2=lambda2):
a = np.zeros(n_days)
a[:tau] = lambda1
a[tau:] = lambda2
return a
observation_model = pymc.Poisson("observation", lambda_, value=observation, observed=True)
model = pymc.Model([observation_model, tau, lambda1, lambda2])
mcmc = pymc.MCMC(model)
mcmc.sample(40000, 10000)
print()
print( mcmc.trace('tau')[:].mean() )
print( mcmc.trace('lambda1')[:].mean() )
print( mcmc.trace('lambda2')[:].mean() )
My question is: how should this be adapted to handle multiple observations?
My solution appears below, and seems to be working, but I'd like to know if there is a better way to model the problem in pymc.
First I generate five random observations using tau=45, lambda1=18 and lambda2=23 as follows:
n_observations = 5
n_days = 74
alpha = 1./20
lambda1 = pymc.Exponential("lambda1", alpha)
lambda2 = pymc.Exponential("lambda2", alpha)
tau = pymc.DiscreteUniform("tau", lower=0, upper=n_days)
@pymc.deterministic
def lambda_single(tau=tau, lambda1=lambda1, lambda2=lambda2):
a = np.zeros(n_days)
a[:tau] = lambda1
a[tau:] = lambda2
return a
observation_generator = pymc.Poisson("observation_generator", lambda_single)
tau.set_value(45)
lambda1.set_value(18)
lambda2.set_value(23)
n_observations = 5
observations = np.array( [observation_generator.random() for i in range(n_observations)] )
Running the code above yields a (5 x 74) "observations" array representing five different people's data, for example, over 74 days, as depicted below.
The next step is the part I'm not sure of: how should these five observations be modeled in pymc? Here's what I have:
@pymc.deterministic
def lambda_multiple(tau=tau, lambda1=lambda1, lambda2=lambda2):
a = np.zeros( (n_observations, n_days) )
a[:, :tau] = lambda1
a[:, tau:] = lambda2
return a
observation_model = pymc.Poisson("observations", lambda_multiple, value=observations, observed=True)
Running this model appears to produce the expected results for tau, lambda1 and lambda2, but I wonder if this is an appropriate way to handle multiple observations?