1

I want to use Hawkes process to model some data. I could not find whether PyMC supports Hawkes process. More specifically I want an observed variable with Hawkes Process and learn a posterior on its params.

If it is not there, then could I define it in PyMC in some way e.g. @deterministic etc.??

Harit Vishwakarma
  • 2,338
  • 4
  • 21
  • 36

1 Answers1

3

It's been quite a long time since your question, but I've worked it out on PyMC today so I'd thought I'd share the gist of my implementation for the other people who might get across the same problem. We're going to infer the parameters λ and α of a Hawkes process. I'm not going to cover the temporal scale parameter β, I'll leave that as an exercise for the readers.

First let's generate some data :

def hawkes_intensity(mu, alpha, points, t):
    p = np.array(points)
    p = p[p <= t]
    p = np.exp(p - t)
    return mu + alpha * np.sum(p)


def simulate_hawkes(mu, alpha, window):
    t = 0
    points = []
    lambdas = []
    while t < window:
        m = hawkes_intensity(mu, alpha, points, t)
        s = np.random.exponential(scale=1/m)
        ratio = hawkes_intensity(mu, alpha, points, t + s)

        t = t + s
        if t < window:
            points.append(t)
            lambdas.append(ratio)
        else:
            break
    points = np.sort(np.array(points, dtype=np.float32))
    lambdas = np.array(lambdas, dtype=np.float32)
    return points, lambdas


# parameters
window = 1000
mu = 8
alpha = 0.25
points, lambdas = simulate_hawkes(mu, alpha, window)
num_points = len(points)

We just generated some temporal points using some functions that I adapted from there : https://nbviewer.jupyter.org/github/MatthewDaws/PointProcesses/blob/master/Temporal%20points%20processes.ipynb

Now, the trick is to create a matrix of size (num_points, num_points) that contains the temporal distance of the ith point from all the other points. So the (i, j) point of the matrix is the temporal interval separating the ith point to the jth. This matrix will be used to compute the sum of the exponentials of the Hawkes process, ie. the self-exciting part. The way to create this matrix as well as the sum of the exponentials is a bit tricky. I'd recommend to check every line yourself so you can see what they do.

tile = np.tile(points, num_points).reshape(num_points, num_points)
tile = np.clip(points[:, None] - tile, 0, np.inf)
tile = np.tril(np.exp(-tile), k=-1)
Σ = np.sum(tile, axis=1)[:-1]  # this is our self-exciting sum term

We have points and we have a matrix containg the sum of the excitations term. The duration between two consecutive events of a Hawkes process follow an exponential distribution of parameter λ = λ0 + ∑ excitation. This is what we are going to model, but first we have to compute the duration between two consecutive points of our generated data.

interval = points[1:] - points[:-1]

We're now ready for inference:

with pm.Model() as model:
    λ = pm.Exponential("λ", 1)
    α = pm.Uniform("α", 0, 1)

    lam = pm.Deterministic("lam", λ + α * Σ)
    interarrival = pm.Exponential(
        "interarrival", lam, observed=interval)


    trace = pm.sample(2000, tune=4000)
    pm.plot_posterior(trace, var_names=["λ", "α"])
    plt.show()

    print(np.mean(trace["λ"]))
    print(np.mean(trace["α"]))

7.829 0.284

Note: the tile matrix can become quite large if you have many data points.

Kerighan
  • 328
  • 3
  • 8