PyMC3 - Index 2-dimensional data while fitting hierarchical autoregressive model

Question

I (new to PyMC3) want to extend the model proposed in the PyMC3 example A Hierarchical model for Rugby prediction by making the latent variables for attack and defence strength autoregressive. I am unsure about how to use 2-dimensional data and the shape parameter of the GaussianRandomWalk class (explained below).

Edit 1: I did not find explicit documentation of multi-dimensional usage, but I found this comment by fonnesbeck among the PyMC3 github issues:

[...] I think most people would expect a vector of variables, which implies that the first dimension is the number of variable elements and the remaining dimension(s) the size of each variable.

As defined below, I use the time index as my 1st dimension. I tried to switch axis, which yields the same result. So, my current model is:

with pm.Model() as model:
    home = pm.Normal('home', 0, .0001)
    intercept = pm.Normal('intercept', 0, .0001)
    tau_att =  pm.Exponential('tau_att', 1./.02)
    tau_def =  pm.Exponential('tau_def', 1./.02)

    atts = pm.GaussianRandomWalk('atts', tau_att**-2, shape=[T, num_teams])
    defs = pm.GaussianRandomWalk('defs', tau_def**-2, shape=[T, num_teams])

    home_theta  = tt.exp(intercept + home + atts[:, home_team] + defs[:, away_team])
    away_theta  = tt.exp(intercept + atts[:, away_team] + defs[:, home_team])

    home_points = pm.Poisson('home_points', mu=home_theta, observed=observed_home_goals)
    away_points = pm.Poisson('away_points', mu=away_theta, observed=observed_away_goals)

The input is a two dimensional array, with rows being the time steps and columns containing the home or away goals for all teams that played in that time step. Assuming the following mock data

   home_score  away_score  home_team  away_team  i_home  i_away  t
0           1           0    Arsenal  Liverpool       0       1  1
1           1           1  Liverpool    Burnley       1       2  1
2           2           4    Burnley    Arsenal       2       0  1
3           0           3  Liverpool    Arsenal       1       0  2
4           1           1    Burnley  Liverpool       2       1  2
5           5           0    Arsenal    Burnley       0       2  2

observed_home_goals (similar to observed_away_goals) would look like this:

[[1 1 2]
 [0 1 5]]

and the corresponding team index would look like this:

[[0 1 2]
 [1 2 0]]

meaning that in time step 1, teams [0 1 2] shot [1 1 2] goals respectively.

Fitting the model does not throw errors, sampling however yields zero estimates for all parameters. I tried to browse the distributions/timeseries.py source code about how the shape parameter would be used for multiple dimensions in the GaussianRandomWalk class.

My question is, if the model definition would actually work as intended for the 2-dimensional time series data. I am not sure, if I index the atts and defs variables correctly.

Edit 2: I ended up building the time series manually, similar to Javier's solution, which seems to work fine!

PyMC3 - Index 2-dimensional data while fitting hierarchical autoregressive model

0 Answers0