I am brand new to python and pymc3. So forgive me if this question is rather silly. I have a dataset called toy_data.csv:
batch_no | batch_id | points | pred1 | pred2 |
---|---|---|---|---|
12-150 | 1 | 1 | 70.26 | 10.766 |
12-150 | 1 | 2 | 65.72 | 10.512 |
12-150 | 1 | 6 | 55.44 | 7.78 |
12-150 | 1 | 4 | 63.554 | 12.552 |
12-150 | 1 | 5 | 61.66 | 10.95 |
12-150 | 1 | 3 | 65.44 | 10.808 |
12-190 | 2 | 3 | 55.68 | 6.36 |
12-190 | 2 | 4 | 67.36 | 7.38 |
12-190 | 2 | 1 | 78 | 9.32 |
12-190 | 2 | 2 | 70.12 | 8.554 |
12-190 | 3 | 5 | 60.68 | 9.568 |
12-190 | 3 | 3 | 68.694 | 9.634 |
12-190 | 3 | 2 | 56.118 | 11.474 |
12-190 | 3 | 4 | 58.54 | 9.31 |
12-190 | 3 | 6 | 65.08 | 10.604 |
13-999 | 4 | 6 | 56.73 | 9.754 |
13-999 | 4 | 3 | 55.492 | 9.4 |
13-999 | 4 | 2 | 51.556 | 9.802 |
13-999 | 4 | 1 | 53.748 | 8.85 |
13-999 | 4 | 5 | 59.054 | 9.204 |
13-999 | 4 | 4 | 49.77 | 9.468 |
13-999 | 4 | 7 | 58.266 | 9.954 |
13-999 | 4 | 8 | 57.78 | 9.38 |
14-140 | 5 | 2 | 69.68 | 12.086 |
14-140 | 5 | 1 | 68.5 | 10.438 |
14-140 | 5 | 6 | 71.7 | 11.624 |
14-140 | 5 | 3 | 63.68 | 11.058 |
14-140 | 5 | 4 | 62.02 | 10.498 |
14-140 | 5 | 5 | 61.94 | 10.95 |
14-140 | 5 | 9 | 57.22 | 10.164 |
14-140 | 5 | 7 | 54.44 | 9.798 |
14-140 | 5 | 8 | 60.82 | 10.784 |
14-290 | 6 | 1 | 56.2 | 9.566 |
14-290 | 6 | 1 | 50.06 | 9.23 |
14-290 | 6 | 2 | 50.76 | 10.646 |
14-290 | 6 | 2 | 46.98 | 8.38 |
14-700 | 7 | 2 | 92.8 | 11.532 |
14-700 | 7 | 1 | 81 | 9.522 |
14-700 | 7 | 3 | 75.62 | 10.004 |
14-700 | 8 | 2 | 71.44 | 10.076 |
14-889 | 8 | 1 | 55.2 | 8.1 |
14-889 | 8 | 3 | 71.18 | 9.412 |
14-889 | 8 | 4 | 53.24 | 7.138 |
14-900 | 9 | 3 | 50.08 | 7.744 |
14-900 | 9 | 1 | 47.138 | 8.294 |
14-900 | 9 | 4 | 68.56 | 11.362 |
14-900 | 9 | 1 | 69.34 | 11.584 |
14-900 | 9 | 2 | 63.14 | 10.77 |
I would like to grab the relevant batch & predictor from the hierarchical model trace to plot them and fit the abline line. The final output is to replicate the graph below but instead of the gray range, I'd like to show the abline for each batch.
For example:
Abline = y_best_fit_i=(mu_a+a_i)+[(mu_b+b_i)*pred_i]
with all parameters being backscaled using the mean and std from the respective x variable (pred1 or pred2). Considering only pred1, the formula for backscaling will be mu_a = (mu_a*pred1.std()) + pred1.mean()
; this will be applied to mu_b and all a_i and b_i values. However, backscaling as per pred1 mean and std dev will only apply to the trace from pred1, not pred2 (i.e. it will apply to mu_a, mu_b and a[0,0,0] and b[0,0,0] but not a[0,1,0] or b[0,1,0]). I would like to only apply to mu_a, mu_b , a[i, 0,0] and b[i, 0,0] where i supposed to be batch_no (see table below) and have it automated if possible. The table below is the summary of the tracing.
Would be good if I could find a way to get the numbers from trace itself to plot the final graph. Any help is greatly appreciated.
My code that I used to model is below:
import pandas as pd
import numpy as np
import pymc3 as pm
from sklearn import preprocessing
seed = 68492
np.random.seed(seed)
data = pd.read_csv("toy_data.csv")
# assigning variables
batch_no = data.iloc[:, 0]
batch_id = (data.iloc[:, 1])-1
point = data.iloc[:, 2]
pred = data.iloc[:, 3:5]
pred1 = data.iloc[:, 3]
pred2 = data.iloc[:, 4]
# creating indicators
n_pred = pred.shape[1]
n_batch = len(np.unique(batch_id))
# standarising variable
std_pred = preprocessing.scale(pred)
# Hierarchical model
with pm.Model() as hierarchical_model1:
# Hyperpriors
mu_a = pm.Normal('mu_a', mu=0, sigma=100)
sigma_a = pm.HalfNormal('sigma_a', sigma=100)
mu_b = pm.Normal('mu_b', mu=0, sigma=100)
sigma_b = pm.HalfNormal('sigma_b', sigma=100)
# Intercept for each batch, distributed around group mean mu_a
a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=(n_batch, n_pred, 1))
# Slope for each batcht, distributed around group mean mu_b
b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=(n_batch, n_pred, 1))
# Model error
eps = pm.HalfCauchy('eps', beta=5)
# Expected value
omega = (pm.math.dot(std_pred, b))[batch_id,1]
tpoint_est = a[batch_id,1] + omega
# Data likelihood
y_like = pm.Normal('y_like', mu=tpoint_est, sigma=eps, observed=point)
with hierarchical_model1:
hier_trace = pm.sample(draws = 4000, tune = 1000, chains = 2)
hier_out = pm.summary(hier_trace)