I am new to PYMC3. Maybe this is a naive question, but I searched a lot and didn't find any explanation on this issue. Basically, I want to do a linear regression in PYMC3, however the training is very slow and the model performance on training set is very poor as well. Below is my code:
X_Tr = np.array([ 13.99802212, 13.8512075 , 13.9531636 , 13.97432944,
13.89211468, 13.91357953, 13.95987483, 13.86476587,
13.9501789 , 13.92698143, 13.9653932 , 14.06663115,
13.91697969, 13.99629862, 14.01392784, 13.96495713,
13.98697998, 13.97516973, 14.01048397, 14.05918188,
14.08342002, 13.89350606, 13.81768849, 13.94942447,
13.90465027, 13.93969029, 14.18771189, 14.08631113,
14.03718829, 14.01836206, 14.06758363, 14.05243539,
13.96287123, 13.93011351, 14.01616973, 14.01923812,
13.97424024, 13.9587175 , 13.85669845, 13.97778302,
14.04192138, 13.93775494, 13.86693585, 13.79985956,
13.82679677, 14.06474544, 13.90821822, 13.71648423,
13.78899668, 13.76857337, 13.87201756, 13.86152949,
13.80447525, 13.99609891, 14.0210165 , 13.986906 ,
13.97479211, 14.04562055, 14.03293095, 14.15178043,
14.32413197, 14.2330354 , 13.99247751, 13.92962912,
13.95394525, 13.87888254, 13.82743111, 14.10724699,
14.23638905, 14.15731881, 14.13239278, 14.13386722,
13.91442452, 14.01056255, 14.19378649, 14.22233852,
14.30405399, 14.25880108, 14.23985258, 14.21184303,
14.4443183 , 14.55710331, 14.42102092, 14.29047616,
14.43712609, 14.58666212])
y_tr = np.array([ 13.704, 13.763, 13.654, 13.677, 13.66 , 13.735, 13.845,
13.747, 13.747, 13.606, 13.819, 13.867, 13.817, 13.68 ,
13.823, 13.779, 13.814, 13.936, 13.956, 13.912, 13.982,
13.979, 13.919, 13.944, 14.094, 13.983, 13.887, 13.902,
13.899, 13.881, 13.784, 13.909, 13.99 , 14.06 , 13.834,
13.778, 13.703, 13.965, 14.02 , 13.992, 13.927, 14.009,
13.988, 14.022, 13.754, 13.837, 13.91 , 13.907, 13.867,
14.014, 13.952, 13.796, 13.92 , 14.051, 13.773, 13.837,
13.745, 14.034, 13.923, 14.041, 14.077, 14.125, 13.989,
14.174, 13.967, 13.952, 14.024, 14.171, 14.175, 14.091,
14.267, 14.22 , 14.071, 14.112, 14.174, 14.289, 14.146,
14.356, 14.5 , 14.265, 14.259, 14.406, 14.463, 14.473,
14.413, 14.507])
sns.regplot(x=X_tr, y=y_tr.flatten());
Here I use PYMC3 to train the model:
shA_X = shared(X_tr)
with pm.Model() as linear_model:
alpha = pm.Normal("alpha", mu=14,sd=100)
betas = pm.Normal("betas", mu=0, sd=100, shape=1)
sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)
mu = alpha + betas * shA_X
forecast = pm.Normal("forecast", mu=mu, sd=sigma, observed=y_tr)
step = pm.NUTS()
trace=pm.sample(3000, tune=1000)
and then check the performance:
ppc_w = pm.sample_ppc(trace, 1000, linear_model,
progressbar=False)
plt.plot(ppc_w['forecast'].mean(axis=0),'r')
plt.plot(y_tr, color='k')`
Why is the prediction on the training set so poor? Any suggestions and ideas are appreciated.