I am trying to use TensorFlow Probability to predict the distribution of a time series into the future, improving on my current model, which is a classical time series model (GARCH). I stacked LSTM layers, one Flatten layer, Dense layers and, at the end, a DistributionLambda layer with a StudentT distribution to predict 1344 steps into the future, so essentially I have 1344 distributions. I would expect the network to be able to capture the autoregressive behavior of the standard deviation and possibly any mean-reversion properties in the price, but neither happened.
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

def negative_log_likelihood(y, distr):
    return -distr.log_prob(y)

def student_dist(params):
    # params: (batch, 3 * OUT_STEPS) -> loc, scale and df blocks concatenated
    stuT = tfd.StudentT(loc=params[:, 0:OUT_STEPS],
                        scale=1e-3 + tf.math.softplus(0.05 * params[:, OUT_STEPS:2 * OUT_STEPS]),
                        df=1e-3 + tf.math.softplus(0.05 * params[:, 2 * OUT_STEPS:3 * OUT_STEPS]))
    return stuT
nn1 = tf.keras.layers.Input(batch_input_shape=(128, 1374, 9))
nn2 = tf.keras.layers.LSTM(64, return_sequences=True)(nn1)
nn3 = tf.keras.layers.LSTM(32, return_sequences=True)(nn2)
nn4 = tf.keras.layers.Dropout(0.1)(nn3)
nn5 = tf.keras.layers.LSTM(16, return_sequences=True)(nn4)
nn6 = tf.keras.layers.Dropout(0.1)(nn5)
nn7 = tf.keras.layers.Flatten()(nn6)
nn8 = tf.keras.layers.Dense(OUT_STEPS * num_features, activation='relu')(nn7)
nn9 = tf.keras.layers.Dense(OUT_STEPS * num_features,
                            kernel_initializer=tf.initializers.RandomNormal(stddev=0.5))(nn8)
nn10 = tf.keras.layers.Reshape([OUT_STEPS * num_features])(nn9)
nn11 = tfp.layers.DistributionLambda(student_dist)(nn10)
multi_lstm_model = tf.keras.models.Model(nn1, nn11)

early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, mode='min',
                                                  restore_best_weights=True, min_delta=10e-5)

multi_lstm_model.compile(loss=negative_log_likelihood, optimizer=tf.keras.optimizers.Adam())
history = multi_lstm_model.fit(genData(candles_1m, train_start, val_start, 14*24*4, 128),
                               validation_data=genData(candles_1m, val_start, test_start, 14*24*4, 128),
                               steps_per_epoch=train_step_per_epoch, validation_steps=val_step_per_epoch,
                               epochs=200, callbacks=[early_stopping])
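For reference, this is how the head can be probed after building the model; a minimal sketch, assuming OUT_STEPS = 1344 and that student_dist returns the plain (non-Independent) StudentT:

# Push one dummy batch through the network and inspect the resulting distribution.
dummy = tf.zeros([128, 1374, 9])           # matches the fixed batch_input_shape above
dist = multi_lstm_model(dummy)             # DistributionLambda returns a tfd.Distribution
print(dist.batch_shape)                    # expecting (128, 1344): one StudentT per example and step
print(dist.event_shape)                    # expecting () for the unwrapped StudentT
print(dist.sample().shape)                 # one draw per (example, step): (128, 1344)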
However, I keep getting identical parameters feeding the DistributionLambda layer for different inputs, at every timestep:
new_model = tf.keras.Model(inputs=multi_lstm_model.input, outputs=multi_lstm_model.layers[-1].input)
prediction = pd.DataFrame(new_model.predict(genData(candles_1m, train_start, val_start, 14*24*4, 128), steps=5))
prediction.iloc[:, OUT_STEPS:3*OUT_STEPS] = 1e-3 + tf.math.softplus(0.05 * prediction.iloc[:, OUT_STEPS:3*OUT_STEPS]).numpy()
prediction
array([[-0.4935108 , -0.29652068, 0.7733726 , ..., 2.814322 ,
2.9786308 , 2.915939 ],
[-0.4935108 , -0.29652068, 0.7733726 , ..., 2.814322 ,
2.9786308 , 2.915939 ],
[-0.4935108 , -0.29652068, 0.7733726 , ..., 2.814322 ,
2.9786308 , 2.915939 ],
...,
[-0.4935108 , -0.29652068, 0.7733726 , ..., 2.814322 ,
2.9786308 , 2.915939 ],
[-0.4935108 , -0.29652068, 0.7733726 , ..., 2.814322 ,
2.9786308 , 2.915939 ],
[-0.4935108 , -0.29652068, 0.7733726 , ..., 2.814322 ,
2.9786308 , 2.915939 ]], dtype=float32)
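The collapse can be quantified directly from that frame; with rows as identical as the ones printed above, the per-column spread across the batch is essentially zero:

# Spread of each predicted parameter across the rows of the batch.
print(prediction.std(axis=0).max())   # ~0.0
print(prediction.nunique().max())     # 1 unique value per column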
I checked the input data. It does change: the generator provides different feature values and different target values at every step.
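For example, a check along these lines (assuming genData is a plain Python generator yielding (features, targets) batches) shows that consecutive batches are not identical:

# Pull two consecutive batches and confirm they actually differ.
gen = genData(candles_1m, train_start, val_start, 14*24*4, 128)
x1, y1 = next(gen)
x2, y2 = next(gen)
print(np.allclose(x1, x2), np.allclose(y1, y2))  # expecting False, False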
I tried wrapping the StudentT in an Independent distribution:
def student_dist(params):
    stuT = tfd.Independent(tfd.StudentT(loc=params[:, 0:OUT_STEPS],
                                        scale=1e-3 + tf.math.softplus(0.05 * params[:, OUT_STEPS:2 * OUT_STEPS]),
                                        df=1e-3 + tf.math.softplus(0.05 * params[:, 2 * OUT_STEPS:3 * OUT_STEPS])))
    return stuT
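To be sure the wrapping does what I think it does, the shapes can be inspected on the distribution the model returns (after rebuilding multi_lstm_model with this version of student_dist); a minimal sketch, assuming Independent's default folds every batch dimension except the first into the event shape:

# With the Independent wrapper, the OUT_STEPS axis should move from
# batch_shape into event_shape, so log_prob sums over the 1344 steps.
dist = multi_lstm_model(tf.zeros([128, 1374, 9]))
print(dist.batch_shape)   # expecting (128,)
print(dist.event_shape)   # expecting (1344,)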
I tried using a different negative log-likelihood function:
def negative_log_likelihood(y, dist_param):
    # Hand-rolled Student-t negative log-likelihood; expects the raw loc, scale
    # and df parameters stacked along the last axis of dist_param.
    loc = dist_param[..., 0]
    scale = 1e-3 + tf.math.softplus(0.05 * dist_param[..., 1])
    df = 1e-3 + tf.math.softplus(0.05 * dist_param[..., 2])
    y = (y - loc) * (tf.math.rsqrt(df) / scale)
    log_unnormalized_prob = -0.5 * (df + 1.) * tfp.math.log1psquare(y)
    log_normalization = (tf.math.log(tf.abs(scale)) +
                         0.5 * tf.math.log(df) +
                         0.5 * np.log(np.pi) +
                         tfp.math.log_gamma_difference(0.5, 0.5 * df))
    out = -log_unnormalized_prob + log_normalization
    return out
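As a sanity check on that function (hypothetical shapes, and assuming I have copied TFP's own Student-t normalization correctly), it agrees with tfd.StudentT.log_prob once the same softplus transforms are applied to the raw parameters:

# Compare the hand-rolled NLL against the built-in StudentT on random inputs.
raw = tf.random.normal([4, 3])   # hypothetical: 4 examples, raw (loc, scale, df)
y = tf.random.normal([4])

loc = raw[..., 0]
scale = 1e-3 + tf.math.softplus(0.05 * raw[..., 1])
df = 1e-3 + tf.math.softplus(0.05 * raw[..., 2])

ref = -tfd.StudentT(df=df, loc=loc, scale=scale).log_prob(y)
print(tf.reduce_max(tf.abs(ref - negative_log_likelihood(y, raw))))  # expecting ~0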
None of it worked; I still get the same result for the entire time series. I also checked whether only one distribution was being fitted, but there are 1344 distributions per batch of 128 (batch shape (128, 1344)).
Has anyone encountered this issue? Does anyone have an idea why this might be? I am at the end of my knowledge at this point. I have included a slimmed-down Colab notebook so you can see the issue for yourself. I would expect the predictions to differ for the nearer steps and then converge for steps further out, but the network keeps returning the same values.
https://colab.research.google.com/drive/1zLldX1446ULcdgUiPf-TZKEeVnD4YY1s?usp=sharing
I used these resources to build this model:
https://www.tensorflow.org/tutorials/structured_data/time_series
TensorFlow Probability - want NN to output multiple distributions
https://www.tensorflow.org/probability/examples/Understanding_TensorFlow_Distributions_Shapes