
I am trying to use TensorFlow Probability to predict the distribution of a time series into the future, improving on my current model, which is based on a classical time series model (GARCH). Essentially I stacked LSTM layers, a Flatten layer, Dense layers and, at the end, a DistributionLambda layer with a StudentT distribution to predict 1344 steps into the future, i.e. 1344 distributions. I would expect TensorFlow to be able to capture the autoregressive behavior of the standard deviation and possibly any mean-reversion properties in the price, but neither happened.

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def negative_log_likelihood(y, distr):
    return -distr.log_prob(y)

def student_dist(params):
    # softplus (plus a small shift) keeps scale and df strictly positive; one parameter triple per forecast step
    stuT = tfd.StudentT(loc=params[:, 0:OUT_STEPS],
                        scale=1e-3 + tf.math.softplus(0.05 * params[:, OUT_STEPS:2 * OUT_STEPS]),
                        df=1e-3 + tf.math.softplus(0.05 * params[:, 2 * OUT_STEPS:3 * OUT_STEPS]))
    return stuT
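
As a sanity check on this head in isolation, the returned distribution should contain one scalar StudentT per forecast step. A minimal sketch (OUT_STEPS = 1344 and the dummy tensor are assumptions standing in for the real network output):

OUT_STEPS = 1344                                        # assumed, matching the forecast horizon above
dummy_params = tf.random.normal([128, 3 * OUT_STEPS])   # stand-in for the last Dense layer's output
dist = student_dist(dummy_params)
print(dist.batch_shape, dist.event_shape)               # (128, 1344) and () -> one scalar StudentT per step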

nn1 = tf.keras.layers.Input(batch_input_shape=(128, 1374, 9))
nn2 = tf.keras.layers.LSTM(64, return_sequences=True)(nn1)
nn3 = tf.keras.layers.LSTM(32, return_sequences=True)(nn2)
nn4 = tf.keras.layers.Dropout(0.1)(nn3)
nn5 = tf.keras.layers.LSTM(16, return_sequences=True)(nn4)
nn6 = tf.keras.layers.Dropout(0.1)(nn5)
nn7 = tf.keras.layers.Flatten()(nn6)
nn8 = tf.keras.layers.Dense(OUT_STEPS*num_features, activation='relu')(nn7)
nn9 = tf.keras.layers.Dense(OUT_STEPS*num_features, kernel_initializer=tf.initializers.RandomNormal(stddev=0.5))(nn8)
nn10 = tf.keras.layers.Reshape([OUT_STEPS * num_features])(nn9)
nn11= tfp.layers.DistributionLambda(student_dist)(nn10)

multi_lstm_model = tf.keras.models.Model(nn1, nn11)

early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, mode='min', 
                                                restore_best_weights=True, min_delta= 10e-5)

model.compile(loss=negative_log_likelihood, optimizer=tf.keras.optimizers.Adam())

history = model.fit(genData(candles_1m, train_start, val_start, 14*24*4, 128), validation_data=genData(candles_1m, val_start, test_start, 14*24*4, 128),                      
                      steps_per_epoch=train_step_per_epoch, validation_steps=val_step_per_epoch, 
                      epochs=200, callbacks=[early_stopping])

However, I keep getting the same parameters into the DistributionLambda layer for different inputs, at every timestep.

# expose the raw parameters feeding the DistributionLambda layer
new_model = tf.keras.Model(inputs=multi_lstm_model.input, outputs=multi_lstm_model.layers[-1].input)
prediction = pd.DataFrame(new_model.predict(genData(candles_1m, train_start, val_start, 14*24*4, 128), steps=5))
# apply the same softplus transform as in student_dist to the scale and df columns
prediction.iloc[:, OUT_STEPS:3*OUT_STEPS] = tf.math.softplus(prediction.iloc[:, OUT_STEPS:3*OUT_STEPS] * 0.05).numpy()
prediction


array([[-0.4935108 , -0.29652068,  0.7733726 , ...,  2.814322  ,
         2.9786308 ,  2.915939  ],
       [-0.4935108 , -0.29652068,  0.7733726 , ...,  2.814322  ,
         2.9786308 ,  2.915939  ],
       [-0.4935108 , -0.29652068,  0.7733726 , ...,  2.814322  ,
         2.9786308 ,  2.915939  ],
       ...,
       [-0.4935108 , -0.29652068,  0.7733726 , ...,  2.814322  ,
         2.9786308 ,  2.915939  ],
       [-0.4935108 , -0.29652068,  0.7733726 , ...,  2.814322  ,
         2.9786308 ,  2.915939  ],
       [-0.4935108 , -0.29652068,  0.7733726 , ...,  2.814322  ,
         2.9786308 ,  2.915939  ]], dtype=float32)

I checked the input data: it is changing and provides different feature values and different target values at every step.
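
Roughly how I checked it (a sketch only; it assumes genData yields (features, targets) batches as a plain Python generator):

gen = genData(candles_1m, train_start, val_start, 14*24*4, 128)
x_prev, y_prev = next(gen)
for _ in range(5):
    x, y = next(gen)
    # consecutive batches should differ if the generator is working
    print(np.allclose(x, x_prev), np.allclose(y, y_prev))
    x_prev, y_prev = x, y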

I tried wrapping the distribution in a tfd.Independent distribution:

def student_dist(params):
    # same head as above, but wrapped in tfd.Independent
    stuT = tfd.Independent(tfd.StudentT(loc=params[:, 0:OUT_STEPS],
                                        scale=1e-3 + tf.math.softplus(0.05 * params[:, OUT_STEPS:2 * OUT_STEPS]),
                                        df=1e-3 + tf.math.softplus(0.05 * params[:, 2 * OUT_STEPS:3 * OUT_STEPS])))
    return stuT

I tried using a different negative log-likelihood function, written directly against the raw parameters:

def negative_log_likelihood(y, dist_param):
    # Student-t NLL written out against the raw parameters,
    # using the same softplus transform as in student_dist
    loc = dist_param[..., 0]
    scale = 1e-3 + tf.math.softplus(0.05 * dist_param[..., 1])
    df = 1e-3 + tf.math.softplus(0.05 * dist_param[..., 2])

    y = (y - loc) * (tf.math.rsqrt(df) / scale)
    log_unnormalized_prob = -0.5 * (df + 1.) * tfp.math.log1psquare(y)
    log_normalization = (tf.math.log(tf.abs(scale)) +
                         0.5 * tf.math.log(df) +
                         0.5 * np.log(np.pi) +
                         tfp.math.log_gamma_difference(0.5, 0.5 * df))
    out = -log_unnormalized_prob + log_normalization
    return out
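
As far as I can tell this mirrors the formula inside tfp.distributions.StudentT.log_prob, and a quick check on dummy values (assumed here purely for illustration) agrees with the built-in version:

y = tf.constant([0.3])
raw = tf.constant([[0.1, 0.2, 5.0]])
manual = negative_log_likelihood(y, raw)
builtin = -tfd.StudentT(loc=raw[..., 0],
                        scale=1e-3 + tf.math.softplus(0.05 * raw[..., 1]),
                        df=1e-3 + tf.math.softplus(0.05 * raw[..., 2])).log_prob(y)
print(manual.numpy(), builtin.numpy())   # should agree to numerical precision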

None of that worked; I still get the same result for the entire time series. I also checked whether only a single distribution was being fitted, but there were 1344 distributions for a batch size of 128 (batch shape (128, 1344)).
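
For reference, that shape check looked roughly like this (a sketch; x_batch stands for one batch pulled from the generator):

x_batch, y_batch = next(genData(candles_1m, train_start, val_start, 14*24*4, 128))
dist = multi_lstm_model(x_batch)            # DistributionLambda returns the distribution object
print(dist.batch_shape, dist.event_shape)   # (128, 1344) and () without the Independent wrapper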

Has anyone encountered this issue, or does anyone have an idea why this might be? I am at the end of my knowledge at this point. I have included a slimmed-down Colab notebook, so you can see the issue for yourself. Usually I would expect the predictions to differ for the nearer distributions and then converge for the distributions further out, but the NN keeps returning the same values.

https://colab.research.google.com/drive/1zLldX1446ULcdgUiPf-TZKEeVnD4YY1s?usp=sharing

I used these resources to build this model:

https://www.tensorflow.org/tutorials/structured_data/time_series

TensorFlow Probability - want NN to output multiple distributions

https://www.tensorflow.org/probability/examples/Understanding_TensorFlow_Distributions_Shapes


1 Answer


Not sure, but I'm wondering if creating new_model in that way might re-initialise the weights, so that you're doing your prediction on an untrained model? You can run .predict() on the model you've trained; there is no need to create a new one.

But then, what have you actually trained? You've created a model called 'multi_lstm_model', and then compiled and trained a model called 'model'.
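
In other words, something along these lines, keeping the names from the question (a sketch, not tested against your generator):

multi_lstm_model.compile(loss=negative_log_likelihood, optimizer=tf.keras.optimizers.Adam())
history = multi_lstm_model.fit(genData(candles_1m, train_start, val_start, 14*24*4, 128),
                               validation_data=genData(candles_1m, val_start, test_start, 14*24*4, 128),
                               steps_per_epoch=train_step_per_epoch, validation_steps=val_step_per_epoch,
                               epochs=200, callbacks=[early_stopping])

# predicting with the trained model itself returns draws from each fitted distribution
samples = multi_lstm_model.predict(genData(candles_1m, train_start, val_start, 14*24*4, 128), steps=5)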

  • Hey, first thanks for the input. I am creating a new model in order to get the inputs to the DistributionLambda layer; otherwise the neural network would only output samples of said distribution instead of the parameters of that distribution. This is motivated by this Stack Overflow post: https://stackoverflow.com/questions/63479067/easiest-way-to-see-the-output-of-a-hidden-layer-in-tensorflow-keras – JonathanSchmied Aug 10 '22 at 13:12
  • Apologies, I was wrong to suggest that creating new_model might re-initialise the weights - it doesn't, and I think that's not your problem – David Harris Aug 10 '22 at 15:30