Can we use Bahdanau attention for a multivariate time-series prediction problem? Using the Bahdanau implementation from here, I have come up with the following code for time-series prediction.
from tensorflow.keras.layers import Input, LSTM, Concatenate, Flatten, Dense
from attention_keras import AttentionLayer
from tensorflow.keras import Model
num_inputs = 5
seq_length = 10
inputs = Input(shape=(seq_length, num_inputs), name='inputs')
lstm_out = LSTM(64, return_sequences=True)(inputs)
lstm_out = LSTM(64, return_sequences=True)(lstm_out)
# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([lstm_out, lstm_out])
# Concatenate the LSTM output with the attention output; in the original code this was the decoder LSTM output
concat_out = Concatenate(axis=-1, name='concat_layer')([lstm_out, attn_out])
flat_out = Flatten()(concat_out)
# Dense layer
dense_out = Dense(seq_length, activation='relu')(flat_out)
# Output layer: a single predicted value per input window
predictions = Dense(1)(dense_out)
# Full model
full_model = Model(inputs=inputs, outputs=predictions)
full_model.compile(optimizer='adam', loss='mse')
For my data, the model does perform better than a vanilla LSTM without attention, but I am not sure whether this implementation makes sense.
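To make the question more concrete, here is a minimal NumPy sketch of what I understand additive (Bahdanau-style) attention to compute when the same sequence is passed as both inputs, as in `attn_layer([lstm_out, lstm_out])`. This is only an illustration under my own assumptions, not the code of `AttentionLayer`: the function name and the parameters `W_q`, `W_k`, and `v` are hypothetical, and the weights are random rather than learned.

```python
import numpy as np

def additive_self_attention(h, W_q, W_k, v):
    """Additive attention where h acts as both queries and keys/values.

    h:   (seq_length, units) hidden states, e.g. one sample of lstm_out
    W_q: (units, attn_units) query projection (hypothetical name)
    W_k: (units, attn_units) key projection (hypothetical name)
    v:   (attn_units,) scoring vector (hypothetical name)
    """
    q = h @ W_q  # (seq_length, attn_units)
    k = h @ W_k  # (seq_length, attn_units)
    # scores[t, i] = v . tanh(q[t] + k[i]), computed for every pair (t, i)
    scores = np.tanh(q[:, None, :] + k[None, :, :]) @ v  # (seq_length, seq_length)
    # Row-wise softmax, shifted by the row maximum for numerical stability
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each context vector is a weighted average of all hidden states
    context = weights @ h  # (seq_length, units)
    return context, weights

rng = np.random.default_rng(0)
seq_length, units, attn_units = 10, 64, 64
h = rng.standard_normal((seq_length, units))
W_q = rng.standard_normal((units, attn_units)) * 0.1
W_k = rng.standard_normal((units, attn_units)) * 0.1
v = rng.standard_normal(attn_units) * 0.1

context, weights = additive_self_attention(h, W_q, W_k, v)
print(context.shape, weights.shape)  # (10, 64) (10, 10)
```

In this sketch every timestep attends over all timesteps of the window, including later ones. If that matches what the layer does, it seems acceptable here because the whole window is an input and only a single value is predicted, but I would like confirmation.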