
Can we use Bahdanau attention for a multivariate time-series prediction problem? Using the Bahdanau implementation from here, I have come up with the following code for time-series prediction.

    from tensorflow.keras.layers import Input, LSTM, Concatenate, Flatten, Dense
    from attention_keras import AttentionLayer
    from tensorflow.keras import Model

    num_inputs = 5
    seq_length = 10
    inputs = Input(shape=(seq_length, num_inputs), name='inputs')
    lstm_out = LSTM(64, return_sequences=True)(inputs)
    lstm_out = LSTM(64, return_sequences=True)(lstm_out)

    # Attention layer
    attn_layer = AttentionLayer(name='attention_layer')
    attn_out, attn_states = attn_layer([lstm_out, lstm_out])

    # Concatenate attention output with LSTM output (in the original NMT code this was the decoder LSTM output)
    concat_out = Concatenate(axis=-1, name='concat_layer')([lstm_out, attn_out])
    flat_out = Flatten()(concat_out)

    # Dense layer
    dense_out = Dense(seq_length, activation='relu')(flat_out)
    predictions = Dense(1)(dense_out)

    # Full model
    full_model = Model(inputs=inputs, outputs=predictions)
    full_model.compile(optimizer='adam', loss='mse')

For my data, the model does perform better than a vanilla LSTM without attention, but I am not sure whether this implementation makes sense.
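
For comparison, here is a minimal sketch of the same self-attention wiring using Keras's built-in `AdditiveAttention` layer (Bahdanau-style additive scoring, assuming TF 2.x); the layer sizes mirror the code above, and unlike `AttentionLayer` it returns only the context vectors:

    from tensorflow.keras.layers import (Input, LSTM, AdditiveAttention,
                                         Concatenate, Flatten, Dense)
    from tensorflow.keras import Model

    num_inputs = 5
    seq_length = 10

    inputs = Input(shape=(seq_length, num_inputs), name='inputs')
    lstm_out = LSTM(64, return_sequences=True)(inputs)

    # Bahdanau-style (additive) self-attention: the LSTM output serves as
    # both query and value, mirroring attn_layer([lstm_out, lstm_out]) above
    attn_out = AdditiveAttention(name='attention_layer')([lstm_out, lstm_out])

    concat_out = Concatenate(axis=-1, name='concat_layer')([lstm_out, attn_out])
    flat_out = Flatten()(concat_out)

    dense_out = Dense(seq_length, activation='relu')(flat_out)
    predictions = Dense(1)(dense_out)

    model = Model(inputs=inputs, outputs=predictions)
    model.compile(optimizer='adam', loss='mse')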

Ather Cheema
  • Why are you passing `lstm_out` twice in the `attn_layer`? – bcsta Oct 04 '20 at 15:00
  • Because the `AttentionLayer` requires two inputs, i.e. a list of two tensors. The example in the repository [here](https://github.com/thushv89/attention_keras/blob/master/src/examples/nmt/model.py#L30) shows how to use `AttentionLayer`. – Ather Cheema Oct 04 '20 at 15:05
  • Yes, but in the example the inputs are `encoder_out` and `decoder_out`. In your case you are passing the same value: you have two `lstm_out` assignments, but that does not mean you have two different values for `lstm_out`; the second assignment overwrites the first one. – bcsta Oct 04 '20 at 15:07
  • Yes, the second `lstm_out` overwrites the first `lstm_out`; there is no specific reason to use two back-to-back LSTMs here. The reason I am feeding the same `lstm_out` as a list of two to `AttentionLayer` is that I could not find any other way. If you think this is wrong, please suggest an alternative; that is precisely the question. – Ather Cheema Oct 04 '20 at 15:32
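
For contrast, the encoder-decoder usage the comments refer to would look roughly like this (a minimal sketch modeled on the repository's NMT example; the decoder input, layer sizes, and the `TimeDistributed(Dense(1))` head are illustrative assumptions):

    from tensorflow.keras.layers import Input, LSTM, Concatenate, Dense, TimeDistributed
    from tensorflow.keras import Model
    from attention_keras import AttentionLayer

    enc_timesteps, dec_timesteps, num_features = 10, 5, 5

    enc_inputs = Input(shape=(enc_timesteps, num_features), name='enc_inputs')
    dec_inputs = Input(shape=(dec_timesteps, num_features), name='dec_inputs')

    # Encoder returns its full sequence plus final states to seed the decoder
    encoder_out, enc_h, enc_c = LSTM(64, return_sequences=True, return_state=True)(enc_inputs)
    decoder_out = LSTM(64, return_sequences=True)(dec_inputs, initial_state=[enc_h, enc_c])

    # Attention over encoder outputs, queried by the decoder outputs
    attn_out, attn_states = AttentionLayer(name='attention_layer')([encoder_out, decoder_out])

    concat_out = Concatenate(axis=-1)([decoder_out, attn_out])
    predictions = TimeDistributed(Dense(1))(concat_out)

    model = Model(inputs=[enc_inputs, dec_inputs], outputs=predictions)
    model.compile(optimizer='adam', loss='mse')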

0 Answers