
I am building a basic seq2seq autoencoder, but I'm not sure if I'm doing it correctly.

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

model = Sequential()
# Encoder
model.add(LSTM(32, activation='relu', input_shape=(timesteps, n_features), return_sequences=True))
model.add(LSTM(16, activation='relu', return_sequences=False))
model.add(RepeatVector(timesteps))
# Decoder
model.add(LSTM(16, activation='relu', return_sequences=True))
model.add(LSTM(32, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))

The model is then fit with a batch_size parameter:

model.fit(data, data,       
          epochs=30, 
          batch_size = 32)

The model is compiled with the mse loss function and seems to learn.
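
For reference, the compile step is along these lines (the optimizer shown here is just a placeholder; only the mse loss is fixed):

model.compile(optimizer='adam', loss='mse')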

To get the encoder output for the test data, I am using a K function:

get_encoder_output = K.function([model.layers[0].input],
                                  [model.layers[1].output])

encoder_output = get_encoder_output([test_data])[0]

My first question is whether the model is specified correctly. In particular, is the RepeatVector layer needed? I'm not sure what it is doing. What if I omit it and instead specify the preceding layer with return_sequences=True?

My second question is whether I need to tell get_encoder_output about the batch_size used in training.

Thanks in advance for any help on either question.

Garry

2 Answers


This might prove useful to you:

As a toy problem I created a seq2seq model for predicting the continuation of different sine waves.

This was the model:

from keras.layers import Input, LSTM, Dense, Lambda
from keras.models import Model
from keras import backend as K

def create_seq2seq():
    features_num = 5
    latent_dim = 40

    ##
    encoder_inputs = Input(shape=(None, features_num))
    encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoder_inputs)
    encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoded)
    encoded = LSTM(latent_dim, return_state=False, return_sequences=True)(encoded)
    encoded = LSTM(latent_dim, return_state=True)(encoded)

    encoder = Model(inputs=encoder_inputs, outputs=encoded)
    ##

    encoder_outputs, state_h, state_c = encoder(encoder_inputs)
    encoder_states = [state_h, state_c]

    decoder_inputs=Input(shape=(1, features_num))
    decoder_lstm_1 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_2 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_3 = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_lstm_4 = LSTM(latent_dim, return_sequences=True, return_state=True)

    decoder_dense = Dense(features_num)

    all_outputs = []
    inputs = decoder_inputs


    states_1=encoder_states
    # Placeholder values:
    states_2=states_1; states_3=states_1; states_4=states_1
    ###

    for _ in range(1):
        # Run the decoder on the first timestep
        outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
        outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1)
        outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2)
        outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3)

        # Store the current prediction (we will concatenate all predictions later)
        outputs = decoder_dense(outputs_4)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration
        # as well as update the states
        inputs = outputs
        states_1 = [state_h_1, state_c_1]
        states_2 = [state_h_2, state_c_2]
        states_3 = [state_h_3, state_c_3]
        states_4 = [state_h_4, state_c_4]


    for _ in range(149):
        # Run the decoder on each timestep
        outputs_1, state_h_1, state_c_1 = decoder_lstm_1(inputs, initial_state=states_1)
        outputs_2, state_h_2, state_c_2 = decoder_lstm_2(outputs_1, initial_state=states_2)
        outputs_3, state_h_3, state_c_3 = decoder_lstm_3(outputs_2, initial_state=states_3)
        outputs_4, state_h_4, state_c_4 = decoder_lstm_4(outputs_3, initial_state=states_4)

        # Store the current prediction (we will concatenate all predictions later)
        outputs = decoder_dense(outputs_4)
        all_outputs.append(outputs)
        # Reinject the outputs as inputs for the next loop iteration
        # as well as update the states
        inputs = outputs
        states_1 = [state_h_1, state_c_1]
        states_2 = [state_h_2, state_c_2]
        states_3 = [state_h_3, state_c_3]
        states_4 = [state_h_4, state_c_4]


    # Concatenate all predictions
    decoder_outputs = Lambda(lambda x: K.concatenate(x, axis=1))(all_outputs)   

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

    #model = load_model('pre_model.h5')


    print(model.summary())
    return model
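
In case it helps, this is roughly how the function above is used (the array names, optimizer and epoch count are placeholders; the model expects an encoder input with 5 features, a single-timestep decoder seed, and a 150-step target, matching the loop lengths above):

model = create_seq2seq()
model.compile(optimizer='adam', loss='mse')  # placeholder optimizer/loss

# x_enc:  (batch, input_timesteps, 5)  - observed part of each sine wave
# x_seed: (batch, 1, 5)                - first decoder input ("seed" timestep)
# y:      (batch, 150, 5)              - the 150-step continuation to predict
model.fit([x_enc, x_seed], y, epochs=30, batch_size=32)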
Lafayette
    Thanks. A lot to think about with this one! – Garry Oct 11 '19 at 07:05
    I have a question regarding the model you present and the use of this approach for seq2seq in which the decoder is not of a fixed length. Could you please have a look? https://datascience.stackexchange.com/questions/61938/how-to-create-a-seq2seq-without-specifying-a-fixed-decoder-length Or https://stackoverflow.com/questions/58363406/how-to-create-a-seq2seq-without-specifying-a-fixed-decoder-length – user2182857 Oct 23 '19 at 20:46

The best way, in my opinion, to implement a seq2seq LSTM in Keras is to use 2 LSTM models and have the first one transfer its states to the second one.

Your last LSTM layer in the encoder will need return_state=True, return_sequences=False so it will pass on its h and c states.

You will then need to set up an LSTM decoder that will receive these as its initial_state.

For the decoder input you will most likely want a "start of sequence" token as the first time step's input, and afterwards use the decoder output of the nth time step as the input of the decoder at the (n+1)th time step.

After you have mastered this, have a look at Teacher Forcing.
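
To make the wiring concrete, here is a minimal training-time sketch of that idea (layer sizes, optimizer and loss are placeholders; it follows the pattern of the keras.io lstm_seq2seq example linked in the comments below, not your exact model):

from keras.layers import Input, LSTM, Dense
from keras.models import Model

latent_dim = 16   # placeholder size
n_features = 1    # placeholder feature count

# Encoder: keep only the final h and c states
encoder_inputs = Input(shape=(None, n_features))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: initialised from the encoder's h and c; decoder_inputs starts with a
# "start of sequence" step, followed by whichever feeding scheme you choose
decoder_inputs = Input(shape=(None, n_features))
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(decoder_inputs,
                                                initial_state=encoder_states)
decoder_outputs = Dense(n_features)(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='mse')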

user2182857
  • Thanks. Can you point me to an example? I can't understand why I need to delve into the timesteps when decoding. – Garry Oct 09 '19 at 16:42
  • You're welcome. You might find this example informative: https://keras.io/examples/lstm_seq2seq/ Regarding decoding and timesteps - the idea of seq2seq is not to predict each output timestep based on the input timesteps directly before it, but rather to "compress" the entire input sequence and export it to the decoder. The decoder's `nth` is not based specifically on the `0:(n-1)th` inputs; rather, it is based on the entire input sequence (via the encoder-decoder interface of `h` and `c` export) and on the `(n-1)th` preceding decoder outputs. – user2182857 Oct 09 '19 at 20:04