I am implementing an LSTM autoencoder in Keras to get a vector representation of my time series data. The series are very long, so I am using stateful LSTMs: I split each series into non-overlapping windows and feed the windows to the autoencoder in order. See the code below.
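For context, the windowing step looks roughly like this (a sketch; make_windows is just an illustrative helper, not part of the model code below):

import numpy as np

def make_windows(series, window_size):
    # Split one (timesteps, input_dim) series into non-overlapping windows
    # of shape (num_windows, window_size, input_dim); trailing timesteps
    # that do not fill a whole window are dropped.
    num_windows = len(series) // window_size
    trimmed = series[:num_windows * window_size]
    return trimmed.reshape(num_windows, window_size, trimmed.shape[-1])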
I am unclear about how to get the vector representation of a time series:

1. What is the vector representation of the series? Is it the encoder's hidden state or the encoder's output?
2. Each sequence is broken into windows, and when I call predict on an encoder built with return_state=True, I get [encoder_outputs, state_h, state_c] per window. Which window contains the vector representation of the entire sequence? The last one? The first?
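To make the question concrete, here is how I expose the states (a sketch, separate from the model below; note that for an LSTM with return_sequences=False, the returned output and state_h are the same tensor):

from keras.layers import Input, LSTM
from keras.models import Model

# Encoder variant that also returns the final hidden and cell states.
state_inputs = Input(batch_shape=(batch_size, window_size, input_dim))
encoder_outputs, state_h, state_c = LSTM(latent_dim, stateful=True,
                                         return_state=True)(state_inputs)
state_encoder = Model(state_inputs, [encoder_outputs, state_h, state_c])
# With return_sequences=False, encoder_outputs == state_h for each window.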
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from keras.models import Model

# Building the model.
# Stateful LSTMs need a fixed batch size, so the batch dimension is part
# of the input specification (batch_shape, not shape).
inputs = Input(batch_shape=(batch_size, window_size, input_dim))
encoded = LSTM(latent_dim, stateful=True)(inputs)
decoded = RepeatVector(window_size)(encoded)
decoded = LSTM(input_dim, return_sequences=True, stateful=True)(decoded)
# Project each timestep back to the input dimensionality so the
# reconstruction has the same shape as the input.
decoded = TimeDistributed(Dense(input_dim, activation='linear'))(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
# Predicting with the encoder.
encoded_out = encoder.predict(X, batch_size=batch_size)

# For each sequence in X, take the output of the last window as the
# vector representing the entire sequence.
# Is this correct?
seqVector = encoded_out[-batch_size:]
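For completeness, this is roughly how I handle the stateful bookkeeping around training and prediction (a sketch; the data layout and reset placement are my assumptions: X stacks the windows so that batch index k always holds sequence k, with consecutive batches holding consecutive windows):

sequence_autoencoder.compile(optimizer='adam', loss='mse')
for epoch in range(num_epochs):  # num_epochs is an illustrative variable
    # shuffle=False keeps the windows in temporal order, which stateful
    # LSTMs require; reset_states() clears the carried state so the next
    # epoch starts fresh.
    sequence_autoencoder.fit(X, X, batch_size=batch_size, epochs=1,
                             shuffle=False)
    sequence_autoencoder.reset_states()

# Before encoding, reset again so the carried state comes only from the
# windows of the sequences currently being encoded.
encoder.reset_states()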