When stacking RNN layers in Keras, every recurrent layer that feeds another recurrent layer must set the return_sequences parameter to True.
For instance:
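from keras.layers import Input, LSTM
inputs1 = Input(shape=(3, 1))  # assumed shape for illustration: 3 timesteps, 1 feature
lstm1 = LSTM(1, return_sequences=True)(inputs1)
lstm2 = LSTM(1)(lstm1)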
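To make the shapes concrete, here is a rough sketch in plain Python. It is not Keras and not an LSTM: simple_rnn is a hypothetical single-unit tanh RNN, and the weights are arbitrary values picked for illustration. It only shows what each setting hands to the next layer: with return_sequences=True the layer emits one state per timestep, otherwise only the final state.

```python
import math

def simple_rnn(xs, w_x, w_h, return_sequences=False):
    """Toy single-unit RNN (hypothetical helper, not a Keras API).

    Recurrence: h_t = tanh(w_x . x_t + w_h * h_{t-1}), with h_0 = 0.
    xs is a list of timesteps; each timestep is a list of features.
    """
    h = 0.0
    states = []
    for x_t in xs:
        h = math.tanh(sum(w * x for w, x in zip(w_x, x_t)) + w_h * h)
        states.append([h])  # one unit, so each state is a 1-element vector
    return states if return_sequences else states[-1]

inputs = [[0.1], [0.2], [0.3]]  # 3 timesteps, 1 feature each (made-up data)

seq = simple_rnn(inputs, w_x=[0.5], w_h=0.8, return_sequences=True)
last = simple_rnn(inputs, w_x=[0.5], w_h=0.8)

print(len(seq), len(seq[0]))  # 3 timesteps x 1 unit: still a sequence
print(len(last))              # 1 unit only: the time axis is gone

# The second layer loops over timesteps, so it can consume `seq` directly.
out = simple_rnn(seq, w_x=[0.7], w_h=0.4)
```

In Keras terms, a recurrent layer expects input of shape (batch, timesteps, features). With return_sequences=True the first layer outputs (batch, timesteps, units), which the next layer can read as a sequence; without it the output is (batch, units), which has no time axis left to iterate over.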
It seems somewhat intuitive that each stacked RNN layer should receive input with the same dimensionality as the original input (a sequence over time), but I am not fully convinced.
Can someone (mathematically) explain the reason?
Thanks.