Why return sequences in stacked RNNs?

Question

When stacking RNNs in Keras, it is mandatory to set the return_sequences parameter to True on every recurrent layer that feeds another recurrent layer.

For instance:

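from keras.layers import Input, LSTM
inputs1 = Input(shape=(3, 1))  # assumed shape: 3 time steps, 1 feature
lstm1 = LSTM(1, return_sequences=True)(inputs1)
lstm2 = LSTM(1)(lstm1)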

It is somewhat intuitive that each stacked RNN layer should preserve the dimensionality of its input. However, I am not thoroughly convinced.

Can someone (mathematically) explain the reason?

Thanks.

score 3 · Answer 1 · answered Dec 06 '17 at 12:06

The input shape for recurrent layers is:

(number_of_sequences, time_steps, input_features).

This three-dimensional shape is required for recurrent layers because recurrence only exists across time steps: without a time axis there is nothing to recur over.
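Since the question asks for the math, here is the recurrence of a simple (Elman-style) recurrent layer. This is a generic textbook form rather than the exact LSTM equations Keras uses, and the symbols W, U, b, \phi and T are notation assumed here for illustration:

```latex
h_t = \phi\left(W x_t + U h_{t-1} + b\right), \qquad t = 1, \dots, T, \qquad h_0 = 0
```

The layer consumes one input vector x_t per time step, so it needs the whole sequence (x_1, ..., x_T). With return_sequences=True a layer emits (h_1, ..., h_T), which is again a sequence that the next layer can iterate over. With return_sequences=False it emits only h_T, a single vector with no index t left.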

Now, compare the "outputs" of the recurrent layers in each case:

with return_sequences=True - (number_of_sequences, time_steps, output_features)
with return_sequences=False - (number_of_sequences, output_features)

Without return_sequences=True, the layer drops the time-step dimension, so its output cannot be fed into another recurrent layer: it has too few dimensions, and the most important one, time_steps, is missing.
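The shape argument above can be checked with a small sketch. This is a toy recurrent layer in pure Python, not the Keras LSTM; the helper names simple_rnn and shape, the random weights, and the example sizes are all assumptions made only for illustration.

```python
import math
import random


def simple_rnn(inputs, units, return_sequences=False, seed=0):
    """Toy Elman RNN over nested lists shaped (batch, time_steps, features).

    Hypothetical helper for illustration only; it is not the Keras LSTM.
    """
    rng = random.Random(seed)
    n_features = len(inputs[0][0])
    # Randomly initialised weights: input-to-hidden (W), hidden-to-hidden (U).
    W = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(units)]
    U = [[rng.uniform(-1, 1) for _ in range(units)] for _ in range(units)]

    outputs = []
    for sequence in inputs:
        h = [0.0] * units  # initial hidden state
        states = []
        for x_t in sequence:  # the recurrence runs over the time axis
            h = [
                math.tanh(
                    sum(W[j][i] * x_t[i] for i in range(n_features))
                    + sum(U[j][k] * h[k] for k in range(units))
                )
                for j in range(units)
            ]
            states.append(h)
        # Either every hidden state (3-D result) or only the last (2-D result).
        outputs.append(states if return_sequences else h)
    return outputs


def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# 2 sequences, 5 time steps, 3 features per step.
batch = [[[0.1 * (t + f) for f in range(3)] for t in range(5)] for _ in range(2)]

seq_out = simple_rnn(batch, units=4, return_sequences=True)
last_out = simple_rnn(batch, units=4, return_sequences=False)

print(shape(batch))     # (2, 5, 3)
print(shape(seq_out))   # (2, 5, 4)
print(shape(last_out))  # (2, 4)

# seq_out is still (batch, time_steps, features), so it can feed a second layer.
stacked = simple_rnn(seq_out, units=1)
print(shape(stacked))   # (2, 1)
```

Passing last_out to simple_rnn instead fails, because a (batch, features) input has no time axis left to loop over. Keras likewise rejects a 2-D input to a recurrent layer.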
