Typically, when stacking LSTMs (each with independent weights), the cell and hidden states are unique to each individual layer and not shared between them. Each LSTM layer operates independently with its own set of states.
Is there any reason to use the final cell state and hidden state of one LSTM layer as the initial cell state and hidden state of another LSTM layer? Does this make sense?
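For concreteness, this is roughly the wiring I'm asking about (a minimal sketch using the Keras functional API; the layer sizes and input shape are made up):

```python
import tensorflow as tf

# Hypothetical input: 10 timesteps of 8 features.
inputs = tf.keras.Input(shape=(10, 8))

# First LSTM also returns its final hidden state (h) and cell state (c).
x, h, c = tf.keras.layers.LSTM(32, return_sequences=True, return_state=True)(inputs)

# Second LSTM starts from the first LSTM's final states instead of zeros.
# (Unit counts must match for the states to be compatible.)
outputs = tf.keras.layers.LSTM(32)(x, initial_state=[h, c])

model = tf.keras.Model(inputs, outputs)
```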
I had in mind a model that receives only a single vector / single timestep as input (not a sequence), but where memory is kept between consecutive calls to the model (using stateful=True in tf.keras.layers.LSTM).
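Something like the following sketch of that stateful, single-timestep setup (shapes and sizes are illustrative assumptions; stateful layers need a fixed batch size):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(
        32,
        stateful=True,
        batch_input_shape=(1, 1, 8),  # (batch, timesteps=1, features)
    ),
    tf.keras.layers.Dense(1),
])

# Each call is one "iteration"; because stateful=True, the LSTM's hidden
# and cell states carry over from one call to the next.
for step in range(5):
    x = np.random.rand(1, 1, 8).astype("float32")
    y = model(x)

# Reset the carried-over states when a logical sequence ends.
model.layers[0].reset_states()
```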