
Let's say I have 3 sequences of the same length, and to each sequence I apply TBPTT with a recurrent network built from LSTM cells. During training I want the last LSTM cell of each sub-sequence to pass its hidden state and cell state on to the next sub-sequence, so I train my model with a stateful LSTM. To update the parameters I apply stochastic gradient descent after each sub-sequence, using the sub-sequence length as the mini-batch size. I have a couple of questions (a minimal sketch of my setup is included after the questions below).

  • I know that with stochastic gradient descent I should shuffle my data at each epoch, but in this case, since the data are chained, what can I do? Should I shuffle the 3 sequences but not the steps within each sequence, or should I not shuffle at all?

  • In the stateless case, should I shuffle the sub-sequences but not the individual steps?

In both cases, what about the test phase: in the stateful case should I avoid truncated sequences, and in the stateless case should I use truncated sequences?

I'd like to know more about the stateful case than the stateless one.
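
To make the setup concrete, here is a minimal sketch of the stateful configuration I have in mind, written against the Keras 2-style tf.keras API; the layer size, sub-sequence length and feature dimension are just placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder dimensions: 3 long sequences of equal length, each split
# into sub-sequences of length sub_len for truncated BPTT.
n_seqs, sub_len, n_features = 3, 20, 8

# Stateful LSTM: batch_input_shape pins the batch size to one row per long
# sequence, so the hidden state and cell state at the end of one
# sub-sequence are kept and used as the initial state of the next one.
model = keras.Sequential([
    layers.LSTM(32, stateful=True,
                batch_input_shape=(n_seqs, sub_len, n_features)),
    layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")
```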

Solved: the Keras implementation of a stateful LSTM doesn't shuffle the data; in any case, I think it is more useful to shuffle the long sequences after each epoch.
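
For anyone who lands here, this is roughly the training loop I ended up with: a minimal sketch using the same placeholder dimensions and model as the sketch above, with dummy data. Only the order of the whole sequences (the rows of the batch) is shuffled between epochs; the sub-sequences within each sequence are always fed in chronological order, and the state is reset before the next shuffle.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Same placeholder dimensions and stateful model as in the sketch above.
n_seqs, total_len, sub_len, n_features = 3, 120, 20, 8
n_subseqs = total_len // sub_len
model = keras.Sequential([
    layers.LSTM(32, stateful=True,
                batch_input_shape=(n_seqs, sub_len, n_features)),
    layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

# Dummy data: one row per long sequence; the target of a sub-sequence is
# the value at its last time step.
X = np.random.randn(n_seqs, total_len, n_features)
y = np.random.randn(n_seqs, total_len, 1)

for epoch in range(10):
    # Shuffle the order of the whole sequences (rows of the batch) only;
    # the steps inside each sequence stay in their original order.
    order = np.random.permutation(n_seqs)
    Xe, ye = X[order], y[order]
    for s in range(n_subseqs):
        x_sub = Xe[:, s * sub_len:(s + 1) * sub_len, :]
        y_sub = ye[:, (s + 1) * sub_len - 1, :]
        # One SGD step per sub-sequence; the stateful layer carries its
        # hidden and cell state over to the next sub-sequence.
        model.train_on_batch(x_sub, y_sub)
    # Forget the carried state before the next epoch (and the next shuffle).
    model.reset_states()
```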
