
In sequence-to-sequence learning, when we are predicting more than one step ahead, should we optimize the neural network after each output, or should we optimize the outputs of every sequence together?

For example, if I am predicting 10 steps for each sequence, should I optimize for each of these 10 steps separately or for all of them together?

To clarify: in the following picture, "I" is the prediction that is fed to the next time step. But during training, shouldn't we feed the next time step with the ground truth rather than the prediction?

[Image: diagram from the question, in which the prediction "I" is fed to the next time step]

1 Answer

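No. In sequence-to-sequence learning the output sequence is treated as one unit: you compute the loss over all of its steps together (typically the sum or mean of the per-step losses) and take one optimization step for the whole sequence, rather than updating the network after each output.

Therefore, if you are predicting a sequence of 10 steps, you calculate the loss for all ten steps together, not for each step separately.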
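A minimal pure-Python sketch of "calculate the loss for all ten steps together", assuming mean squared error as the per-step loss (the answer does not name a loss function). `sequence_loss` is a hypothetical helper; in a framework such as PyTorch you would compute the same single scalar on tensors and call `loss.backward()` and `optimizer.step()` once per sequence (or batch), not once per output step.

```python
def sequence_loss(predictions, targets):
    """Mean squared error over all output steps of one sequence.

    Returns one scalar for the whole predicted sequence, so one
    optimization step is taken per sequence rather than per step.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# A 10-step prediction scored against its 10-step ground truth.
preds = [float(i) for i in range(10)]
truth = [float(i) + 0.5 for i in range(10)]
print(sequence_loss(preds, truth))  # 0.25
```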

Let's say your sequence is of length 10.

Then your inputs and predictions are:

input sample 0-9 -> predict 10-19 -> calculate loss
input sample 10-19 (ground truth) -> predict 20-29 -> calculate loss
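A minimal pure-Python sketch of this non-overlapping split, assuming the data is a single 1-D series stored in a list. `non_overlapping_pairs` is a hypothetical helper name, not from any library. The step between pairs equals the sequence length, so each ground-truth target block becomes the next input block.

```python
def non_overlapping_pairs(series, length):
    """Split a series into consecutive (input, target) pairs of equal length.

    The target of one pair is the input of the next:
    0-9 -> 10-19, then 10-19 -> 20-29, and so on.
    """
    pairs = []
    for start in range(0, len(series) - 2 * length + 1, length):
        inputs = series[start:start + length]
        targets = series[start + length:start + 2 * length]
        pairs.append((inputs, targets))
    return pairs


series = list(range(30))
for x, y in non_overlapping_pairs(series, 10):
    print(x[0], x[-1], "->", y[0], y[-1])
# 0 9 -> 10 19
# 10 19 -> 20 29
```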

If your data allows it, you can instead use a rolling (sliding) window that advances one sample at a time:

input sample 0-9 -> predict 10-19 -> calculate loss
input sample 1-10 -> predict 11-20 -> calculate loss
input sample 2-11 -> predict 12-21 -> calculate loss
...
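The rolling window can be sketched in pure Python as well, again assuming a single 1-D series in a list. `rolling_pairs` is a hypothetical helper, and its `stride` parameter (1 by default) is an assumption added for illustration.

```python
def rolling_pairs(series, length, stride=1):
    """Build (input, target) pairs with a sliding window.

    With stride=1 the window moves one sample at a time:
    0-9 -> 10-19, 1-10 -> 11-20, 2-11 -> 12-21, ...
    """
    return [
        (series[s:s + length], series[s + length:s + 2 * length])
        for s in range(0, len(series) - 2 * length + 1, stride)
    ]


series = list(range(30))
pairs = rolling_pairs(series, 10)
print(len(pairs))  # 11
x, y = pairs[2]
print(x[0], x[-1], "->", y[0], y[-1])  # 2 11 -> 12 21
```

From the same 30 samples this yields 11 (input, target) pairs, whereas stepping by the full length of 10 gives only 2. Neighbouring pairs share most of their samples, so they are strongly correlated rather than independent examples.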

The problem arises if your sequences have length 10 but, for some reason, you need 30 predictions (3 sequences) from only one data point (one sequence of 10).

Then your only option is to feed the predictions back in as input:

input 0-9 -> predict 10-19 -> feed this prediction back in -> predict 20-29 -> feed that prediction back in -> predict 30-39

But use this last approach only when you have a single data point (one sequence of 10) and need a long prediction.

Also be aware that this can lead to quite large errors: each prediction is used as the next input, so its errors are carried forward and keep accumulating over time.
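A toy sketch of this recursive rollout and its error accumulation. `toy_model` is a hypothetical stand-in for a trained model that maps 10 inputs to 10 outputs: it extrapolates the last observed slope and adds a small constant bias to mimic an imperfect model. The true series here is assumed to be the straight line 0, 1, 2, ..., so the only error source is that bias, and it grows block by block because each biased prediction becomes the next input.

```python
def toy_model(window, bias=0.1):
    """Hypothetical stand-in for a trained model: maps N inputs to N outputs.

    Extrapolates the slope of the last two inputs and adds a constant bias.
    """
    slope = window[-1] - window[-2]
    return [window[-1] + slope * (k + 1) + bias for k in range(len(window))]


def recursive_forecast(model, history, n_blocks):
    """Predict n_blocks * len(history) steps by feeding predictions back in."""
    window = list(history)
    forecast = []
    for _ in range(n_blocks):
        prediction = model(window)
        forecast.extend(prediction)
        window = prediction  # the prediction, not ground truth, is the next input
    return forecast


history = [float(i) for i in range(10)]               # observed samples 0-9
forecast = recursive_forecast(toy_model, history, 3)  # predicts samples 10-39
truth = [float(i) for i in range(10, 40)]             # true continuation

# Largest absolute error within each 10-step block.
block_errors = [
    round(max(abs(p - t) for p, t in zip(forecast[b:b + 10], truth[b:b + 10])), 6)
    for b in range(0, 30, 10)
]
print(len(forecast), block_errors)  # 30 [0.1, 0.2, 0.3]
```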

Zoe
  • Thanks a lot for your answer, I appreciate it. I have another question: in the training phase of sequence-to-sequence learning, do we feed the ground truth of the previous step at each step, or do we feed the predicted value of the previous step? – user4172070 Nov 21 '17 at 00:22
  • Thanks so much for your help. But what I meant is this: in the picture I just added to the edited version of my question, "I" is the prediction that goes to the next level. But during training, shouldn't we feed the next time step with the ground truth and not the prediction? I'm sorry my question was vague. – user4172070 Nov 21 '17 at 15:36