
I have a couple of questions:

  1. In a seq-to-seq model with varying input lengths, if you don't use an attention mask, won't the RNN end up computing hidden state values for the padded elements? Does that mean the attention mask is mandatory, or else my output will be wrong?
  2. How do I deal with variable-length labels? Say I have padded them so they can be passed in a batch; I don't want the padded elements to affect my loss, so how do I ignore them?

1 Answer

  1. No, not necessarily. An RNN takes a time series and computes a hidden state at every step, but you can force it to stop and not compute hidden state values for the padded elements.

You can use a dynamic RNN for that; read about it here: What is a dynamic RNN in TensorFlow?
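
For reference, here is a minimal sketch of the same idea using tf.keras masking (the linked answer covers the older `tf.nn.dynamic_rnn` API; the layer sizes and feature dimension below are just example values): the `Masking` layer marks zero-padded timesteps so the LSTM carries its hidden state over them instead of updating on them.

```python
import numpy as np
import tensorflow as tf

# Sketch, assuming TensorFlow 2.x / tf.keras. The Masking layer flags zero-padded
# timesteps; the LSTM skips them, so the final state is the state at the last
# *real* timestep of each sequence.
inputs = tf.keras.Input(shape=(None, 8))             # (batch, time, features); 8 is an example feature dim
masked = tf.keras.layers.Masking(mask_value=0.0)(inputs)
state = tf.keras.layers.LSTM(32)(masked)             # last real hidden state per sequence
outputs = tf.keras.layers.Dense(10, activation="softmax")(state)
model = tf.keras.Model(inputs, outputs)

# Two sequences padded with zeros to length 5: real lengths 3 and 5.
x = np.zeros((2, 5, 8), dtype="float32")
x[0, :3] = 1.0
x[1, :5] = 1.0
print(model(x).shape)   # (2, 10)
```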

Peyman
  • Okay, I didn't know about dynamic RNN, thanks for that. But what if we are using a normal RNN? Is the attention mask mandatory? Besides, how do we deal with padded labels when calculating the loss? – anandh perumal Oct 20 '19 at 19:15
  • @anandhperumal 1) "Dynamic RNN" is just a name; it is actually a normal RNN that simply doesn't compute the padded steps (as simple as: `if padded: don't go further`). 2) Yes, it is mandatory and it will work, but it would be better with attention. 3) The loss function is in your hands: just mask the "padded" parts in the loss calculation, i.e. write the loss function so that the padded positions don't influence it (see the sketch after these comments). – Peyman Oct 20 '19 at 19:51
  • "Yes, it is mandatory and it will work, but it would be better with attention." You mean it's not mandatory right? – anandh perumal Oct 21 '19 at 00:25