
To handle sequences of different lengths, I would like to know:

  1. Why do we need to pad the word sequences to the same length?

  2. If the answer is "Yes, you need padding", can I put the padding token at a different index? For example, if I have a word index like this:

    {0:"<s>,1:"<e>",2:"AAA",3:"BBB",.......,500:"zzz"}

Where <s> marks the start of the sentence and <e> marks the end of the sentence.

Can I assign the padding token to the last index, like this?

{0:"<s>,1:"<e>",2:"AAA",3:"BBB",.......,500:"zzz",501:"<pad>"} 

1 Answer


Why do we need to pad the word sequences to the same length?

Because basically all layers with parameters perform some form of matrix multiplication (more precisely: tensor multiplication) at some point in their logic. Now, try it yourself: multiply matrices where not all rows or columns have the same length. E.g., what is this supposed to be?

| 1 2 3 |     | 1 | 
| 4 5   |  *  | 2 |  =  ???
              | 3 |

It is simply not possible to do this unless you put some value in the gap. Some people may even argue that this thing on the left-hand side is not even a matrix.
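To see this concretely, here is a small sketch (NumPy is my assumption here, not something from the question): the ragged rows cannot even form a 2-D array, while the padded version multiplies without complaint.

    import numpy as np

    ragged = [[1, 2, 3],
              [4, 5]]      # rows of unequal length: not a valid matrix
    # np.array(ragged)     # raises ValueError: inhomogeneous shape

    padded = np.array([[1, 2, 3],
                       [4, 5, 0]])        # gap filled with a pad value
    print(padded @ np.array([1, 2, 3]))   # [14 14]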

Can I put the padding token at a different index? Can I assign it to the last index?

Sure. You can use whatever value you want for padding. Ideally, you should use a value that has no other meaning in the context of your problem and thus cannot be confused with any "real" value.
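For example, with Keras (an assumption about your setup; `pad_sequences` is part of its preprocessing utilities) the padding value can be chosen freely, so your index 501 works:

    from tensorflow.keras.preprocessing.sequence import pad_sequences

    seqs = [[0, 2, 3, 1], [0, 2, 1]]

    # value=501 uses the last index of your vocabulary as the pad token;
    # padding='post' appends it after the sentence instead of before it.
    padded = pad_sequences(seqs, value=501, padding='post')
    print(padded)
    # [[  0   2   3   1]
    #  [  0   2   1 501]]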

sebrockm
  • Thank you for your information. I have another question: should we pad before or after the embedding layer? – Pisit Nakjai Aug 09 '19 at 09:09
  • @PisitNakjai In theory it doesn't matter. In practice, before is easier, I'd say. Then your model should learn a good vector for the padding, just as it learns good vectors for the real words. – sebrockm Aug 09 '19 at 09:24
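To illustrate that last comment, a minimal PyTorch sketch (the framework choice is an assumption; the thread does not name one): the index sequences are padded before they enter the embedding layer, and `padding_idx` optionally pins the pad vector to zero instead of learning it.

    import torch
    import torch.nn as nn

    VOCAB_SIZE = 502   # indices 0..500 plus the <pad> index 501
    PAD = 501

    # padding_idx keeps the pad vector at zero and excludes it from gradient
    # updates; omit it if you want the model to learn a vector for <pad>
    # just like for any real word.
    embedding = nn.Embedding(VOCAB_SIZE, embedding_dim=8, padding_idx=PAD)

    batch = torch.tensor([[0, 2, 3, 1],
                          [0, 2, 1, PAD]])  # padded before the embedding
    vectors = embedding(batch)              # shape: (2, 4, 8)
    print(vectors.shape)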