Why does passing a sequence of tokens, say ["A", "B", "C", "D"], through a masked language model without any masking not reproduce that same sequence when you select the highest-probability token at each position from the output logits, i.e. tokenizer.decode(softmax(logits, dim=-1).argmax(dim=-1)) ≠ ["A", "B", "C", "D"]?
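
For concreteness, here is a minimal sketch of the setup I have in mind, assuming a pretrained BERT checkpoint loaded through the Hugging Face transformers library (the model name and input text are just placeholders for whatever checkpoint and data are actually used):

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    # Feed the sequence through the model with no [MASK] tokens anywhere.
    inputs = tokenizer("A B C D", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Pick the highest-probability token at every position and decode.
    pred_ids = logits.argmax(dim=-1)[0]
    print(tokenizer.decode(pred_ids))
    # Naively I would expect this to echo the input (plus special tokens),
    # but it does not match exactly.
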
Is this just a byproduct of the model not having converged yet? I understand that when a model has just been initialized the embeddings are random and have nothing to do with the semantics of the vocabulary, but after a few epochs I would expect the model to reconstruct the unmasked input sequence perfectly.