
I am not very experienced with unsupervised learning, but my general understanding is that in unsupervised learning the model learns without labeled outputs. However, during pre-training of models such as BERT or GPT-3, it seems to me that there is an output. For example, in BERT, some of the tokens in the input sequence are masked, and the model then tries to predict them. Since we already know what those masked tokens originally were, we can compare them with the predictions to compute a loss. Isn't this basically supervised learning?
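
To make what I mean concrete, here is a minimal sketch of the masked-token loss computation I'm describing, with toy PyTorch tensors standing in for a real BERT (the random logits just stand in for the model's predictions, and the `-100` ignore-index is only a convention for skipping unmasked positions):

```python
import torch
import torch.nn.functional as F

# Toy vocabulary; in a real setup the [MASK] id and token ids come from the tokenizer.
vocab_size = 10
mask_id = 0

# Original token ids for one sentence -- these are the "labels" we already know.
original = torch.tensor([[5, 2, 7, 3, 9]])

# Mask the third token; the model only ever sees this corrupted input.
masked_input = original.clone()
masked_input[0, 2] = mask_id

# Stand-in for model(masked_input): one logit vector over the vocabulary per position.
logits = torch.randn(1, 5, vocab_size)

# Compute cross-entropy only on the masked position; -100 makes the other positions ignored.
labels = torch.full_like(original, -100)
labels[0, 2] = original[0, 2]
loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100)
print(loss)
```

So the loss is computed against targets we already have, which is what makes it look like supervised learning to me.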

danielkim9

0 Answers