
According to the Hugging Face Transformers documentation (https://huggingface.co/transformers/model_doc/gpt2.html#gpt2doubleheadsmodel), GPT2DoubleHeadsModel (not GPT2LMHeadModel, but GPT2DoubleHeadsModel) is "the GPT-2 transformer model with a language modelling and a multiple-choice classification head on top, e.g. for RocStories/SWAG tasks".
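For concreteness, here is a minimal sketch of how I understand the model is meant to be called, adapted from the example on that documentation page (the [CLS] token and the two example choices are from the documentation; the shape comments are my own reading and may be wrong):

```python
import torch
from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2DoubleHeadsModel.from_pretrained("gpt2")

# The multiple-choice head reads a classification token appended to each choice
tokenizer.add_special_tokens({"cls_token": "[CLS]"})
model.resize_token_embeddings(len(tokenizer))

choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
encoded = [tokenizer.encode(c) for c in choices]
input_ids = torch.tensor(encoded).unsqueeze(0)                    # (1, n_choices, seq_len)
mc_token_ids = torch.tensor([[len(ids) - 1 for ids in encoded]])  # index of [CLS] in each choice

outputs = model(input_ids, mc_token_ids=mc_token_ids)
lm_logits = outputs[0]  # (1, n_choices, seq_len, vocab_size) -- language-modelling head
mc_logits = outputs[1]  # (1, n_choices) -- multiple-choice classification head
```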

Does this mean that GPT2DoubleHeadsModel can handle both ordinary language-modelling tasks (i.e. next-word prediction) and multiple-choice questions, without any adjustment to its heads? Or would I need to modify the head of GPT2DoubleHeadsModel to do ordinary next-word prediction, because GPT2DoubleHeadsModel is intended for answering multiple-choice questions only?
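My tentative understanding is that the same model's language-modelling head could be used for plain next-word prediction by simply ignoring the multiple-choice output, roughly like this, but I am not sure whether this is the intended use:

```python
import torch
from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2DoubleHeadsModel.from_pretrained("gpt2")

# Plain next-word prediction: run a normal forward pass and read only
# the language-modelling logits, ignoring the multiple-choice head.
input_ids = torch.tensor([tokenizer.encode("The weather today is")])
lm_logits = model(input_ids)[0]                  # (1, seq_len, vocab_size)
next_token_id = lm_logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))
```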

I am a bit confused by this because the impression I got from reading the GPT-2 paper is that GPT-2 handles every type of language task through a single language-modelling objective (so it would only have the regular language-modelling head on top), yet the name "GPT2DoubleHeadsModel" seems to suggest that I need to adjust the head of this GPT-2 model for different types of language tasks.

Thank you,
