According to the Hugging Face Transformers documentation (https://huggingface.co/transformers/model_doc/gpt2.html#gpt2doubleheadsmodel), GPT2DoubleHeadsModel (as opposed to GPT2LMHeadModel) is "the GPT-2 transformer model with a language modelling and a multiple-choice classification head on top, e.g. for RocStories/SWAG tasks."
Does this mean that GPT2DoubleHeadsModel can handle both ordinary language-modelling tasks (i.e. next-word prediction) and multiple-choice questions without any adjustment to its heads? Or would I need to adjust the head before doing plain next-word prediction, because GPT2DoubleHeadsModel is intended for multiple-choice questions only?
I am a bit confused by this because the impression I got from reading your GPT-2 paper is that GPT-2 handles every type of language task through language modelling alone (so it would only need the regular language-modelling head on top), yet the name "GPT2DoubleHeadsModel" seems to suggest that I need to adjust the head of this GPT-2 model for different types of language tasks.
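To make the question concrete, here is a minimal sketch of how I understand the two heads would be used, adapted from the documented example (the exact output handling may differ by transformers version; older releases return a plain tuple rather than an output object with named fields):

```python
import torch
from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2DoubleHeadsModel.from_pretrained("gpt2")

# The multiple-choice head classifies from the hidden state of a special
# classification token, so add one and resize the embeddings to match.
tokenizer.add_special_tokens({"cls_token": "[CLS]"})
model.resize_token_embeddings(len(tokenizer))

# Multiple-choice use: score two candidate sentences.
choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
encoded = [tokenizer.encode(c) for c in choices]
input_ids = torch.tensor(encoded).unsqueeze(0)  # (batch=1, num_choices=2, seq_len)
mc_token_ids = torch.tensor([[ids.index(tokenizer.cls_token_id) for ids in encoded]])

outputs = model(input_ids, mc_token_ids=mc_token_ids)
lm_logits = outputs.logits     # language-modelling head: (1, 2, seq_len, vocab_size)
mc_logits = outputs.mc_logits  # multiple-choice head:   (1, 2)

# Plain next-word prediction with the same model, simply ignoring the MC head.
prompt = torch.tensor([tokenizer.encode("Hello, my dog is")])
next_token_logits = model(prompt).logits[0, -1]
print(tokenizer.decode([next_token_logits.argmax().item()]))
```

Is my understanding correct that the language-modelling head used in the last step is the same one GPT2LMHeadModel exposes, so no adjustment to the model would be needed for next-word prediction?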
Thank you,