How to get immediate next word probability using GPT2 model?

Question

I was trying the hugging face gpt2 model. I have seen the run_generation.py script, which generates a sequence of tokens given a prompt. I am aware that we can use GPT2 for NLG.

In my use case, I wish to determine the probability distribution for (only) the immediate next word following the given prompt. Ideally this distribution would be over the entire vocab.

For example, given the prompt "How are ", it should give a probability distribution where "you" or "they" have high floating point values and other vocab words have very low values.

How to do this using hugging face transformers? If it is not possible in hugging face, is there any other transformer model that does this?

score 14 · Accepted Answer · answered Jul 15 '20 at 09:11

You can have a look at how the generation script works with the probabilities.

GPT2LMHeadModel (as well as other "LMHead" models) returns a tensor that contains, for each input position, the unnormalized probabilities (logits) of what the next token might be. I.e., the output at the last position holds the unnormalized probabilities of the token that follows the prompt (assuming input_ids is a tensor with token indices from the tokenizer):

outputs = model(input_ids)
next_token_logits = outputs[0][:, -1, :]

You get the distribution by normalizing the logits using softmax. The indices along the last dimension of next_token_logits correspond to indices in the vocabulary that you get from the tokenizer object.
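To make the softmax step concrete, here is a minimal sketch. The helper names next_token_probs and demo are made up for illustration, and the "gpt2" checkpoint and the prompt are just assumptions. Note that the distribution is over tokenizer tokens (subword units), not whole words, and the prompt is written without a trailing space because GPT-2 tokens usually carry their own leading space.

```python
import torch


def next_token_probs(logits: torch.Tensor) -> torch.Tensor:
    """Convert logits of shape (batch, seq_len, vocab) into a
    (batch, vocab) probability distribution for the next token."""
    next_token_logits = logits[:, -1, :]            # keep the last position only
    return torch.softmax(next_token_logits, dim=-1)  # normalize over the vocabulary


def demo(prompt: str = "How are", k: int = 5) -> None:
    """Hypothetical demo: downloads the pretrained "gpt2" weights on first use."""
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(input_ids)

    probs = next_token_probs(outputs[0])[0]          # shape: (vocab_size,)
    top = torch.topk(probs, k=k)
    for p, idx in zip(top.values, top.indices):
        # Map each vocabulary index back to its token string.
        print(f"{tokenizer.decode([idx.item()])!r}: {p.item():.4f}")
```

Calling demo() prints the k most likely next tokens with their probabilities; probs itself covers the whole vocabulary, so you can look up any token id in it directly.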

Selecting the last logits becomes tricky when you use a batch size bigger than 1 and sequences of different lengths. In that case, you would need to specify attention_mask in the model call to mask out padding tokens and then select the last logits using torch.index_select. It is much easier to use either batch size 1 or a batch of equally long sequences.
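For the batched case, a rough sketch could look like the following. It uses plain advanced indexing rather than torch.index_select, and it assumes right padding (the default for the GPT-2 tokenizer) with the EOS token reused as the padding token. The names last_real_token_logits and demo_batch are hypothetical.

```python
import torch


def last_real_token_logits(logits: torch.Tensor,
                           attention_mask: torch.Tensor) -> torch.Tensor:
    """Pick, for each sequence, the logits at its last non-padding position.

    logits:         (batch, seq_len, vocab)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding;
                    padding is assumed to be on the right.
    """
    last_idx = attention_mask.long().sum(dim=1) - 1   # index of last real token
    batch_idx = torch.arange(logits.size(0), device=logits.device)
    return logits[batch_idx, last_idx]                # (batch, vocab)


def demo_batch(prompts=("How are", "The capital of France is")) -> torch.Tensor:
    """Hypothetical demo: downloads the pretrained "gpt2" weights on first use."""
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token         # GPT-2 has no pad token
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    enc = tokenizer(list(prompts), return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**enc)                        # passes attention_mask too

    next_logits = last_real_token_logits(outputs[0], enc["attention_mask"])
    return torch.softmax(next_logits, dim=-1)         # one distribution per prompt
```

With left padding the last position would already be the last real token for every sequence, so which variant is simpler depends on how the batch is padded.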

You can use any autoregressive model in Transformers: there is distilGPT-2 (a distilled version of GPT-2), CTRL (which is basically GPT-2 trained with some additional "commands"), the original GPT (under the name openai-gpt), and XLNet (designed for contextual embeddings, but can be used for generation in arbitrary order). There are probably more; you can browse the Hugging Face Model Hub.

Is there any way to do the same thing, but instead of giving the beginning of the sentence, I give GPT-2 a complete sentence and ask it for the translated/paraphrased sentence (seeing each word's probability)? — Mucida, May 05 '22 at 12:03
