
I'm trying to use a GPT language model and read the weights it assigns to each vocabulary token at the last step of text generation. My model is a GPT2 from the transformers library. Below is how I load the pretrained model:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "HooshvareLab/gpt2-fa-poetry"
)
model = AutoModelForCausalLM.from_pretrained(
    "HooshvareLab/gpt2-fa-poetry"
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

My goal is to take this information from the last layer of the model (a vector of vocabulary length, after the softmax activation) and use it in combination with another model.

I'm trying to do this in TensorFlow. Please share your comments if you think there are easier or more convenient ways of doing this in PyTorch.
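For illustration, here is a minimal PyTorch sketch of extracting that distribution; the prompt string is just a placeholder, and `model.eval()`/`torch.no_grad()` are used since no training is involved:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/gpt2-fa-poetry")
model = AutoModelForCausalLM.from_pretrained("HooshvareLab/gpt2-fa-poetry")
model.eval()

# placeholder prompt; any input text works
inputs = tokenizer("sample prompt", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.logits has shape (batch_size, sequence_length, vocab_size);
# take the last position and normalize with softmax to get probabilities
next_token_probs = torch.softmax(out.logits[:, -1, :], dim=-1)
```

`next_token_probs` then has shape `(1, vocab_size)` and sums to 1 along the vocabulary axis, which is the matrix-of-vocabulary-length quantity described above.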

mitra mirshafiee
    The model already returns what you are looking for (output parameter [logits](https://huggingface.co/transformers/model_doc/gpt2.html#transformers.GPT2LMHeadModel.forward)). – cronoik Apr 07 '21 at 16:57
  • Thank you very much @cronoik, I was able to do this perfectly. Right now I'm wondering if I can generate these outputs in batches to make the process faster. Do you think this is possible without a for loop? – mitra mirshafiee Apr 08 '21 at 04:34
  • Yes that is possible. The output of logits is a tensor of size (batch_size, sequence_length, config.vocab_size). – cronoik Apr 08 '21 at 13:37
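Following up on the comment above, a sketch of the batched version. One assumption worth flagging: GPT-2 tokenizers ship without a padding token, so the EOS token is reused for padding here, and the attention mask is used to find the last real token of each sequence:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/gpt2-fa-poetry")
model = AutoModelForCausalLM.from_pretrained("HooshvareLab/gpt2-fa-poetry")
model.eval()

# GPT-2 has no padding token by default; reuse EOS so batches can be padded
tokenizer.pad_token = tokenizer.eos_token

# placeholder texts of different lengths
texts = ["first placeholder text", "a second, somewhat longer placeholder text"]
enc = tokenizer(texts, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**enc)

# position of the last real (non-padding) token in each sequence
last_idx = enc["attention_mask"].sum(dim=1) - 1
last_logits = out.logits[torch.arange(len(texts)), last_idx]
probs = torch.softmax(last_logits, dim=-1)  # shape: (batch_size, vocab_size)
```

Because padding is on the right, indexing `[:, -1, :]` directly would read logits at padding positions for the shorter sequences; gathering by `attention_mask` length avoids that.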
