Questions tagged [gpt-2]

Use this tag with Generative Pre-trained Transformer 2 (GPT-2). Do not use with GPT-3 or the ad tagging library (GPT).

References

See the GPT-2 definition on Wikipedia.

199 questions
2 votes, 0 answers

Training Hugging Face's GPT2 from scratch: how to implement the causal mask?

I am trying to train Hugging Face's implementation of the GPT2 model from scratch (meaning I am using their architecture but not the pre-trained weights), but I noticed by looking into the code here…
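A note on the usual answer: in the Hugging Face implementation the causal mask is a buffer registered inside GPT2Attention, so training from scratch needs no extra mask; you only instantiate the model from a config instead of from_pretrained. A minimal sketch (the small config sizes are illustrative):

    import torch
    from transformers import GPT2Config, GPT2LMHeadModel, GPT2Tokenizer

    config = GPT2Config(n_layer=6, n_head=8, n_embd=512)  # illustrative sizes
    model = GPT2LMHeadModel(config)            # random init, no pretrained weights
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # reuse the stock vocab

    batch = tokenizer("hello world", return_tensors="pt")
    # labels=input_ids: the model shifts them internally and applies the
    # built-in causal mask when computing the language-modeling loss.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()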
2 votes, 2 answers

GPT-2: continue training from a checkpoint

I am trying to continue training from a saved checkpoint using the Colab setup for GPT-2-simple at: https://colab.research.google.com/drive/1SvQne5O_7hSdmPvUXl5UzPeG5A6csvRA#scrollTo=aeXshJM-Cuaf but I just can't get it to work. Loading the saved…
Tessmus • 149 • 2 • 9
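For reference, gpt-2-simple resumes a saved run via the restore_from argument of finetune; a minimal sketch, assuming the checkpoint sits under checkpoint/run1 and a hypothetical corpus.txt:

    import gpt_2_simple as gpt2

    sess = gpt2.start_tf_sess()
    # restore_from="latest" resumes from checkpoint/run1;
    # "fresh" would restart from the base pretrained weights.
    gpt2.finetune(sess,
                  dataset="corpus.txt",   # hypothetical training file
                  model_name="124M",
                  run_name="run1",
                  restore_from="latest",
                  steps=1000,
                  overwrite=True)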
2 votes, 1 answer

TensorFlow not fully utilizing the GPU in a GPT-2 program

I am running the GPT-2 code for the large model (774M). It is used to generate text samples through interactive_conditional_samples.py, link: here. So I've given an input file containing prompts which are automatically selected to generate…
amateur • 21 • 2
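Two quick checks that usually come up in answers to this one, assuming the TF 1.x stack the original GPT-2 repo targets; note that interactive sampling decodes one token at a time, so low average GPU utilization is often expected rather than a bug:

    import tensorflow as tf

    # Confirm TF 1.x actually sees the GPU.
    print(tf.test.is_gpu_available())

    # Grow GPU memory on demand instead of reserving it all up front,
    # which makes nvidia-smi utilization numbers easier to read.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        pass  # build and run the sampling graph here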
2 votes, 0 answers

Can the HuggingFace GPT2DoubleHeadsModel be used for non-multiple-choice next-token prediction?

According to the Hugging Face Transformers documentation (https://huggingface.co/transformers/model_doc/gpt2.html#gpt2doubleheadsmodel), GPT2DoubleHeadsModel (NOT GPT2LMHeadModel but GPT2DoubleHeadsModel) is the GPT-2 transformer model with a language…
chico0913 • 577 • 4 • 10 • 22
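The short answer is usually yes: GPT2DoubleHeadsModel keeps the ordinary LM head alongside the multiple-choice head, so its logits output can be used for next-token prediction and mc_logits simply ignored. A sketch:

    import torch
    from transformers import GPT2DoubleHeadsModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2DoubleHeadsModel.from_pretrained("gpt2")

    enc = tokenizer("The capital of France is", return_tensors="pt")
    out = model(**enc)
    # out.logits comes from the language-modeling head; out.mc_logits
    # (the multiple-choice head) is irrelevant for next-token prediction.
    next_id = out.logits[0, -1].argmax()
    print(tokenizer.decode(next_id))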
1 vote, 0 answers

Invalid key: 409862 is out of bounds for size 0

How can I fix this? I wrote code to train GPT-2 on a dataset with Hugging Face, but I get an error and don't know why: IndexError …
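A common cause of this exact message: Trainer drops dataset columns whose names don't match the model's forward() signature, and if nothing matches, the dataset ends up with length 0. A hedged sketch of the usual fix:

    from transformers import TrainingArguments

    # "Invalid key: N is out of bounds for size 0" often means every column
    # was silently removed; either keep custom columns...
    args = TrainingArguments(output_dir="out", remove_unused_columns=False)
    # ...or tokenize so the dataset exposes input_ids, attention_mask and
    # labels, the names the model's forward() actually accepts.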
1 vote, 0 answers

GPT-2 fine-tuned LLM not generating the expected answer

I am fine-tuning a GPT-2 model to answer questions from a given faq.json. There is some issue with the answer generated by the code below. I assume I have not encoded/decoded the questions and answers correctly. Code - import torch from…
tagg • 383 • 4 • 7
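A minimal round-trip that avoids the most common encode/decode mistake here (decoding the echoed prompt along with the answer); the FAQ prompt format and model path are illustrative:

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")  # or your fine-tuned dir

    prompt = "Question: How do I reset my password?\nAnswer:"  # hypothetical format
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40,
                         pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens, not the prompt echo.
    print(tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True))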
1 vote, 0 answers

Expected input batch_size (28) to match target batch_size (456); changing the batch size changes the target batch size with the GPT2 model

I was practising fine-tuning a GPT-2 model on a simple question-answer dataset when I encountered this error. I have studied other answers, but my input dataset shapes look fine. def tokenize_data(total_marks, coding_feeddback): inputs =…
Irfan Yaqub • 402 • 4 • 14
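For causal-LM fine-tuning the labels tensor must have exactly the same shape as input_ids; a mismatch like 28 vs. 456 usually means the labels were built from a different (unpadded or flattened) sequence. A sketch of shape-safe label construction:

    from transformers import GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token

    enc = tokenizer(["question and answer text"], padding="max_length",
                    max_length=64, truncation=True, return_tensors="pt")
    labels = enc.input_ids.clone()
    labels[enc.attention_mask == 0] = -100   # keep padding out of the loss
    assert labels.shape == enc.input_ids.shape  # must always hold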
1 vote, 2 answers

Disable layers in GPT-2 model

I'm currently using a GPT-2 model that was trained on German texts. I would like to generate the next word in a text given a context chunk, but instead of using the whole model to predict the next word, I want each of the 12 layers to predict the…
Merle • 125 • 1 • 14
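One standard way to get a per-layer prediction without disabling anything is the "logit lens": request all hidden states and push each one through the final LayerNorm and LM head. A sketch against the stock English gpt2 (swap in the German checkpoint):

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # use the German model here
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    ids = tokenizer("Der Himmel ist", return_tensors="pt").input_ids
    # hidden_states holds 13 tensors: the embeddings plus one per layer.
    hs = model(ids, output_hidden_states=True).hidden_states
    for i, h in enumerate(hs[1:], start=1):
        logits = model.lm_head(model.transformer.ln_f(h))
        print(i, tokenizer.decode(logits[0, -1].argmax()))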
1 vote, 0 answers

In which form should the dataset be for an NLP model?

I am trying to fine-tune the tinkoff-ai/ruDialoGPT-medium model. In which form should my dataset be? The base generation format is: @@ПЕРВЫЙ@@ привет @@ВТОРОЙ@@ привет @@ПЕРВЫЙ@@ как дела? @@ВТОРОЙ@@ where @@ПЕРВЫЙ@@ ("first") marks the first speaker and the sample turns read "hi" / "hi" / "how are you?",…
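A small helper for flattening dialogues into that alternating-speaker format; the function name is made up, and the @@ПЕРВЫЙ@@/@@ВТОРОЙ@@ ("first"/"second") markers follow the model card:

    def to_ru_dialogpt(turns):
        """Flatten a list of utterances into ruDialoGPT's training format."""
        speakers = ["@@ПЕРВЫЙ@@", "@@ВТОРОЙ@@"]
        return " ".join(f"{speakers[i % 2]} {t}" for i, t in enumerate(turns))

    print(to_ru_dialogpt(["привет", "привет", "как дела?"]))
    # -> @@ПЕРВЫЙ@@ привет @@ВТОРОЙ@@ привет @@ПЕРВЫЙ@@ как дела?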
1 vote, 0 answers

Questions about padding masks in GPT

The GPT series models use the decoder of the Transformer, with unidirectional attention. In the Hugging Face source code for GPT, masked attention is implemented as: self.register_buffer( "bias", …
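The registered "bias" buffer only enforces causality; padding is handled separately by the attention_mask you pass in, and for batched generation the usual convention is to pad on the left. A sketch:

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"   # pad left so generation starts at real text
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    batch = tokenizer(["hello", "a much longer prompt"], padding=True,
                      return_tensors="pt")
    # attention_mask zeroes out the pad positions; the causal buffer
    # independently blocks attention to future positions.
    out = model.generate(**batch, max_new_tokens=5,
                         pad_token_id=tokenizer.eos_token_id)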
1 vote, 1 answer

Recovering input IDs from input embeddings using GPT-2

Suppose I have the following text: aim = 'Hello world! you are a wonderful place to be in.' I want to use GPT-2 to produce the input_ids, then produce the embeddings, and from the embeddings recover the input_ids. To do this I do: from transformers…
Wiliam • 1,078 • 10 • 21
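For raw token embeddings (before position embeddings and the transformer blocks) the inversion is exact: nearest neighbour against the wte matrix recovers every id. A sketch:

    import torch
    from transformers import GPT2Model, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2Model.from_pretrained("gpt2")

    aim = 'Hello world! you are a wonderful place to be in.'
    ids = tokenizer(aim, return_tensors="pt").input_ids
    emb = model.wte(ids)                    # token embeddings only

    # Nearest neighbour over the vocabulary; exact for raw embeddings,
    # but not for contextual hidden states from deeper layers.
    recovered = torch.cdist(emb[0], model.wte.weight).argmin(dim=-1)
    assert torch.equal(recovered, ids[0])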
1 vote, 0 answers

Template for RLHF with the TRL library

I'm trying to implement a very, very basic working template for RLHF with TRL. The notebook is here: https://www.kaggle.com/code/mcantoni81/rlhf-with-trl-gpt2 My goal here is to make GPT-2 answer "i'm the mailman", but maybe I'm not getting it right…
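For orientation only, a skeleton of the classic PPO loop in TRL with a hand-written reward; argument names move between TRL releases, so treat this as a sketch rather than the library's current API:

    import torch
    from transformers import AutoTokenizer
    from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

    config = PPOConfig(model_name="gpt2", batch_size=1, mini_batch_size=1)
    model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
    tokenizer = AutoTokenizer.from_pretrained(config.model_name)
    tokenizer.pad_token = tokenizer.eos_token
    ppo = PPOTrainer(config, model, tokenizer=tokenizer)

    query = tokenizer("who are you?", return_tensors="pt").input_ids[0]
    full = ppo.generate(query, max_new_tokens=8,
                        pad_token_id=tokenizer.eos_token_id)[0]
    response = full[len(query):]            # strip the echoed prompt
    # Hand-crafted reward: positive only when the target phrase shows up.
    reward = torch.tensor(1.0 if "mailman" in tokenizer.decode(response) else -1.0)
    ppo.step([query], [response], [reward])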
1 vote, 1 answer

When using OPT-2.7B or any other natural language model, is there a way to trick it into having a conversation / give it a pre-prompt in the code?

Using this code, or a variant of it, is there anything that can be added to "trick" OPT into conversing as another user in a style more similar to a chatbot? As of now it will either start something more similar to an article or have a conversation…
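The usual trick is purely prompt-side: prepend a short scripted dialogue so the continuation lands in a chatbot register, then cut generation at the next "User:" line. A sketch (model id as on the Hub):

    from transformers import pipeline

    generator = pipeline("text-generation", model="facebook/opt-2.7b")

    # A few-shot "pre-prompt" framing the task as an ongoing conversation.
    preprompt = ("The following is a friendly conversation.\n"
                 "User: Hi, how are you?\n"
                 "Bot: I'm doing great, thanks for asking!\n"
                 "User: What's your favourite hobby?\n"
                 "Bot:")
    out = generator(preprompt, max_new_tokens=30, do_sample=True)
    print(out[0]["generated_text"])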
1 vote, 2 answers

How to fine-tune a GPT-2 model?

I'm using the Hugging Face Transformers package to load a pretrained GPT-2 model. I want to use GPT-2 for text generation, but the pretrained version isn't enough, so I want to fine-tune it with a bunch of personal text data. I'm not sure how I should…
ParmuTownley • 957 • 2 • 14 • 34
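The common pattern for this is the Trainer with a causal-LM collator; a compact sketch, assuming a hypothetical my_texts.txt with one passage per line:

    from datasets import load_dataset
    from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                              GPT2Tokenizer, Trainer, TrainingArguments)

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    ds = load_dataset("text", data_files={"train": "my_texts.txt"})
    ds = ds.map(lambda x: tokenizer(x["text"], truncation=True, max_length=128),
                batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments("gpt2-finetuned", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=ds["train"],
        # mlm=False makes the collator copy input_ids into labels (causal LM).
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()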
1 vote, 0 answers

NaN values appear when adding a new padding token to my tokenizer

I'm trying to fine-tune a DialoGPT model on a new dataset. I already processed my data correctly, and adding a new padding token to the tokenizer didn't seem to cause any issue: #my dataset :…
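The classic culprit behind NaNs after adding a pad token is a missing embedding resize: the new token id indexes past the old embedding matrix. A sketch of the fix:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    # Without this, the new pad id points outside the embedding table,
    # a classic source of NaN losses.
    model.resize_token_embeddings(len(tokenizer))
    # Also set labels to -100 at pad positions so they don't enter the loss.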