Questions tagged [gpt-2]

Use this tag for questions about Generative Pre-trained Transformer 2 (GPT-2). Do not use it for GPT-3 or for the ad tagging library (GPT).

References

See the GPT-2 definition on Wikipedia.

199 questions
3 votes · 3 answers

How do GPT-like transformers use only the decoder for sequence generation?

I want to code a GPT-like transformer for a specific text generation task. GPT-like models use only the decoder block (in stacks) [1]. I know how to code all sub-modules of the decoder block shown below (from the embedding to the softmax layer) in…
mac179
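A minimal sketch of the decoder-only idea, assuming PyTorch; all class and variable names here are illustrative, not from the question. A stack of masked self-attention blocks plus an LM head is enough: the causal mask keeps each position from seeing the future, and generation just feeds each predicted token back in as input.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Illustrative decoder-only stack: token/position embeddings ->
    masked self-attention blocks -> LM head over the vocabulary."""
    def __init__(self, vocab_size=100, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        # An encoder layer driven with a causal mask behaves like a GPT decoder block:
        # masked self-attention only, no cross-attention to an encoder.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        t = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(t, device=ids.device))
        # Causal mask: -inf above the diagonal, so position i cannot attend to j > i.
        causal = torch.triu(torch.full((t, t), float("-inf"), device=ids.device), diagonal=1)
        return self.lm_head(self.blocks(x, mask=causal))

@torch.no_grad()
def generate(model, ids, steps=10):
    for _ in range(steps):
        logits = model(ids)                                # (batch, seq, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True)   # greedy pick for the last position
        ids = torch.cat([ids, next_id], dim=1)             # feed the prediction back in
    return ids

print(generate(TinyDecoder(), torch.zeros(1, 1, dtype=torch.long)))
```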
3 votes · 0 answers

"RuntimeError: Expected target size" error for the nn.CrossEntropyLoss() function

I am trying to train a GPT-2 model to take in a tokenized/padded input and predict the output. My batch size is 32. My max length is 343. I believe that the 768 comes from the model. I cannot get the loss function to work properly though. The…
C_Dog
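A sketch of the shape bookkeeping nn.CrossEntropyLoss expects: (N, C) logits against (N,) integer targets, so batch and sequence dimensions are flattened together. The 32 and 343 come from the question; the vocabulary size shown is GPT-2's, and the loss has to see vocabulary logits from the LM head, not 768-dimensional hidden states.

```python
import torch
import torch.nn as nn

batch_size, seq_len, vocab_size = 32, 343, 50257  # 32 and 343 from the question; 50257 is GPT-2's vocab
logits = torch.randn(batch_size, seq_len, vocab_size)        # what the LM head produces
targets = torch.randint(0, vocab_size, (batch_size, seq_len))  # token ids to predict

loss_fn = nn.CrossEntropyLoss()
# CrossEntropyLoss wants (N, C) logits and (N,) integer targets,
# so flatten the batch and sequence dimensions together.
loss = loss_fn(logits.view(-1, vocab_size), targets.view(-1))
print(loss)
```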
3 votes · 1 answer

Text generation AI models generate repeated/duplicate text/sentences. What am I doing incorrectly? (Hugging Face models, Meta GALACTICA)

I have worked all day with the available text generation models. Here you can find a list of them: https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads I want to generate longer text outputs; however, with multiple different models,…
Furkan Gözükara
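One common way to curb repetition with Hugging Face's generate() is to sample instead of decoding greedily and to penalize repeats. A sketch below, using plain GPT-2 as a stand-in checkpoint; the same keyword arguments apply to other causal LMs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # example checkpoint, not the one from the question
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The future of AI is", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,            # sample instead of greedy decoding
    top_p=0.92,                # nucleus sampling
    temperature=0.8,
    repetition_penalty=1.2,    # penalize tokens that already appeared
    no_repeat_ngram_size=3,    # forbid repeating any 3-gram
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```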
3 votes · 1 answer

Getting logits from T5 Hugging Face model using forward() method without labels

For my use case, I need to obtain the logits from T5's forward() method without inputting labels. I know that forward() and .generate() are different (see here). I have also seen this post in which the logits were obtained but labels had to be…
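T5's forward() needs decoder inputs even when no labels are supplied. A sketch (using t5-small as an example checkpoint) that seeds the decoder with its start token and reads the logits back:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

enc = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")

# forward() requires decoder inputs when labels are absent; start the decoder
# with its start token (for T5 this is the pad token) and extend it with any
# target prefix you want logits for.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

outputs = model(input_ids=enc.input_ids,
                attention_mask=enc.attention_mask,
                decoder_input_ids=decoder_input_ids)
print(outputs.logits.shape)  # (batch, decoder_seq_len, vocab_size)
```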
3 votes · 1 answer

Should I adjust the embedding weights of newly added tokens?

I'm a beginner in natural language processing. Recently, I have been trying to train a text generation model based on GPT-2 with Hugging Face transformers. I added some new tokens to the tokenizer and resized the embedding of the model with…
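A sketch of the usual add-tokens-then-resize pattern, plus one optional heuristic for the new rows (initializing them to the mean of the existing embeddings, to be adjusted by fine-tuning). The token strings are examples, not the question's.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

new_tokens = ["<speaker1>", "<speaker2>"]      # example tokens, not from the question
num_added = tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))  # appends randomly initialized rows

# Optional heuristic: start the new rows at the mean of the existing embeddings
# so they begin in a plausible region of the embedding space.
with torch.no_grad():
    emb = model.get_input_embeddings().weight
    emb[-num_added:] = emb[:-num_added].mean(dim=0)
```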
3 votes · 1 answer

How to early-stop an autoregressive model with a list of stop words?

I am using the GPT-Neo model from transformers to generate text. Because the prompt I use starts with '{', I would like to stop the generation once the matching '}' is produced. I found that there is a StoppingCriteria class in the source code but…
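transformers exposes StoppingCriteria for exactly this. A sketch of a custom criterion that decodes the generated text at each step and stops once a stop string such as '}' appears; the GPT-Neo checkpoint named here is just an example.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          StoppingCriteria, StoppingCriteriaList)

class StopOnStrings(StoppingCriteria):
    """Stop generation as soon as the decoded text contains any stop string."""
    def __init__(self, stop_strings, tokenizer):
        self.stop_strings = stop_strings
        self.tokenizer = tokenizer

    def __call__(self, input_ids, scores, **kwargs):
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return any(s in text for s in self.stop_strings)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")  # example checkpoint
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

inputs = tokenizer('{ "name":', return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=100,
    stopping_criteria=StoppingCriteriaList([StopOnStrings(["}"], tokenizer)]),
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0]))
```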
3 votes · 0 answers

Incremental training / pause-and-resume training for GPT-2 language modeling

I'm currently trying to learn Python and, at the same time, machine learning with GPT-2 language modeling. I have had some problems, got over most of them, and finally have something decent running. But... as most of you probably…
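A sketch of the pause-and-resume pattern, assuming the Hugging Face transformers API rather than any particular training script: persist the model, tokenizer, and optimizer state, then reload all three to continue. The paths are illustrative.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# --- pausing: persist everything needed to continue later ---
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.save_pretrained("checkpoints/run1")          # weights + config
tokenizer.save_pretrained("checkpoints/run1")
torch.save(optimizer.state_dict(), "checkpoints/run1/optimizer.pt")

# --- resuming: reload and keep training from the same state ---
model = GPT2LMHeadModel.from_pretrained("checkpoints/run1")
tokenizer = GPT2Tokenizer.from_pretrained("checkpoints/run1")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
optimizer.load_state_dict(torch.load("checkpoints/run1/optimizer.pt"))
```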
3 votes · 2 answers

Huggingface Transformer - GPT2 resume training from saved checkpoint

Resuming GPT-2 fine-tuning, implemented from run_clm.py. Does Hugging Face's GPT-2 have a parameter to resume training from a saved checkpoint, instead of training again from the beginning? Suppose the Python notebook crashes while training; the…
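Hugging Face's Trainer, which run_clm.py is built on, supports this through resume_from_checkpoint. A sketch with a tiny stand-in dataset; paths, step counts, and the dataset itself are illustrative.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer, Trainer, TrainingArguments

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Tiny stand-in dataset; in run_clm.py this would be the tokenized, grouped corpus.
examples = [tokenizer("hello world")["input_ids"]] * 8
train_dataset = [{"input_ids": ids, "labels": ids} for ids in examples]

args = TrainingArguments(output_dir="gpt2-finetuned",
                         per_device_train_batch_size=2,
                         save_steps=2,          # writes checkpoint-2, checkpoint-4, ... to output_dir
                         num_train_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

# The run that may crash trains normally:
trainer.train()

# In a later session, pass resume_from_checkpoint to continue instead of restarting:
# trainer.train(resume_from_checkpoint=True)                           # newest checkpoint in output_dir
# trainer.train(resume_from_checkpoint="gpt2-finetuned/checkpoint-2")  # or a specific one
```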
3 votes · 1 answer

What memory does a decoder-only Transformer use?

I've been reading a lot about transformers and self-attention, and have seen that BERT and GPT-2 are newer variants that use only an encoder (BERT) or only a decoder (GPT-2). I've been trying to build a decoder-only model for…
bellerb
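The only "memory" a decoder-only model attends to is its own earlier tokens in the same sequence; there is no encoder output to cross-attend to. That is enforced by a causal (lower-triangular) mask, sketched here assuming PyTorch.

```python
import torch

seq_len = 5
# Lower-triangular matrix: position i may attend to positions 0..i (its own past),
# never to future positions. This is the only context a decoder-only model uses;
# there is no separate encoder memory.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
print(causal_mask.int())
# tensor([[1, 0, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])
```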
3 votes · 1 answer

How many characters can be input into the "prompt" for GPT-2?

I'm using the OpenAI GPT-2 model from GitHub. I think that the top_k parameter dictates how many tokens are sampled. Is this also the parameter that dictates how large a prompt can be given? If top_k = 40, how large can the prompt be?
Hanley Soilsmith
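top_k only limits how many candidate tokens are sampled from at each step; the prompt size is bounded instead by GPT-2's 1024-token context window (BPE tokens, not characters, so the character limit varies with the text). A sketch that counts prompt tokens, assuming the Hugging Face tokenizer rather than the original repository's encoder.

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "Some long prompt text... " * 50
token_ids = tokenizer(prompt)["input_ids"]

# GPT-2's context window is 1024 BPE tokens; prompt + generated tokens must fit in it.
context_window = tokenizer.model_max_length  # 1024 for gpt2
print(len(token_ids), "prompt tokens, limit", context_window)
if len(token_ids) > context_window:
    token_ids = token_ids[-context_window:]  # e.g. keep only the most recent tokens
```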
3 votes · 2 answers

TensorFlow has no attribute "sort" in the GPT-2 Git release?

I downloaded the git repo (https://github.com/openai/gpt-2) and followed the python3 instructions (in DEVELOPERS.MD) for installation on my Kubuntu 18.04LTS box, but I cannot run it and instead get an error. Here is what I've done so far: pip3…
Sarah Szabo
3 votes · 1 answer

Is there a GPT-2 implementation that allows me to fine-tune and prompt for text completion?

I wish to fine-tune a GPT-2 implementation on some text data. I then want to use this model to complete a text prompt. I can do the first part easily enough using Max Woolf's gpt-2-simple implementation. And Neil Shepherd's fork of OpenAI allows…
Lodore66
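Max Woolf's gpt-2-simple covers both halves. A sketch of fine-tuning on a plain-text file and then prompting with prefix; the dataset path and step count are placeholders.

```python
import gpt_2_simple as gpt2

model_name = "124M"
gpt2.download_gpt2(model_name=model_name)      # fetch the pretrained checkpoint once

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="my_corpus.txt",         # plain-text training file (example path)
              model_name=model_name,
              steps=1000)                      # number of fine-tuning steps

# prefix conditions the completion on a text prompt.
text = gpt2.generate(sess,
                     prefix="Once upon a time",
                     length=200,
                     return_as_list=True)[0]
print(text)
```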
3 votes · 5 answers

Can't import the encoder code for fine-tuning GPT-2

I'm trying to reproduce the example from this article: https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f The example code is from the following repo:…
Luis Ramon Ramirez Rodriguez
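A hedged sketch of the usual import fix, assuming the repository used by the article was cloned to ./gpt-2 and that encoder.py lives in its src/ directory (both assumptions; adjust the path to the actual clone location).

```python
import os
import sys

# Assumption: the gpt-2 repository sits at ./gpt-2, with encoder.py in ./gpt-2/src.
# Putting that directory on sys.path lets `import encoder` resolve when running
# the article's fine-tuning snippets from outside the repo.
sys.path.insert(0, os.path.abspath(os.path.join("gpt-2", "src")))

import encoder  # noqa: E402  (the module from the repo, not an installed package)
```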
3 votes · 0 answers

Using GPT-2 with your own dictionary of words

I'm training GPT-2 with custom encodings and a custom vocab.bpe file. However, when I generate text using GPT-2, the output tokens fall outside the range of my new encodings. How can I make GPT-2 work for me, then?
zengod
3 votes · 2 answers

Fine-tune GPT-2 text prediction for conversational AI

I am experimenting with the gpt-2 model's conditional text generation to tweak it for a good chatbot. I am using nsheppard's code for retraining it on my custom dataset. I trained my model on a custom dataset of conversations that I pulled from my…
Bhavesh Laddagiri
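One reasonable way to prepare conversational data for a plain-text fine-tuning script (an assumption, not the question's actual preprocessing) is to write alternating speaker lines and separate conversations with GPT-2's <|endoftext|> token.

```python
# Example conversations; in practice these would come from the user's chat export.
conversations = [
    [("Me", "Hey, are you free tonight?"), ("Friend", "Yeah, want to grab dinner?")],
    [("Me", "Did you finish the report?"), ("Friend", "Almost, sending it in an hour.")],
]

with open("chat_dataset.txt", "w", encoding="utf-8") as f:
    for convo in conversations:
        for speaker, line in convo:
            f.write(f"{speaker}: {line}\n")
        # GPT-2's end-of-text token marks the boundary between independent conversations,
        # so the model learns not to carry context across them.
        f.write("<|endoftext|>\n")
```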