Questions tagged [gpt-2]

Use this tag for questions about Generative Pre-trained Transformer 2 (GPT-2). Do not use it for GPT-3 or the ad tagging library (GPT).

References

See the GPT-2 definition on Wikipedia.

199 questions
0
votes
0 answers

Why does generating text with gpt2 keep increasing memory consumption?

I have a Python script running an infinite loop, calling gpt2.generate, running on CPU (not GPU). After the model is loaded and the first spike of memory usage is over, the RAM consumption keeps increasing by about 100 MB every minute. There is…
Bite code
  • 578,959
  • 113
  • 301
  • 329
0
votes
1 answer

GPT2 paper clarification

In the GPT-2 paper, under Section 2, Page 3, it says: Since the supervised objective is the same as the unsupervised objective but only evaluated on a subset of the sequence, the global minimum of the unsupervised objective is also the global…
Albin
  • 36
  • 3
0
votes
1 answer

OOM while fine-tuning a medium-sized DialoGPT model on Colab

I am trying to fine-tune DialoGPT with a medium-sized model, but I get a CUDA error during the training phase. I reduced the batch size from 4, but the error persists. My parameters are #self.output_dir = 'output-small' …
0
votes
0 answers

How can I respond to a CLI prompt in Kaggle?

I'm using Kaggle to generate poetry samples with GPT-2. My notebook uses datasets from Gwern's poetry generator and uses nshepperd's GPT-2 model. This all works fine with my notebook when generating unconditional samples. !python…
theo
  • 65
  • 2
  • 10
0
votes
0 answers

_forward_unimplemented() got an unexpected keyword argument 'input_ids'

I am training a model using the HuggingFace Trainer class (GPT2 text classification). The following code does a decent job: def preprocess_function(examples): return tokenizer(examples["text"], truncation=True, max_length=MAXLEN, …
0
votes
0 answers

Generating 10000 sentences from GptNeo Model results in out-of-memory error

I was doing some work where I wanted to generate 10000 sentences from the GptNeo Model. I have a 40 GB GPU and am running the model on the GPU, but every time the code runs out of memory. Is there a limitation to the number of sentences that I…
prb977
  • 43
  • 5
0
votes
1 answer

How to save checkpoints for the transformer gpt2 to continue training?

I am retraining the GPT2 language model, and am following this blog: https://towardsdatascience.com/train-gpt-2-in-your-own-language-fc6ad4d60171 Here, they have trained a network on GPT2, and I am trying to recreate the same. However, my dataset is…
Vivek
  • 124
  • 14
0
votes
1 answer

GPT-2 pretrained model fails to load when TF v2 behaviour is disabled

I am trying to use GPT-2 in a codebase that is written for TensorFlow 1.x. However, I am running the code against TF 2.x installation binaries with the tf.disable_v2_behavior() flag. Without this tf.disable_v2_behavior() flag, GPT-2 pretrained model…
0
votes
2 answers

"ValueError: You have to specify either input_ids or inputs_embeds" when training AutoModelWithLMHead Model (GPT-2)

I want to fine-tune the AutoModelWithLMHead model from this repository, which is a German GPT-2 model. I have followed the tutorials for pre-processing and fine-tuning. I have preprocessed a bunch of text passages for the fine-tuning, but when…
Stimmot
  • 999
  • 1
  • 7
  • 22
0
votes
0 answers

Fine-tune DialoGPT with a new dataset - loss below 1 and perplexity exploded

I am following the tutorial https://github.com/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb on fine-tuning DialoGPT (GPT-2) with a new conversational dataset. It trained fine earlier; the perplexity was about 5, 6…
vicmerbia
  • 1
  • 2
0
votes
2 answers

What happens if optimal training loss is too high

I am training a Transformer. In many of my setups I obtain validation and training loss that look like this: Then, I understand that I should stop training at around epoch 1. But then the training loss is very high. Is this a problem? Does the…
0
votes
1 answer

Trouble getting text from GPT2 returned?

Basically, I am trying to have gpt2 respond to a prompt in the variable {text}, and I am running into this error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). Here is my code thus far: import…
user13800925
0
votes
1 answer

Implement do_sampling for custom GPT-NEO model

import numpy as np from transformers import GPTNeoForCausalLM, GPT2Tokenizer import coremltools as ct tokenizer = GPT2Tokenizer.from_pretrained("gpt2") sentence_fragment = "The Oceans are" class NEO(torch.nn.Module): def __init__(self,…
0
votes
0 answers

Cudnn won't work when I install cudnn64_8.dll

So I'm currently working with GPT2 running on Tensorflow for text generation. I'm working with this repo specifically. I recently decided to install CUDA and cudnn to improve GPU capability and installed it via these instructions. I'm currently…
Alditrus
  • 87
  • 1
  • 5
0
votes
1 answer

GPT 2 - TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

I am working with gpt2, Python 3.9 and TensorFlow 2.5, and when connecting to Flask (flask run in terminal) I get the following message: TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe' Here is the code…
DK26
  • 103
  • 8