I'm trying to train a GPT-2 model on a custom dataset, but it fails with the error below.
ValueError: Unexpected result of `train_function` (Empty logs). Please use `Model.compile(..., run_eagerly=True)`, or `tf.config.run_functions_eagerly(True)` for more information of where went wrong, or file a issue/bug to `tf.keras`.
I thought the model and dataset were correctly defined and processed, since I followed this article.
But the error shows up when `model.fit` is executed, even though I already pass `run_eagerly=True` to `compile` as the message suggests.
Can someone tell me how to resolve the error, or the proper way to train the model? Here is the full script:
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer
import tensorflow as tf
# Define the model
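# from_pt=True converts the pretrained PyTorch GPT-2 checkpoint to TensorFlow weights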
model = TFGPT2LMHeadModel.from_pretrained('gpt2', from_pt=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
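# Following the article: pair the LM loss with the logits output only; the None
# entries skip the model's remaining outputs (the per-layer past key/value tensors)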
model.compile(optimizer=optimizer, loss=[loss, *[None] * model.config.n_layer], metrics=[metric], run_eagerly=True)
model.summary()
# Obtain the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.add_special_tokens({
    "eos_token": "</s>",
    "bos_token": "<s>",
    "unk_token": "<unk>",
    "pad_token": "<pad>",
    "mask_token": "<mask>"
})
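# NOTE: adding tokens grows the vocabulary beyond the pretrained embedding size,
# which normally also needs model.resize_token_embeddings(len(tokenizer))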
# Get single string
paths = ['data.txt']  # each file contains just a few sentences
single_string = ''
for filename in paths:
    with open(filename, "r", encoding='utf-8') as f:
        x = f.read()
    single_string += x + tokenizer.eos_token
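# Encode the whole corpus into one flat list of token ids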
string_tokenized = tokenizer.encode(single_string)
print(string_tokenized)
# creating the TensorFlow dataset
examples = []
block_size = 100
BATCH_SIZE = 12
BUFFER_SIZE = 1000
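# Split the id stream into non-overlapping blocks of block_size tokens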
for i in range(0, len(string_tokenized) - block_size + 1, block_size):
    examples.append(string_tokenized[i:i + block_size])
inputs, labels = [], []
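# Standard next-token shift for causal LM: input is a block's tokens [0:99],
# label is the same block shifted by one, tokens [1:100]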
for ex in examples:
    inputs.append(ex[:-1])
    labels.append(ex[1:])
dataset = tf.data.Dataset.from_tensor_slices((inputs, labels))
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
print(dataset)
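# Sanity check: with block_size=100 and BATCH_SIZE=12, the dataset should yield
# int32 batches shaped (12, 99) for both inputs and labels
for batch_inputs, batch_labels in dataset.take(1):
    print(batch_inputs.shape, batch_labels.shape)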
# train the model
num_epoch = 10
history = model.fit(dataset, epochs=num_epoch) # <- shows the error