
I am encountering a ValueError in my Python code when trying to fine-tune Hugging Face's distribution of the GPT-2 model. Specifically:

ValueError: Dimensions must be equal, but are 64 and 0 for
'{{node Equal_1}} = Equal[T=DT_FLOAT, incompatible_shape_error=true](Cast_18, Cast_19)'
with input shapes: [64,0,1024], [2,0,12,1024].

I have around 100 text files that I concatenate into a single string variable called raw_text (built roughly as sketched after the function below) and then pass into the following function to create training and test TensorFlow datasets:

import tensorflow as tf
from transformers import GPT2Tokenizer, TFGPT2Model

def to_datasets(raw_text):
    # split the raw text into fixed-length character sequences
    seqs = [
        raw_text[SEQ_LEN * i:SEQ_LEN * (i + 1)]
        for i in range(len(raw_text) // SEQ_LEN)
    ]

    # set up Hugging Face GPT-2 tokenizer
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    tokenizer.pad_token = tokenizer.eos_token

    # tokenize the character sequences
    tokenized_seqs = [
        tokenizer(seq, padding="max_length", return_tensors="tf")["input_ids"]
        for seq in seqs
    ]

    # convert tokenized sequences into TensorFlow datasets
    trn_seqs = tf.data.Dataset \
        .from_tensor_slices(tokenized_seqs[:int(len(tokenized_seqs) * TRAIN_PERCENT)])
    tst_seqs = tf.data.Dataset \
        .from_tensor_slices(tokenized_seqs[int(len(tokenized_seqs) * TRAIN_PERCENT):])

    def input_and_target(x):
        return x[:-1], x[1:]

    # map into (input, target) tuples, shuffle order of elements, and batch
    trn_dataset = trn_seqs.map(input_and_target) \
        .shuffle(SHUFFLE_BUFFER_SIZE) \
        .batch(BATCH_SIZE, drop_remainder=True)
    tst_dataset = tst_seqs.map(input_and_target) \
        .shuffle(SHUFFLE_BUFFER_SIZE) \
        .batch(BATCH_SIZE, drop_remainder=True)

    return trn_dataset, tst_dataset
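
For context, raw_text is built by reading and concatenating the text files, roughly like this (the directory path here is a placeholder):

from pathlib import Path

# read and concatenate every text file into one string (directory name is illustrative)
raw_text = "".join(
    p.read_text(encoding="utf-8")
    for p in sorted(Path("data").glob("*.txt"))
)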

I then try to train my model, calling train_model(*to_datasets(raw_text)):

def train_model(trn_dataset, tst_dataset):
    # load the pretrained Hugging Face GPT-2 model
    model = TFGPT2Model.from_pretrained("gpt2")

    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
    )

    model.fit(
        trn_dataset,
        epochs=EPOCHS,
        initial_epoch=0,
        validation_data=tst_dataset
    )

The ValueError is triggered on the model.fit() call. The variables in all-caps are settings pulled in from a JSON file. Currently, they are set to:

{
    "BATCH_SIZE":64,
    "SHUFFLE_BUFFER_SIZE":10000,
    "EPOCHS":500,
    "SEQ_LEN":2048,
    "TRAIN_PERCENT":0.9
}
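
For completeness, these settings are loaded along these lines (the file name is a placeholder):

import json

# read the all-caps hyperparameters from the settings file (name is illustrative)
with open("settings.json") as f:
    _settings = json.load(f)

BATCH_SIZE = _settings["BATCH_SIZE"]
SHUFFLE_BUFFER_SIZE = _settings["SHUFFLE_BUFFER_SIZE"]
EPOCHS = _settings["EPOCHS"]
SEQ_LEN = _settings["SEQ_LEN"]
TRAIN_PERCENT = _settings["TRAIN_PERCENT"]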

Any information regarding what this error means or ideas on how to resolve it would be greatly appreciated. Thank you!

B. Freeman

1 Answer


I'm having the same problem, but when I change the batch size to 12 (the same as the n_layer parameter in the GPT-2 config file) it works. I don't know why it works, but you can try it... If you manage to solve it in a different way, I would be glad to hear.
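
If you want to confirm the n_layer value yourself, you can read it off the pretrained config (this just checks the number; it doesn't explain the error):

from transformers import GPT2Config

# inspect the pretrained GPT-2 config; the base "gpt2" checkpoint has 12 layers
config = GPT2Config.from_pretrained("gpt2")
print(config.n_layer)  # 12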

N_R