
I'm trying to fine-tune a DialoGPT model on a new dataset. I have already processed my data, and adding a new padding token to the tokenizer didn't seem to cause any issue:
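For context, the tokenizer is loaded from the same checkpoint as the model (a minimal sketch, assuming the standard AutoTokenizer call, since that line isn't in the snippets below):

from transformers import AutoTokenizer

# Tokenizer for the same checkpoint as the model further down (assumed, since the
# loading step is not shown below). GPT-2 style tokenizers, including DialoGPT's,
# ship without a padding token.
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
print(tokenizer.pad_token)  # None until a pad token is added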

# My dataset:
print(dataset)
print(dataset[0]['text'])

output

Dataset({ features: ['text'], num_rows: 48423 })

[speaker 1]: Great that you wish to hear the voices of the guitarists. Here are your booking details of the tickets. You wish to purchase 4 tickets for the event The Original Wailers that is going to take place on March 8th in Berkeley, right? [speaker 2]: Yup, you're right. Please May I know where is the event conducted and I need the complete address? [speaker 1]: Please note down the complete address of the event happening. It's at Cornerstone Craft Beer & Live Music, 2367 Shattuck Avenue. Your reservation is successful and have a great time there! [speaker 2]: Thanks much for the information you've given. Please can you help me to find some intermediate priced restaurant that provides Ethiopian kind of food. [speaker 1]: Yup! There is an Ethiopian Restaurant named Addis Restaurant providing excellent and authentic traditional Ethiopian cuisine located in Berkeley. Do you wish to reserve a table here? [speaker 2]:

# Tokenizing and adding labels
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

def tokenize_function(examples):
    return tokenizer(examples["text"], padding='max_length', add_special_tokens=True, max_length=246)  # truncation=True, max_length=13

tokenized_datasets = dataset.map(
    tokenize_function, batched=True, num_proc=4, remove_columns=["text"]
)

tokenized_datasets = tokenized_datasets.add_column("labels", tokenized_datasets[:]['input_ids']) 
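As a quick sanity check, the labels end up being an exact copy of the input_ids, so the [PAD] positions are also present in the labels (small sketch, using the pad id reported by the tokenizer):

import numpy as np

# Sanity check: labels are a verbatim copy of input_ids, padding included.
first = tokenized_datasets[0]
print(tokenizer.pad_token_id)                 # 50257 for the newly added [PAD]
print(first["input_ids"] == first["labels"])  # True
print(int(np.sum(np.array(first["input_ids"]) == tokenizer.pad_token_id)))  # number of padded positions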

train_set = model.prepare_tf_dataset(
    tokenized_datasets,
    shuffle=True,
    batch_size=1,
)
sample = next(train_set.as_numpy_iterator())

print(tokenized_datasets)
print(train_set)
print(sample)

output

Dataset({ features: ['input_ids', 'attention_mask', 'labels'], num_rows: 48423 })

<PrefetchDataset element_spec=({'input_ids': TensorSpec(shape=(1, 246), dtype=tf.int64, name=None), 'attention_mask': TensorSpec(shape=(1, 246), dtype=tf.int64, name=None)}, TensorSpec(shape=(1, 246), dtype=tf.int64, name=None))>

({'attention_mask': array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'input_ids': array([[ 58, 4125, 3110, 352, 5974, 314, 765, 284, 711, 440, 9190, 440, 14918, 440, 3825, 319, 616, 3359, 13, 198, 58, 4125, 3110, 362, 5974, 921, 765, 284, 3350, 262, 3496, 440, 9190, 440, 14918, 440, 3825, 4291, 262, 3195, 11, 826, 30, 198, 58, 4125, 3110, 352, 5974, 1320, 318, 826, 13, 1867, 2099, 286, 3496, 318, 340, 30, 198, 58, 4125, 3110, 362, 5974, 632, 318, 5610, 739, 262, 12136, 6536, 290, 534, 3496, 468, 2067, 13, 198, 58, 4125, 3110, 352, 5974, 20558, 617, 1637, 329, 502, 13, 198, 58, 4125, 3110, 362, 5974, 220, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257]])}, array([[ 58, 4125, 3110, 352, 5974, 314, 765, 284, 711, 440, 9190, 440, 14918, 440, 3825, 319, 616, 3359, 13, 198, 58, 4125, 3110, 362, 5974, 921, 765, 284, 3350, 262, 3496, 440, 9190, 440, 14918, 440, 3825, 4291, 262, 3195, 11, 826, 30, 198, 58, 4125, 3110, 352, 5974, 1320, 318, 826, 13, 1867, 2099, 286, 3496, 318, 340, 30, 198, 58, 4125, 3110, 362, 5974, 632, 318, 5610, 739, 262, 12136, 6536, 290, 534, 3496, 468, 2067, 13, 198, 58, 4125, 3110, 352, 5974, 20558, 617, 1637, 329, 502, 13, 198, 58, 4125, 3110, 362, 5974, 220, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 
50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257, 50257]]))

The outputs so far look clean to me: the padded positions get the new [PAD] id (50257) in both the input_ids and the labels, and the attention_mask zeroes them out. But when I try to run a forward pass with my model, or train it, I get NaN values as output:

# Instantiation of the model
from transformers import TFAutoModelForCausalLM, AdamWeightDecay

model = TFAutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
# Resize the embedding matrix for the newly added [PAD] token
# (the logits in the output below have 50258 = 50257 + 1 columns).
model.resize_token_embeddings(len(tokenizer))

optimizer = AdamWeightDecay(learning_rate=1e-9, weight_decay_rate=0.01)
model.compile(optimizer=optimizer, jit_compile=True)

# Model inference: the returned object contains both the loss and the logits
outputs = model(sample[0], labels=sample[1])
print(outputs)

output

TFCausalLMOutputWithCrossAttentions([('loss', <tf.Tensor: shape=(1,), dtype=float32, numpy=array([nan], dtype=float32)>), ('logits', <tf.Tensor: shape=(1, 246, 50258), dtype=float32, numpy= array([[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)>), ('past_key_values', (<tf.Tensor: shape=(2, 1, 16, 246, 64), dtype=float32, numpy= array([[[[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]],

                                            [[nan, nan, nan, ..., nan, nan, nan],
                                             [nan, nan, nan, ..., nan, nan, nan],
                                             [nan, nan, nan, ..., nan, nan, nan],
                                             ...,
                                             [nan, nan, nan, ..., nan, nan, nan],
                                             [nan, nan, nan, ..., nan, nan, nan],
                                             [nan, nan, nan, ..., nan, nan, nan]],
                                             .............

# Model training
model.fit(train_set, epochs=1)

output

56/48423 [..............................] - ETA: 2:27:49 - loss: nan

This NaN value is almost certainly caused by the new '[PAD]' token I added, but I don't know how to deal with it. Can someone help me, please?
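For reference, the only thing I have come up with so far is to mask the [PAD] positions in the labels with -100 before building the tf.data pipeline, along these lines (just a sketch; it assumes the TF loss ignores label positions equal to -100, the way the PyTorch models do). Would that be the right way to handle it?

import numpy as np

def mask_pad_labels(examples):
    # Replace [PAD] positions in the labels with -100 so that (assuming -100 is
    # ignored by the loss, as in the PyTorch models) padding does not contribute.
    labels = np.array(examples["labels"])
    labels[labels == tokenizer.pad_token_id] = -100
    examples["labels"] = labels.tolist()
    return examples

tokenized_datasets = tokenized_datasets.map(mask_pad_labels, batched=True)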
