I am trying to fine-tune a Hugging Face model on a shellcode dataset (https://huggingface.co/datasets/SoLID/shellcode_i_a32).
The training code is a basic Hugging Face `Seq2SeqTrainer` setup, but we keep running into inf/nan issues during training.
from transformers import (
    PreTrainedTokenizerFast,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

tokenizer = PreTrainedTokenizerFast(tokenizer_file="tkn1.json", padding_side="right")
# The custom tokenizer file has no padding token, so register one
special_tokens = {'pad_token': "[PAD]"}
tokenizer.add_special_tokens(special_tokens)

# `model` is our seq2seq model, loaded earlier in the notebook
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)
training_args = Seq2SeqTrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    save_total_limit=3,
    per_device_train_batch_size=128,
    num_train_epochs=5,
    warmup_ratio=0.06,
    learning_rate=1.0e-04,
    # fp16=True,
    debug=["underflow_overflow"],
)
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["test"],
    eval_dataset=tokenized_datasets["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)
# trainer.train()
# print(tokenizer.)
trainer.train()
# eval_loss = trainer.evaluate()
# print(f">>> Perplexity: {math.exp(eval_loss['eval_loss']):.2f}")
The output looks like this:
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Detected inf/nan during batch_number=0
Last 1 forward frames:
abs min abs max metadata
shared Embedding
5.42e-06 2.04e+04 weight
0.00e+00 1.46e+03 input[0]
1.56e-03 2.04e+04 output
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-120-ff4a54906908> in <module>
33 # trainer.train()
34 # print(tokenizer.)
---> 35 trainer.train()
36 # eval_loss = trainer.evaluate()
37 # print(f">>> Perplexity: {math.exp(eval_loss['eval_loss']):.2f}")
9 frames
/usr/local/lib/python3.8/dist-packages/transformers/debug_utils.py in forward_hook(self, module, input, output)
278
279 # now we can abort, as it's pointless to continue running
--> 280 raise ValueError(
281 "DebugUnderflowOverflow: inf/nan detected, aborting as there is no point running further. "
282 "Please scroll up above this traceback to see the activation values prior to this event."
ValueError: DebugUnderflowOverflow: inf/nan detected, aborting as there is no point running further. Please scroll up above this traceback to see the activation values prior to this event.
The very first layer (the shared embedding) seems to start throwing inf/nan values as soon as training begins, and the run never gets much further than that.
We have tried tweaking our training arguments but have hit a brick wall here. Any help is appreciated!
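One thing I am unsure about: the `[PAD]` token is added to the tokenizer above, but the model's embedding matrix is never resized. A quick sanity check for whether any batch contains token ids outside the embedding table would look something like this (the `check_batch_ids` helper name is just for illustration):

def check_batch_ids(trainer, model):
    # grab one collated batch from the training dataloader
    batch = next(iter(trainer.get_train_dataloader()))
    num_rows = model.get_input_embeddings().num_embeddings
    max_id = int(batch["input_ids"].max())
    print(f"largest token id in batch: {max_id}, embedding rows: {num_rows}")
    if max_id >= num_rows:
        print("some token ids fall outside the embedding table")

check_batch_ids(trainer, model)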