KeyError: 'eval_loss' in Hugging Face Trainer

Question

I am trying to build a question answering pipeline with the Hugging Face framework, but I get KeyError: 'eval_loss'. My goal is to train, save the best model at the end, and then evaluate the validation set on the loaded model. My trainer configuration looks like this:

from transformers import TrainingArguments

args = TrainingArguments('model_training',  # output_dir
                         evaluation_strategy="epoch",
                         label_names=["start_positions", "end_positions"],
                         logging_steps=1,
                         learning_rate=2e-5,
                         num_train_epochs=epochs,
                         save_total_limit=2,
                         load_best_model_at_end=True,
                         save_strategy="epoch",
                         logging_strategy="epoch",
                         report_to="none",
                         weight_decay=0.01,
                         fp16=True,
                         push_to_hub=False)

During training, I get this error:

Traceback (most recent call last):
  File "qa_pipe.py", line 286, in <module>
    pipe.training(train_d, val_d, epochs = 2)
  File "qa_pipe.py", line 263, in training
    self.trainer.train()
  File "/home/admin/qa/lib/python3.7/site-packages/transformers/trainer.py", line 1505, in train
    ignore_keys_for_eval=ignore_keys_for_eval,
  File "/home/admin/qa/lib/python3.7/site-packages/transformers/trainer.py", line 1838, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/home/admin/qa/lib/python3.7/site-packages/transformers/trainer.py", line 2090, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/home/admin/qa/lib/python3.7/site-packages/transformers/trainer.py", line 2193, in _save_checkpoint
    metric_value = metrics[metric_to_check]
KeyError: 'eval_loss'

A minimal working example is provided on Colab.

How can I avoid this error and still save the best model at the end?

Alexander Krauck · Answer 1 · 2023-06-12T21:35:17.283


See the prediction_step method of the Trainer class.

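At a high level, it checks whether the input to the model (the batch your data collator returns) contains the labels, i.e. the targets of your prediction, under the names listed in label_names (for example "labels", or "start_positions" and "end_positions" as in your configuration). Alternatively, it checks whether your input contains a "return_loss" key.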
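To see whether your evaluation batches would pass the label check, you can look for the label keys directly. A minimal sketch, assuming each batch is a plain dict as produced by a data collator; missing_label_keys is a hypothetical helper that mirrors the idea of the check, not the library code itself:

```python
from typing import Iterable, List, Mapping


def missing_label_keys(batch: Mapping[str, object], label_names: Iterable[str]) -> List[str]:
    """Return the label keys that are absent (or None) in one input batch.

    Hypothetical helper: it follows the idea that every name in label_names
    must be present in the inputs for a loss to be computed.
    """
    return [name for name in label_names if batch.get(name) is None]


label_names = ["start_positions", "end_positions"]

# Training features for extractive QA carry the answer positions...
train_batch = {"input_ids": [[101, 2054, 102]], "start_positions": [1], "end_positions": [1]}
# ...but validation features may have been built without them (an assumption for this sketch).
eval_batch = {"input_ids": [[101, 2054, 102]], "attention_mask": [[1, 1, 1]]}

print(missing_label_keys(train_batch, label_names))  # []
print(missing_label_keys(eval_batch, label_names))   # ['start_positions', 'end_positions']
```

If the second list is not empty for your real validation batches, no loss can be computed during evaluation.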

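If the labels are present or "return_loss" is True, the function computes the loss and returns it; otherwise it returns None for the loss. In that case no "eval_loss" is reported. Because load_best_model_at_end=True without a metric_for_best_model makes the Trainer compare checkpoints on the evaluation loss, the lookup metrics[metric_to_check] in _save_checkpoint then raises KeyError: 'eval_loss'.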
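The chain from a missing loss to the traceback in the question can be modelled in plain Python. This is a simplified sketch, not the Trainer's source: metric_for_best_model falls back to "loss" when load_best_model_at_end=True, and the name is prefixed with "eval_" before the lookup. The function names below are hypothetical:

```python
from typing import Dict, Optional


def eval_metrics(loss: Optional[float]) -> Dict[str, float]:
    """Simplified evaluation: the loss is only reported when it was computed."""
    metrics: Dict[str, float] = {"eval_runtime": 1.0}  # placeholder metric
    if loss is not None:
        metrics["eval_loss"] = loss
    return metrics


def metric_to_check(metric_for_best_model: Optional[str]) -> str:
    """Resolve the metric used to compare checkpoints ("loss" if unset)."""
    name = metric_for_best_model or "loss"
    return name if name.startswith("eval_") else f"eval_{name}"


def best_metric_value(metrics: Dict[str, float], metric_for_best_model: Optional[str] = None) -> float:
    # Same kind of lookup as metrics[metric_to_check] in _save_checkpoint.
    return metrics[metric_to_check(metric_for_best_model)]


print(best_metric_value(eval_metrics(0.85)))  # 0.85
try:
    best_metric_value(eval_metrics(None))  # no labels -> no loss -> no "eval_loss"
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'eval_loss'
```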

I see from your code that you are using the library only at a high level, so this might not be very helpful for you, but I suppose the easiest fix is a custom data collator that adds the entry "return_loss": True to the input dict.
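The collator suggested above could be sketched as a wrapper around whatever collator you already use. This is a sketch under assumptions: with_return_loss is a hypothetical name, the base collator is assumed to return a dict, and because the Trainer passes the batch to the model as keyword arguments, the model's forward must accept a return_loss argument. Depending on your transformers version, the Trainer may also only honour return_loss when no label_names are set.

```python
from typing import Any, Callable, Dict, List

Batch = Dict[str, Any]
Collator = Callable[[List[Dict[str, Any]]], Batch]


def with_return_loss(base_collator: Collator) -> Collator:
    """Wrap an existing data collator so every batch also carries return_loss=True."""

    def collate(features: List[Dict[str, Any]]) -> Batch:
        batch = dict(base_collator(features))  # copy so the base collator's output is not mutated
        batch["return_loss"] = True
        return batch

    return collate


# Toy stand-in for a real collator, for illustration only.
def toy_collator(features: List[Dict[str, Any]]) -> Batch:
    return {key: [f[key] for f in features] for key in features[0]}


collator = with_return_loss(toy_collator)
print(collator([{"input_ids": [101, 102]}, {"input_ids": [101, 103]}]))
# {'input_ids': [[101, 102], [101, 103]], 'return_loss': True}
```

You would then pass data_collator=with_return_loss(your_collator) when constructing the Trainer.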

