Modifying the Learning Rate in the middle of the Model Training in Deep Learning

Question

Below is the code that configures TrainingArguments from the HuggingFace transformers library to fine-tune the GPT2 language model.

from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

# gpt2_model, data_collator, train_dataset and test_dataset are defined elsewhere.
training_args = TrainingArguments(
    output_dir="./gpt2-language-model",  # the output directory
    num_train_epochs=100,  # number of training epochs
    per_device_train_batch_size=8,  # batch size for training #32, 10
    per_device_eval_batch_size=8,  # batch size for evaluation #64, 10
    save_steps=100,  # a checkpoint is saved every 100 steps
    warmup_steps=500,  # number of warmup steps for the learning rate scheduler
    prediction_loss_only=True,
    metric_for_best_model="eval_loss",
    load_best_model_at_end=True,
    evaluation_strategy="epoch",
    learning_rate=0.00004,  # learning rate
)

# Stop when eval_loss has not improved for 3 consecutive evaluations.
early_stop_callback = EarlyStoppingCallback(early_stopping_patience=3)

trainer = Trainer(
    model=gpt2_model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    callbacks=[early_stop_callback],
)

The number of epochs is set to 100 and learning_rate to 0.00004, and early stopping is configured with a patience value of 3.

The model ran for 5 of the 100 epochs, and I noticed that the change in the loss value between epochs is negligible. The latest checkpoint is saved as checkpoint-latest.

Can I now change the learning_rate, maybe from 0.00004 to 0.01, and resume the training from the latest saved checkpoint, checkpoint-latest? Would doing that be efficient?

Or, to train with the new learning_rate value, should I start the training from the beginning?

I have a similar situation, where the model is converging but *very* slowly, so I'd like to try continuing from a checkpoint with a higher learning rate. I'm also using the Trainer, which wraps things up in a way that makes it less obvious to me how to adjust the lr. How did you wind up doing it? — jbm, Sep 24 '21 at 01:35

score 3 · Accepted Answer · answered Feb 01 '21 at 10:30

No, you don't have to restart your training.

Changing the learning rate is like changing how big a step your model takes in the direction determined by the gradient of your loss function.
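To make the "step size" picture concrete, here is a minimal sketch in plain Python. It is not the GPT2 setup from the question: the loss f(w) = w**2 and the helper names grad and train are made up purely for illustration. It runs a slow phase, keeps the resulting weight as a stand-in for a checkpoint, and then continues from that same weight with two different learning rates.

```python
def grad(w):
    # Gradient of the toy loss f(w) = w**2 (illustrative assumption).
    return 2.0 * w


def train(w, lr, steps):
    # Plain gradient descent: each update moves w by lr * gradient.
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


# Slow first phase; the resulting weight plays the role of a checkpoint.
w_ckpt = train(w=1.0, lr=0.01, steps=5)

# Continue from the same checkpoint with the old and with a larger learning rate.
w_slow = train(w_ckpt, lr=0.01, steps=5)
w_fast = train(w_ckpt, lr=0.1, steps=5)

print(w_ckpt, w_slow, w_fast)
```

Both continuations start from the weights already learned, and the one with the larger learning rate gets closer to the minimum at w = 0 in the same number of steps. On a real loss surface a much larger learning rate can also overshoot and diverge, which this toy example does not show.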

You can also think of it as transfer learning where the model has some experience (no matter how little or irrelevant) and the weights are in a state most likely better than a randomly initialised one.

As a matter of fact, changing the learning rate mid-training is considered something of an art in deep learning, and you should change it only if you have a very good reason to do so.

You would probably want to write down when you did it (and why, what you changed, etc.) in case you or someone else wants to reproduce the result of your model.
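For readers who want to see what resuming with a different learning rate can look like in code, here is a minimal sketch in plain PyTorch rather than with the HuggingFace Trainer. The tiny Linear model, the in-memory checkpoint dictionary, and the value of new_lr are all assumptions made for illustration; how this maps onto Trainer is not shown here.

```python
import torch

torch.manual_seed(0)

# Stand-in for the real model (illustrative assumption).
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-5)

# One training step so the optimizer has some state to save.
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()

# In-memory "checkpoint"; a real run would write this with torch.save.
checkpoint = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}

# Resume: rebuild the objects and load the saved state.
resumed_model = torch.nn.Linear(4, 1)
resumed_model.load_state_dict(checkpoint["model"])
resumed_optimizer = torch.optim.AdamW(resumed_model.parameters(), lr=1e-3)
resumed_optimizer.load_state_dict(checkpoint["optimizer"])

# Loading the optimizer state also restores the old learning rate.
lr_after_load = resumed_optimizer.param_groups[0]["lr"]

# Override it explicitly for the rest of training (hypothetical value).
new_lr = 1e-4
for group in resumed_optimizer.param_groups:
    group["lr"] = new_lr

print(lr_after_load, resumed_optimizer.param_groups[0]["lr"])
```

The point of the sketch is that the learning rate stored in the checkpoint wins over the one passed to the new optimizer's constructor, so the override has to happen after load_state_dict. If a learning rate scheduler is attached, it may overwrite the manual value on its next step, so the scheduler would need to be adjusted as well.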

score 0 · Answer 2 · answered Feb 01 '21 at 06:18

PyTorch provides several methods to adjust the learning rate in torch.optim.lr_scheduler. Check the docs for usage: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
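As a small illustration of that module, here is a sketch using StepLR, one of the schedulers in torch.optim.lr_scheduler. The model, the dummy loss, and the values step_size=2 and gamma=0.5 are arbitrary choices for the example, not recommendations for GPT2 fine-tuning.

```python
import torch

# Stand-in model and optimizer (illustrative assumptions).
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Halve the learning rate every 2 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lr_per_epoch = []
for epoch in range(6):
    # Learning rate that will be used during this epoch.
    lr_per_epoch.append(scheduler.get_last_lr()[0])

    # Dummy forward/backward pass standing in for a real epoch.
    loss = model(torch.randn(8, 4)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Advance the schedule after the optimizer step.
    scheduler.step()

print(lr_per_epoch)
```

A scheduler changes the learning rate according to a rule fixed in advance, which is different from manually picking a new value after inspecting the loss curve, but both end up writing to the same lr entry in the optimizer's parameter groups.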

eniola sonowo

the question is related to modifying the learning_rate in the middle of the training. – Woody Feb 01 '21 at 07:01