I'm training a neural network for a computer vision task. For the optimizer, I found that using a single learning rate for the whole training run isn't ideal; the usual practice is to use a learning rate scheduler to decay the learning rate over time. So I tried PyTorch's
CosineAnnealingWarmRestarts().
What this does is anneal/decrease the initial learning rate (set by us) following a cosine curve until it hits a restart. After this "restart," the learning rate is set back to the initial learning rate and the cycle starts again. This worked pretty well for me, but I'd like to change one thing: I want the learning rate the optimizer is assigned after each restart to decrease as well, so that the maximum learning rate shrinks with every restart. Can this be done in PyTorch?
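For reference, this is roughly how I'm wiring it up at the moment (the model, the initial lr of 0.1, and T_0=10 below are just illustrative values):

import torch

model = torch.nn.Linear(10, 2)                            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # 0.1 is the "maximum" lr

# T_0: epochs in the first cycle, T_mult: cycle length multiplier
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=1, eta_min=1e-5)

for epoch in range(30):
    # ... forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()  # lr follows a cosine curve and jumps back to 0.1 at every restart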
2 Answers
It seems to me that a straightforward solution would be to inherit from CosineAnnealingWarmRestarts
and then change its self.optimizer
parameters inside an overridden step
function. In pseudo-code, that would be something like
class myScheduler(torch.optim.lr_scheduler.CosineAnnealingWarmRestarts):
    def __init__(self,
                 optimizer,
                 T_0,
                 T_mult=1,
                 eta_min=0,
                 last_epoch=-1):
        # initialize the base class
        super().__init__(optimizer, T_0, T_mult=T_mult,
                         eta_min=eta_min, last_epoch=last_epoch)

    def step(self, epoch=None):
        # call step() from the base class
        super().step(epoch)
        # do some book-keeping to determine whether you've hit a restart,
        # then change the optimizer lr for each parameter group
        if some_condition:  # e.g. a condition on the number of iterations, restarts, etc.
            for param_group in self.optimizer.param_groups:
                param_group['lr'] *= some_coef
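In practice the decay needs to end up in self.base_lrs rather than only in the live param_groups[...]['lr'] values, because the base class recomputes each group's lr from base_lrs on every step(). A minimal sketch of such a step() override (some_coef is still a placeholder, and this is untested):

    def step(self, epoch=None):
        # T_cur + 1 == T_i means this step finishes the current cycle, so the
        # next step() begins a restarted cycle with a decayed peak lr
        if epoch is None and self.T_cur + 1 == self.T_i:
            self.base_lrs = [base_lr * some_coef for base_lr in self.base_lrs]
        super().step(epoch)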

Ash
I have run into the same issue and solved it following Ash's recommendation, inheriting from CosineAnnealingWarmRestarts
and decaying η_max
at the end of each cycle:
import math
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts


class CosineAnnealingWarmRestartsDecay(CosineAnnealingWarmRestarts):
    def __init__(self, optimizer, T_0, T_mult=1,
                 eta_min=0, last_epoch=-1, verbose=False, decay=1):
        super().__init__(optimizer, T_0, T_mult=T_mult,
                         eta_min=eta_min, last_epoch=last_epoch, verbose=verbose)
        self.decay = decay
        # remember the undecayed base lrs so explicit-epoch calls can be handled
        self.initial_lrs = self.base_lrs

    def step(self, epoch=None):
        if epoch is None:
            # implicit stepping: decay base_lrs once per cycle, on the last
            # step before the restart (T_cur + 1 == T_i)
            if self.T_cur + 1 == self.T_i:
                if self.verbose:
                    print("multiplying base_lrs by {:.4f}".format(self.decay))
                self.base_lrs = [base_lr * self.decay for base_lr in self.base_lrs]
        else:
            # explicit epoch: work out how many restarts (n) have happened by
            # this epoch and decay the original base lrs accordingly
            if epoch < 0:
                raise ValueError("Expected non-negative epoch, but got {}".format(epoch))
            if epoch >= self.T_0:
                if self.T_mult == 1:
                    n = int(epoch / self.T_0)
                else:
                    n = int(math.log((epoch / self.T_0 * (self.T_mult - 1) + 1), self.T_mult))
            else:
                n = 0
            self.base_lrs = [initial_lr * (self.decay ** n) for initial_lr in self.initial_lrs]

        super().step(epoch)
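For completeness, a quick usage sketch (the optimizer, decay=0.8, T_0=10, and eta_min below are illustrative values, not part of the class):

import torch

model = torch.nn.Linear(10, 2)                            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# the peak lr after the k-th restart is 0.1 * 0.8**k
scheduler = CosineAnnealingWarmRestartsDecay(optimizer, T_0=10, T_mult=1,
                                             eta_min=1e-5, decay=0.8)

for epoch in range(50):
    # ... training step(s) ...
    scheduler.step()
    # optimizer.param_groups[0]['lr'] now peaks 20% lower after every restart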

Noxel