I'm training a neural network for a computer vision task. For the optimizer, I found that using a single learning rate for the whole training run isn't ideal; the usual practice is to use a learning rate scheduler to decay the learning rate over time. So I tried PyTorch's
CosineAnnealingWarmRestarts().
What this does is anneal/decrease the initial learning rate (set by us) following a cosine curve until it hits a restart. After this "restart," the learning rate is set back to the initial learning rate and the cycle starts again. This worked pretty well for me, but I'd like to change one thing: I want the learning rate the optimizer is assigned after each restart to decrease as well, so that the maximum learning rate shrinks with every restart. Can this be done in PyTorch?
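For reference, this is roughly how I'm wiring it up at the moment (the model, the initial lr of 0.1, and T_0=10 below are just illustrative values):

import torch

model = torch.nn.Linear(10, 2)                            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # 0.1 is the "maximum" lr

# T_0: epochs in the first cycle, T_mult: cycle length multiplier
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=1, eta_min=1e-5)

for epoch in range(30):
    # ... forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()  # lr follows a cosine curve and jumps back to 0.1 at every restart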
2 Answers
It seems to me that a straightforward solution would be to inherit from CosineAnnealingWarmRestarts
and then change its self.optimizer
parameters inside an overridden step
function. In pseudo-code, that would be something like
class myScheduler(torch.optim.lr_scheduler.CosineAnnealingWarmRestarts):
    def __init__(self,
                 optimizer,
                 T_0,
                 T_mult=1,
                 eta_min=0,
                 last_epoch=-1):
        # initialize the base class
        super().__init__(optimizer, T_0, T_mult=T_mult,
                         eta_min=eta_min, last_epoch=last_epoch)

    def step(self, epoch=None):
        # call step() from the base class
        super().step(epoch)
        # do some book-keeping to determine whether you've hit a restart,
        # then change the optimizer lr for each parameter group
        if some_condition:  # e.g. a condition on the number of iterations, restarts, etc.
            for param_group in self.optimizer.param_groups:
                param_group['lr'] *= some_coef
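In practice the decay needs to end up in self.base_lrs rather than only in the live param_groups[...]['lr'] values, because the base class recomputes each group's lr from base_lrs on every step(). A minimal sketch of such a step() override (some_coef is still a placeholder, and this is untested):

    def step(self, epoch=None):
        # T_cur + 1 == T_i means this step finishes the current cycle, so the
        # next step() begins a restarted cycle with a decayed peak lr
        if epoch is None and self.T_cur + 1 == self.T_i:
            self.base_lrs = [base_lr * some_coef for base_lr in self.base_lrs]
        super().step(epoch)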

Ash
I have run into the same issue and solved it following Ash's recommendation, inheriting from CosineAnnealingWarmRestarts
and decaying η_max
at the end of each cycle:
import math
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts


class CosineAnnealingWarmRestartsDecay(CosineAnnealingWarmRestarts):
    def __init__(self, optimizer, T_0, T_mult=1,
                 eta_min=0, last_epoch=-1, verbose=False, decay=1):
        super().__init__(optimizer, T_0, T_mult=T_mult,
                         eta_min=eta_min, last_epoch=last_epoch, verbose=verbose)
        self.decay = decay
        # remember the undecayed base lrs so explicit-epoch calls can be handled
        self.initial_lrs = self.base_lrs

    def step(self, epoch=None):
        if epoch is None:
            # implicit stepping: decay base_lrs once per cycle, on the last
            # step before the restart (T_cur + 1 == T_i)
            if self.T_cur + 1 == self.T_i:
                if self.verbose:
                    print("multiplying base_lrs by {:.4f}".format(self.decay))
                self.base_lrs = [base_lr * self.decay for base_lr in self.base_lrs]
        else:
            # explicit epoch: work out how many restarts (n) have happened by
            # this epoch and decay the original base lrs accordingly
            if epoch < 0:
                raise ValueError("Expected non-negative epoch, but got {}".format(epoch))
            if epoch >= self.T_0:
                if self.T_mult == 1:
                    n = int(epoch / self.T_0)
                else:
                    n = int(math.log((epoch / self.T_0 * (self.T_mult - 1) + 1), self.T_mult))
            else:
                n = 0
            self.base_lrs = [initial_lr * (self.decay ** n) for initial_lr in self.initial_lrs]

        super().step(epoch)
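For completeness, a quick usage sketch (the optimizer, decay=0.8, T_0=10, and eta_min below are illustrative values, not part of the class):

import torch

model = torch.nn.Linear(10, 2)                            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# the peak lr after the k-th restart is 0.1 * 0.8**k
scheduler = CosineAnnealingWarmRestartsDecay(optimizer, T_0=10, T_mult=1,
                                             eta_min=1e-5, decay=0.8)

for epoch in range(50):
    # ... training step(s) ...
    scheduler.step()
    # optimizer.param_groups[0]['lr'] now peaks 20% lower after every restart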

Noxel