
With SGD the learning rate should not change during epochs, but it does. Please help me understand why this happens and how to prevent the learning rate from changing.

import torch
params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1, gamma=0.9)
for epoch in range(5):
    print(scheduler.get_lr())
    scheduler.step()

Output is:

[0.9]
[0.7290000000000001]
[0.6561000000000001]
[0.5904900000000002]
[0.5314410000000002]

My torch version is 1.4.0

Alex

3 Answers


Since you are using torch.optim.lr_scheduler.StepLR(optimizer, 1, gamma=0.9) (which actually means torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)), you are multiplying the learning rate by gamma=0.9 every step_size=1 scheduler step:

  • 0.9 = 0.9
  • 0.729 = 0.9*0.9*0.9
  • 0.6561 = 0.9*0.9*0.9*0.9
  • 0.59049 = 0.9*0.9*0.9*0.9*0.9

The only "strange" point is that 0.81 = 0.9*0.9 is missing at the second step (UPDATE: see Szymon Maszke's answer for an explanation).

To prevent the learning rate from decreasing too early: if you have N samples in your dataset and the batch size is D, set torch.optim.lr_scheduler.StepLR(optimizer, step_size=N/D, gamma=0.9) to decay once per epoch. To decay every E epochs, set torch.optim.lr_scheduler.StepLR(optimizer, step_size=E*N/D, gamma=0.9). (This assumes scheduler.step() is called once per batch; see the sketch below.)
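A minimal sketch of that setup, assuming scheduler.step() is called once per batch; N, D, and the dummy loss are made up for illustration, and step_size must be an integer, so round N/D up if D does not divide N evenly:

import math
import torch

N, D = 1000, 100                      # hypothetical dataset size and batch size
batches_per_epoch = math.ceil(N / D)  # equals N/D when D divides N

params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)
# decay once per epoch even though the scheduler is stepped once per batch
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=batches_per_epoch, gamma=0.9)

for epoch in range(5):
    for batch in range(batches_per_epoch):
        optimizer.zero_grad()
        loss = (params[0] ** 2).sum()  # dummy loss standing in for the real one
        loss.backward()
        optimizer.step()
        scheduler.step()               # stepped once per batch here
    print(optimizer.param_groups[0]["lr"])  # decayed once per epoch: ~0.81, 0.729, ...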

xiawi

This is just what torch.optim.lr_scheduler.StepLR is supposed to do: it changes the learning rate. From the PyTorch documentation:

Decays the learning rate of each parameter group by gamma every step_size epochs. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr

If you are trying to optimize params, your code should look more like this (just a toy example; the precise form of the loss will depend on your application):

for epoch in range(5):
  optimizer.zero_grad()          # clear gradients from the previous iteration
  loss = (params[0]**2).sum()    # toy loss; replace with your actual loss
  loss.backward()                # compute gradients
  optimizer.step()               # update params using the current learning rate
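As a sanity check, the decay described in the quoted documentation amounts to lr * gamma ** (epoch // step_size). A quick sketch of the expected values for the question's settings, assuming optimizer.step() is called before scheduler.step() each epoch:

# Expected StepLR schedule for lr=0.9, step_size=1, gamma=0.9
lr, gamma, step_size = 0.9, 0.9, 1
expected = [lr * gamma ** (epoch // step_size) for epoch in range(5)]
print(expected)  # [0.9, 0.81, 0.729, 0.6561, 0.59049] up to float rounding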
joemrt

To expand upon xiawi's answer about the "strange" behavior (0.81 is missing): it has been PyTorch's default behavior since the 1.1.0 release; check the documentation, namely this part:

[...] If you use the learning rate scheduler (calling scheduler.step()) before the optimizer’s update (calling optimizer.step()), this will skip the first value of the learning rate schedule.

Additionally, you should get a UserWarning after the first get_lr() call, as you do not call optimizer.step() at all.
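For completeness, a sketch of the question's loop with the post-1.1.0 ordering (optimizer.step() before scheduler.step()); the dummy loss is only there so there is something to optimize:

import torch

params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

for epoch in range(5):
    print(optimizer.param_groups[0]["lr"])  # current rate used for this epoch
    optimizer.zero_grad()
    loss = (params[0] ** 2).sum()  # dummy loss
    loss.backward()
    optimizer.step()     # update the parameters first...
    scheduler.step()     # ...then advance the schedule, so no value is skipped
# prints 0.9, 0.81, 0.729, 0.6561, 0.59049 (up to float rounding)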

Szymon Maszke