
With SGD the learning rate should not change during epochs, but it does. Please help me understand why this happens and how to prevent the learning rate from changing.

import torch
params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1, gamma=0.9)
for epoch in range(5):
    print(scheduler.get_lr())
    scheduler.step()

Output is:

[0.9]
[0.7290000000000001]
[0.6561000000000001]
[0.5904900000000002]
[0.5314410000000002]

My torch version is 1.4.0

Alex

3 Answers


Since you are using torch.optim.lr_scheduler.StepLR(optimizer, 1, gamma=0.9) (which actually means torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)), you are multiplying the learning rate by gamma=0.9 every step_size=1 scheduler step:

  • 0.9 = 0.9
  • 0.729 = 0.9*0.9*0.9
  • 0.6561 = 0.9*0.9*0.9*0.9
  • 0.59049 = 0.9*0.9*0.9*0.9*0.9

The only "strange" point is that 0.81 = 0.9*0.9 is missing at the second step (UPDATE: see Szymon Maszke's answer for an explanation).

To prevent the learning rate from decreasing too early: if you have N samples in your dataset and the batch size is D, set torch.optim.lr_scheduler.StepLR(optimizer, step_size=N/D, gamma=0.9) to decay once per epoch. To decay every E epochs, set torch.optim.lr_scheduler.StepLR(optimizer, step_size=E*N/D, gamma=0.9). (This assumes scheduler.step() is called once per batch; see the sketch below.)
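A minimal sketch of that setup, assuming scheduler.step() is called once per batch; N, D, and the dummy loss are made up for illustration, and step_size must be an integer, so round N/D up if D does not divide N evenly:

import math
import torch

N, D = 1000, 100                      # hypothetical dataset size and batch size
batches_per_epoch = math.ceil(N / D)  # equals N/D when D divides N

params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)
# decay once per epoch even though the scheduler is stepped once per batch
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=batches_per_epoch, gamma=0.9)

for epoch in range(5):
    for batch in range(batches_per_epoch):
        optimizer.zero_grad()
        loss = (params[0] ** 2).sum()  # dummy loss standing in for the real one
        loss.backward()
        optimizer.step()
        scheduler.step()               # stepped once per batch here
    print(optimizer.param_groups[0]["lr"])  # decayed once per epoch: ~0.81, 0.729, ...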

xiawi

This is just what torch.optim.lr_scheduler.StepLR is supposed to do: it changes the learning rate. From the PyTorch documentation:

Decays the learning rate of each parameter group by gamma every step_size epochs. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr

If you are trying to optimize params, your code should look more like this (just a toy example; the precise form of the loss will depend on your application):

for epoch in range(5):
  optimizer.zero_grad()          # clear gradients from the previous iteration
  loss = (params[0]**2).sum()    # toy loss; replace with your actual loss
  loss.backward()                # compute gradients
  optimizer.step()               # update params using the current learning rate
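As a sanity check, the decay described in the quoted documentation amounts to lr * gamma ** (epoch // step_size). A quick sketch of the expected values for the question's settings, assuming optimizer.step() is called before scheduler.step() each epoch:

# Expected StepLR schedule for lr=0.9, step_size=1, gamma=0.9
lr, gamma, step_size = 0.9, 0.9, 1
expected = [lr * gamma ** (epoch // step_size) for epoch in range(5)]
print(expected)  # [0.9, 0.81, 0.729, 0.6561, 0.59049] up to float rounding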
joemrt

To expand upon xiawi's answer about the "strange" behavior (0.81 is missing): it has been PyTorch's default behavior since the 1.1.0 release; check the documentation, namely this part:

[...] If you use the learning rate scheduler (calling scheduler.step()) before the optimizer’s update (calling optimizer.step()), this will skip the first value of the learning rate schedule.

Additionally, you should get a UserWarning after the first get_lr() call, as you do not call optimizer.step() at all.
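For completeness, a sketch of the question's loop with the post-1.1.0 ordering (optimizer.step() before scheduler.step()); the dummy loss is only there so there is something to optimize:

import torch

params = [torch.nn.Parameter(torch.randn(1, 1))]
optimizer = torch.optim.SGD(params, lr=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

for epoch in range(5):
    print(optimizer.param_groups[0]["lr"])  # current rate used for this epoch
    optimizer.zero_grad()
    loss = (params[0] ** 2).sum()  # dummy loss
    loss.backward()
    optimizer.step()     # update the parameters first...
    scheduler.step()     # ...then advance the schedule, so no value is skipped
# prints 0.9, 0.81, 0.729, 0.6561, 0.59049 (up to float rounding)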

Szymon Maszke