Following this Medium post, I understand how to save and load my model (or at least I think I do). They say the learning_rate is saved. However, looking at this person's code (it's a GitHub repo with lots of people watching and forking it, so I'm assuming it isn't full of mistakes), the person writes:
def load_checkpoint(checkpoint_file, model, optimizer, lr):
    print("=> Loading checkpoint")
    checkpoint = torch.load(checkpoint_file, map_location=config.DEVICE)
    model.load_state_dict(checkpoint["state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer"])

    # If we don't do this then it will just have learning rate of old checkpoint
    # and it will lead to many hours of debugging \:
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr
Why doesn't optimizer.load_state_dict(checkpoint["optimizer"])
give the learning rate of old checkpoint. If so (I believe it does), why do they say it's a problem If we don't do this then it will just have learning rate of old checkpoint and it will lead to many hours of debugging
.
There is no learning rate decay in the code anyway, so should it even matter?
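To make my question concrete, here is a minimal sketch of what I would expect to happen (the model, optimizer, and file name are just placeholders I made up, not from the repo). My understanding is that the saved learning rate comes back through optimizer.load_state_dict, so the final print should show the checkpoint's lr unless it is overwritten afterwards:

import torch
import torch.nn as nn

# Build a model and optimizer, and save them the way I assume the repo does.
model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
checkpoint = {"state_dict": model.state_dict(), "optimizer": optimizer.state_dict()}
torch.save(checkpoint, "checkpoint.pth.tar")

# Re-create the model/optimizer with a *different* lr, then load the checkpoint.
model2 = nn.Linear(4, 1)
optimizer2 = torch.optim.Adam(model2.parameters(), lr=5e-2)
loaded = torch.load("checkpoint.pth.tar")
model2.load_state_dict(loaded["state_dict"])
optimizer2.load_state_dict(loaded["optimizer"])

# I expect this to print the checkpoint's lr (1e-3), not the new 5e-2.
print([group["lr"] for group in optimizer2.param_groups])

Is that expectation correct, and is the loop over param_groups in the repo only there to override that restored value?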