
I have a Bayesian neural network implemented in PyTorch and trained via an ELBO loss. I have run into reproducibility issues even though I use the same seed and run the following seeding code:

import logging
import random

import numpy as np
import torch

# python
seed = args.seed  # args is parsed elsewhere in my script
random.seed(seed)
logging.info("Python seed: %i" % seed)
# numpy
seed += 1
np.random.seed(seed)
logging.info("Numpy seed: %i" % seed)
# torch
seed += 1
torch.manual_seed(seed)
logging.info("Torch CPU seed: %i" % seed)
# torch cuda
seed += 1
torch.cuda.manual_seed_all(seed)  # seeds the current and all other GPUs
logging.info("Torch CUDA seed: %i" % seed)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

I should add that I use cross-entropy (XE) loss, which is not a deterministic operation in PyTorch on the GPU. This is the only possible source of randomness I am aware of. What I have observed is that with a large learning rate (0.1) I cannot reproduce my results and I see huge gaps between runs, whereas when the learning rate is reduced by a factor of 10 (to 0.01) the gap disappears. My intuition is that the culprit is the non-deterministic loss and that the large learning rate merely acts as a catalyst: each update step amplifies the tiny gradient differences, so the training trajectories drift apart instead of staying close. What do you think? I appreciate any hints and intuitions.
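
In case it helps, one way I am thinking of testing this is to force deterministic algorithms, so that any op without a deterministic implementation fails loudly instead of silently injecting noise. A minimal sketch, assuming PyTorch >= 1.8 (where torch.use_deterministic_algorithms is available) and CUDA >= 10.2 (which requires the CUBLAS_WORKSPACE_CONFIG environment variable):

import os
import torch
import torch.nn.functional as F

# Must be set before the first CUDA call; required for deterministic
# cuBLAS kernels on CUDA >= 10.2 (see the PyTorch reproducibility notes).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Available since PyTorch 1.8: ops that lack a deterministic
# implementation raise a RuntimeError instead of running silently.
torch.use_deterministic_algorithms(True)

if torch.cuda.is_available():
    logits = torch.randn(8, 5, device="cuda", requires_grad=True)
    target = torch.randint(0, 5, (8,), device="cuda")
    loss = F.cross_entropy(logits, target)
    # If the CUDA backward of cross_entropy/nll_loss has no deterministic
    # implementation in this PyTorch version, this raises RuntimeError.
    loss.backward()

If the backward pass raises a RuntimeError here, that would at least confirm that the XE loss is the non-deterministic op in my setup.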
