
I observed strange behavior in the final accuracy when I run exactly the same experiment (the same training code for an image-classification neural net) with the same random seed on different GPUs (machines). I use only one GPU. Specifically, when I run the experiment on machine_1 the accuracy is 86.37, and when I run it on machine_2 the accuracy is 88.0. There is no variability when I run the experiment multiple times on the same machine. The PyTorch and CUDA versions are the same. Could you help me figure out the reason and fix it?

Machine_1: NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2

Machine_2: NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2

To fix the random seed I use the following code:

import os
import random

import numpy as np
import torch

random.seed(args.seed)
os.environ['PYTHONHASHSEED'] = str(args.seed)
np.random.seed(args.seed)
torch.manual_seed(args.seed)
torch.cuda.manual_seed(args.seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
Does this answer your question? [Training PyTorch models on different machines leads to different results](https://stackoverflow.com/questions/67511658/training-pytorch-models-on-different-machines-leads-to-different-results) – iacob May 18 '21 at 07:47

1 Answer

This is what I use:

import torch
import os
import numpy as np
import random

def set_seed(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)

set_seed(13)

Make sure you have a single function that sets all the seeds, and call it once at startup. If you are using Jupyter notebooks, the cell execution order may cause this. The order of the calls inside the function may also matter. I have never had problems with this code. You may call set_seed() repeatedly throughout your code.
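One thing set_seed() alone does not cover is randomness inside DataLoader worker processes (e.g. random augmentations) and the shuffling order. A minimal sketch of how that can be pinned down as well, assuming a hypothetical seed_worker helper and a toy dataset (not from the question's code):

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Derive a per-worker seed from PyTorch's initial seed so that
    # NumPy/random-based augmentations are reproducible in each worker.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

# A dedicated generator controls the shuffle order deterministically.
g = torch.Generator()
g.manual_seed(13)

dataset = TensorDataset(torch.arange(10).float())
loader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
    worker_init_fn=seed_worker,  # seeds each worker process
    generator=g,                 # fixes the shuffling order
)
```

Note that this makes runs reproducible on the *same* machine; results can still differ across different GPU models, because floating-point kernels are not guaranteed to be bitwise identical across hardware.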

prosti