I'm instantiating a PyTorch data loader with shuffle=False in a Colab notebook (GPU runtime), like so:
image_data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
When I iterate through the data loader, the batch order is fixed within a given Colab session (i.e., if I re-instantiate the data loader and iterate through it again in a different cell, the batch order matches):
for batch_idx, (images, labels) in enumerate(image_data_loader): #…
However, when I use different runtimes, the batch order often varies. I believe the variation may occur when the device type differs (e.g. a Tesla V100 versus a P100, per nvidia-smi), but I haven't tested extensively enough to know whether it always comes down to device type.
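For concreteness, here is a sketch of how the order could be compared across runtimes (batch_order_fingerprint is just an illustrative helper of my own, and it assumes the labels arrive as CPU tensors):

import hashlib

def batch_order_fingerprint(loader, num_batches=10):
    # Hash the labels of the first few batches so the resulting strings
    # can be compared across notebook sessions or runtimes.
    hashes = []
    for batch_idx, (images, labels) in enumerate(loader):
        hashes.append(hashlib.md5(labels.numpy().tobytes()).hexdigest())
        if batch_idx + 1 >= num_batches:
            break
    return hashes

print(batch_order_fingerprint(image_data_loader))

Two sessions that see the same batch order should print identical lists of hashes.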
I am using PyTorch 1.9.0+cu102 and CUDA 11.2. Varying the random seed (torch.manual_seed(random_seed), torch.cuda.manual_seed(random_seed)) does not change the batch order of the data loader on a given device, which suggests that a random seed cannot be used to fix the order across devices.
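For completeness, the seeding I tried looks roughly like this (random_seed is just an arbitrary fixed integer), with the loader re-instantiated afterwards:

import torch

random_seed = 42  # arbitrary fixed value
torch.manual_seed(random_seed)
torch.cuda.manual_seed(random_seed)

# Re-instantiating the loader after seeding does not change the order on a given device.
image_data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)

As far as I understand, with shuffle=False the loader falls back to a sequential sampler, so the seed having no effect on a single device makes sense; what I can't explain is the difference across devices.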
Can PyTorch data loaders with shuffle=False really end up with different batch orders? If so, is there a way to fix the batch order across devices?