I'm instantiating a PyTorch DataLoader with shuffle=False in a Colab notebook (GPU runtime), like so:

image_data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
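For completeness, the surrounding setup looks roughly like this (the CIFAR-10 dataset and the batch size of 64 below are stand-ins for my actual image dataset and settings, not the exact values I use):

import torch
import torchvision
import torchvision.transforms as transforms

batch_size = 64  # stand-in value
# Stand-in dataset; my real dataset is an image dataset constructed the same way.
dataset = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transforms.ToTensor())
image_data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)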

When I iterate through the data loader, the batch order is fixed within a given Colab session (i.e., if I re-instantiate the data loader and iterate through it again in a different cell, the batch order matches):

for batch_idx, (images, labels) in enumerate(image_data_loader): #…
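Concretely, this is roughly how I compare the order between cells (using the first few labels of each batch as a fingerprint is just one way to do the comparison):

batch_fingerprints = []
for batch_idx, (images, labels) in enumerate(image_data_loader):
    # Record the first few labels of each batch as a fingerprint of the batch order.
    batch_fingerprints.append(labels[:5].tolist())

# Re-running this in another cell of the same session prints the same thing;
# in a different runtime the output can differ.
print(batch_fingerprints[:3])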

However, when I use different runtimes, the batch order often varies. I believe the variation may occur when the device type is different (e.g. a Tesla V100 versus P100, per nvidia-smi), but I haven't tested extensively enough to know whether it always comes down to device type.

I am using PyTorch 1.9.0+cu102 and CUDA 11.2. Varying the random seed (torch.manual_seed(random_seed), torch.cuda.manual_seed(random_seed)) does not change the batch order of the data loader on a given device, which suggests that a random seed cannot be used to fix the order across devices.
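This is what I mean by varying the seed (the seed value itself is arbitrary):

random_seed = 42  # arbitrary
torch.manual_seed(random_seed)
torch.cuda.manual_seed(random_seed)

# Re-instantiating the loader after seeding still yields the same batch order
# on a given device as it did before seeding.
image_data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)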

Can PyTorch DataLoaders instantiated with shuffle=False really yield different batch orders? If so, is there a way to fix the batch order across devices?
