I'm using a diffusion model (though the issue isn't limited to diffusion) to generate a number of images, say 256. Normally it takes around one minute to generate one sample, so I create a noise dataset (like below) and wrap it in a `DataLoader`:
```python
import numpy as np
import torch
from torch.utils.data import Dataset
from typing import Tuple


class NoiseDataset(Dataset):
    def __init__(
        self, nrows: int = 64, img_file: str = "../data/noisy_sample.npy"
    ) -> None:
        self.noise_tensor = torch.from_numpy(np.load(img_file)[:nrows])
        self.noise_label = torch.tensor(range(len(self.noise_tensor)))

    def __len__(self) -> int:
        return len(self.noise_tensor)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor]:
        return self.noise_tensor[idx], self.noise_label[idx]
```
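For context, this is roughly how I wrap it; `shuffle=False` so every run walks the noise rows in the same order, and `batch_size` is the only thing I vary between runs:

```python
from torch.utils.data import DataLoader

# shuffle=False keeps the row order fixed; only batch_size changes across runs
loader = DataLoader(NoiseDataset(nrows=256), batch_size=32, shuffle=False)
```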
Now I'm able to generate samples using multiple GPUs and DDP. (Please share if you have a better solution.) However, I found that if I change the `batch_size` of the `DataLoader` (with `shuffle` set to `False`), the output samples change, meaning the process is not "reproducible" across batch sizes. (If I rerun the program with the same batch size, the outputs are identical.)
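For the multi-GPU runs I swap in a `DistributedSampler` so each rank works on its own shard; the loop below is only a sketch, and `diffusion_model.sample` is a placeholder for my actual sampling call:

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# assumes torch.distributed is already initialized for DDP
dataset = NoiseDataset(nrows=256)
sampler = DistributedSampler(dataset, shuffle=False)  # one fixed shard per rank
ddp_loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for noise, label in ddp_loader:
    # placeholder for my actual reverse-diffusion sampling call
    images = diffusion_model.sample(noise.cuda())
```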
I have set the random seed like this:
```python
import random  # numpy and torch are already imported above


def reproducibility() -> None:
    torch.manual_seed(0)
    random.seed(0)
    np.random.seed(0)
```
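For completeness, this is roughly where I call it, before anything that touches the RNGs (the surrounding script structure is just a sketch, and `main` is a placeholder for my generation entry point):

```python
if __name__ == "__main__":
    reproducibility()  # seed before building the dataset, loader, and model
    main()             # placeholder for my generation entry point
```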
But the problem persists. Is there anything I did wrong?