
I'm using a diffusion model (though the question is not limited to diffusion) to generate a number of images, let's say 256. It normally takes around one minute to generate one sample, so I create a noise dataset (like below) and wrap it in a DataLoader:

from typing import Tuple

import numpy as np
import torch
from torch.utils.data import Dataset


class NoiseDataset(Dataset):
    def __init__(
        self, nrows: int = 64, img_file: str = "../data/noisy_sample.npy"
    ) -> None:
        # Pre-generated noise saved to disk, so every run starts
        # from exactly the same starting noise tensors.
        self.noise_tensor = torch.from_numpy(np.load(img_file)[:nrows])
        self.noise_label = torch.tensor(range(len(self.noise_tensor)))

    def __len__(self) -> int:
        return len(self.noise_tensor)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor]:
        return self.noise_tensor[idx], self.noise_label[idx]
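
For context, here is roughly how I build the loader on top of that dataset; this is a simplified sketch, and it assumes the DDP process group has already been initialized (e.g. by torchrun):

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# DistributedSampler gives each rank a disjoint slice of the fixed
# noise; shuffle=False keeps the slice assignment deterministic.
dataset = NoiseDataset(nrows=256)
sampler = DistributedSampler(dataset, shuffle=False, drop_last=False)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)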

Now I'm able to generate samples using multiple GPUs and DDP. (Please share if you have a better solution.) However, I found that if I change the batch_size of the DataLoader (with shuffle set to False), the output samples change, which means the process is not "reproducible" across batch sizes. (If I rerun the program with the same batch size, the outputs are identical.)
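
The per-rank generation loop has roughly this shape; `sample_fn` is a placeholder name for my actual diffusion sampler, and each rank is pinned to its own GPU:

import torch
from torchvision.utils import save_image

# Rough shape of the generation loop on each rank. `sample_fn` is a
# stand-in for my real sampler: it maps a batch of starting noise to
# a batch of denoised images.
for noise, labels in loader:
    noise = noise.cuda()
    with torch.no_grad():
        images = sample_fn(noise)
    for img, lbl in zip(images.cpu(), labels):
        save_image(img, f"sample_{lbl.item():04d}.png")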

I have set the random seed like this:

import random

def reproducibility() -> None:
    torch.manual_seed(0)
    random.seed(0)
    np.random.seed(0)

But the problem persists. Is there anything I did wrong?

