How to perform preprocessing steps on an image dataset once, so that it can be used for training and testing the model many times

Question

I am training different networks (VGG16, ResNet, DenseNet, SqueezeNet, etc.) on an image dataset. I perform the following steps before training:

import torchvision
from torchvision import transforms

# TRAIN_ROOT and TEST_ROOT are the paths to the training and test image folders.
train_dataset = torchvision.datasets.ImageFolder(
    root=TRAIN_ROOT,
    transform=transforms.Compose([
        transforms.Resize((224, 224)),   # resize every image to 224x224
        transforms.ToTensor(),           # PIL image -> float tensor in [0, 1]
        transforms.Normalize(mean=[0.541, 0.536, 0.357],
                             std=[0.321, 0.339, 0.441]),
    ]),
)

test_dataset = torchvision.datasets.ImageFolder(
    root=TEST_ROOT,
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.540, 0.536, 0.357],
                             std=[0.321, 0.339, 0.441]),
    ]),
)

I need to train many times to run different experiments. My dataset has 30,000 images, and repeating these steps on every run takes a lot of time. Can I perform these steps (resizing, converting to tensor, and normalizing) once and store the results in separate train and test folders, so that I can use them directly for training and testing? If so, how can I do that (i.e. store the preprocessed images and load them directly for training and testing)?

score 0 · Answer 1 · answered May 21 '23 at 21:26

You can save the preprocessed tensors in the .pt format (with torch.save), but is it efficient? If your original images are large and you resize them to a much smaller size, it can be worth trying. However, if the original and resized shapes are not very different, storage and memory usage will increase drastically: normalization turns the 8-bit integer pixel values into floats, so each value takes 4 bytes as float32 instead of 1 byte, and any compression from the image file format is lost. Alternatively, if you are using torch.utils.data.DataLoader, try increasing num_workers so that the images are preprocessed in parallel.
