
I have a dataset that does not have separate folders for training and testing. I want to apply data augmentation (via transforms) only to the training data, after doing the split:

 train_data, valid_data = D.random_split(dataset, lengths=[train_size, valid_size])

Does anyone know how this can be achieved? I have a custom dataset that implements __init__ and __getitem__. The training and validation datasets are then passed to a DataLoader.

alice

1 Answer


You can write a custom Dataset whose only job is to apply the transformations to another dataset:

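from torch.utils.data import Dataset

class TrDataset(Dataset):
  """Wraps a base dataset and applies transformations to each sample."""

  def __init__(self, base_dataset, transformations):
    super().__init__()
    self.base = base_dataset  # e.g. a Subset returned by random_split
    self.transformations = transformations  # callable applied to x only

  def __len__(self):
    return len(self.base)

  def __getitem__(self, idx):
    # Fetch the sample from the base dataset, then transform the input.
    x, y = self.base[idx]
    return self.transformations(x), y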

Once you have this Dataset wrapper, you can have different transformations for the train and validation sets:

# Assuming D is torch.utils.data, as in the question (import torch.utils.data as D).
raw_train_data, raw_valid_data = D.random_split(dataset, lengths=[train_size, valid_size])
train_data = TrDataset(raw_train_data, train_transforms)
valid_data = TrDataset(raw_valid_data, val_transforms)
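
As a rough end-to-end check, here is a minimal sketch of the same idea on toy data. The TensorDataset, the noise "augmentation", the identity validation transform, and the sizes 8 and 2 are all hypothetical stand-ins for your own dataset and transforms; only random_split and the wrapper come from the answer above.

```python
import torch
from torch.utils.data import Dataset, TensorDataset, random_split


class TrDataset(Dataset):
    """Same wrapper as above: applies a transform to each sample of a base dataset."""

    def __init__(self, base_dataset, transformations):
        self.base = base_dataset
        self.transformations = transformations

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        return self.transformations(x), y


# Hypothetical toy data: 10 samples with 3 features each, all ones.
dataset = TensorDataset(torch.ones(10, 3), torch.arange(10))

# Hypothetical transforms: the "augmentation" adds noise, validation is the identity.
train_transforms = lambda x: x + 0.1 * torch.randn_like(x)
val_transforms = lambda x: x

train_size, valid_size = 8, 2
raw_train_data, raw_valid_data = random_split(
    dataset,
    lengths=[train_size, valid_size],
    generator=torch.Generator().manual_seed(0),  # fixed seed for a reproducible split
)
train_data = TrDataset(raw_train_data, train_transforms)
valid_data = TrDataset(raw_valid_data, val_transforms)

x_val, _ = valid_data[0]    # passed through the identity transform
x_train, _ = train_data[0]  # passed through the noise transform
```

Note that this only works as intended if the base dataset returns raw samples, i.e. it does not already apply augmentation inside its own __getitem__; otherwise the validation set would be augmented too.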
Shai