Hard to tell without seeing your dataset/dataloader, but I suspect you're simply applying transformations to your dataset. That won't change the dataset size; it only augments the existing images on the fly. If you wish to balance classes, adding a sampler seems the easiest solution.
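To illustrate the first point, torchvision transforms are applied lazily inside __getitem__, so they never add samples. A minimal sketch (the path is a placeholder, as in the snippet below):
from torchvision import datasets, transforms

aug = transforms.Compose([transforms.RandomHorizontalFlip(), transforms.ToTensor()])
plain = datasets.ImageFolder('/path/to/images')
augmented = datasets.ImageFolder('/path/to/images', transform=aug)
# Same length; only the per-sample content differs between draws.
assert len(plain) == len(augmented)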
Here's (somewhat simplified) code I use for this purpose, utilizing pandas, collections and torch.utils.data.WeightedRandomSampler. Likely not the best out there, but it gets the job done:
import pandas as pd
import torch
from collections import Counter
from torch.utils.data import DataLoader, WeightedRandomSampler, random_split
from torchvision import datasets

# Note: the transformations should include ToTensor() in this case.
data = datasets.ImageFolder('/path/to/images', transform=loader_transform)
# Split into train/test sets:
train_len = int(len(data)*0.8)
train_set, test_set = random_split(data, [train_len, len(data) - train_len])
# Extract classes:
train_classes = [train_set.dataset.targets[i] for i in train_set.indices]
# Calculate support:
class_count = Counter(train_classes)
# Calculate class weights as the inverse of each class frequency:
class_weights = torch.DoubleTensor([len(train_classes)/c for c in pd.Series(class_count).sort_index().values])
# Sampler needs the respective class weight supplied for each image in the dataset:
sample_weights = [class_weights[train_set.dataset.targets[i]] for i in train_set.indices]
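# replacement=True lets the sampler oversample minority classes, and allows
# num_samples to exceed the number of distinct training images: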
sampler = WeightedRandomSampler(weights=sample_weights, num_samples=int(len(train_set)*2), replacement=True)
# Create torch dataloaders:
batch_size = 4
train_loader = DataLoader(train_set, batch_size=batch_size, sampler=sampler, num_workers=12)
print("The number of images in the training set is:", len(train_loader.sampler))
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=12)
print("The number of images in the test set is:", len(test_loader.dataset))
The final train size will be 2x the original in this case; you may experiment with smaller sizes too, as the class representation will stay balanced (in expectation) regardless of the num_samples you choose.
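If you want to verify the balancing, a quick sketch (using train_loader from the snippet above) is to count the labels actually drawn over one epoch:
from collections import Counter

drawn_labels = Counter()
for _, labels in train_loader:
    drawn_labels.update(labels.tolist())
# With the weighted sampler, counts should be roughly equal across classes.
print(drawn_labels)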