I have some image data for a binary classification task and the images are organised into 2 folders as data/model_data/class-A and data/model_data/class-B.
There are a total of N images. I want to have a 70/20/10 split for train/val/test. I am using PyTorch and Torchvision for the task. Here is the code I have so far.
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils, datasets, models
data_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
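root = 'data/model_data'  # parent folder containing class-A/ and class-B/
model_dataset = datasets.ImageFolder(root, transform=data_transform)
total_count = len(model_dataset)  # N, the total number of images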
train_count = int(0.7 * total_count)
valid_count = int(0.2 * total_count)
test_count = total_count - train_count - valid_count
train_dataset, valid_dataset, test_dataset = torch.utils.data.random_split(model_dataset, (train_count, valid_count, test_count))
train_dataset_loader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=NUM_WORKER)
valid_dataset_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=NUM_WORKER)
test_dataset_loader = torch.utils.data.DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=NUM_WORKER)
dataloaders = {'train': train_dataset_loader, 'val': valid_dataset_loader, 'test': test_dataset_loader}
I feel that this isn't the correct way to do it, for two reasons.
- I am applying the same transform to all the splits. (This is not what I want to do, obviously! The solution for this is most probably the answer here.)
- Usually people first separate the original data into test/train and then they separate train into train/val, whereas I am directly separating the original data into train/val/test. (Is this correct?)
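To make the second point concrete, here is a minimal sketch of the two orderings I am comparing. It uses plain Python index lists instead of torch so that it runs without any image data. The helper names `split_counts`, `direct_split` and `two_stage_split` are hypothetical names of my own, and the 1000 images are an arbitrary example value for N. As far as I can tell, both orderings produce three disjoint random subsets of the same sizes.

```python
import random


def split_counts(total_count, fractions=(0.7, 0.2, 0.1)):
    """Return (train, val, test) sizes; the rounding remainder goes to test."""
    train_count = int(fractions[0] * total_count)
    valid_count = int(fractions[1] * total_count)
    test_count = total_count - train_count - valid_count
    return train_count, valid_count, test_count


def direct_split(indices, counts, seed=0):
    """Shuffle once and cut the shuffled list into train/val/test."""
    train_count, valid_count, _ = counts
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:train_count]
    val = shuffled[train_count:train_count + valid_count]
    test = shuffled[train_count + valid_count:]
    return train, val, test


def two_stage_split(indices, counts, seed=0):
    """First hold out test, then split the remainder into train/val."""
    train_count, _, test_count = counts
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    test, rest = shuffled[:test_count], shuffled[test_count:]
    rng.shuffle(rest)
    train, val = rest[:train_count], rest[train_count:]
    return train, val, test


indices = list(range(1000))  # assumed example: N = 1000 images
counts = split_counts(len(indices))
for split_fn in (direct_split, two_stage_split):
    train, val, test = split_fn(indices, counts)
    print(split_fn.__name__, len(train), len(val), len(test))
```

If this sketch is right, the only difference between the two orderings is where the cuts are made, not the sizes or the disjointness of the resulting subsets.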
So my question is: is what I am doing correct? (Probably not.)
If it is not, how should I write the data loaders to get the required splits, so that I can apply separate transforms to each of train/val/test?
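One direction I have been considering is a small wrapper that applies a transform per split instead of per dataset. Below is a minimal sketch of that idea, not a confirmed solution. `TransformedSubset` is a hypothetical class of my own, and the toy list and lambdas are stand-ins for an `ImageFolder` created without a transform and for the real train/eval transforms. It is written without torch so it runs as-is; my understanding is that any object implementing `__getitem__` and `__len__` can be passed to `DataLoader` as a map-style dataset.

```python
class TransformedSubset:
    """Map-style view over `base`: selects `indices` and applies `transform`."""

    def __init__(self, base, indices, transform=None):
        self.base = base              # any indexable dataset of (sample, label)
        self.indices = list(indices)  # positions in `base` that belong to this split
        self.transform = transform    # transform used for this split only

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        sample, label = self.base[self.indices[i]]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, label


# Toy stand-in for datasets.ImageFolder(root) created WITHOUT a transform.
toy_base = [(x, x % 2) for x in range(10)]


def train_tf(x):
    """Stand-in for the augmenting train transform."""
    return x * 10


def eval_tf(x):
    """Stand-in for a deterministic val/test transform."""
    return x + 0.5


train_set = TransformedSubset(toy_base, [0, 1, 2, 3, 4, 5, 6], train_tf)
val_set = TransformedSubset(toy_base, [7, 8], eval_tf)
test_set = TransformedSubset(toy_base, [9], eval_tf)

print(train_set[1], val_set[0], test_set[0])
```

With the real data, I would presumably build `datasets.ImageFolder(root)` with no transform, take the index lists from the subsets returned by `random_split` (their `.indices` attribute), and hand each wrapper to its own `DataLoader`.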