I'm creating a custom dataset for NLP-related tasks.
In the PyTorch custom dataset tutorial, we see that the __getitem__()
method leaves room for a transform before it returns a sample:
def __getitem__(self, idx):
    if torch.is_tensor(idx):
        idx = idx.tolist()

    img_name = os.path.join(self.root_dir,
                            self.landmarks_frame.iloc[idx, 0])
    image = io.imread(img_name)

    ### SOME DATA MANIPULATION HERE ###

    sample = {'image': image, 'landmarks': landmarks}

    if self.transform:
        sample = self.transform(sample)

    return sample
However, the code here:

if torch.is_tensor(idx):
    idx = idx.tolist()
implies that multiple items can be retrieved at once, which leaves me wondering:
How does that transform work on multiple items? Take the custom transforms in the tutorial, for example: they do not look like they could be applied to a batch of samples in a single call.
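To make the issue concrete, here is a minimal transform in the style of the tutorial's ToTensor (the class name and shapes are my own illustration, not code from the tutorial). It assumes it receives exactly one sample dict with an unbatched (H, W, C) image, which is what makes me doubt it could accept a batch:

```python
import numpy as np

class ToTensorLike:
    """Tutorial-style transform: expects a SINGLE sample dict,
    with an unbatched image of shape (H, W, C)."""
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        # swap the color axis: numpy (H, W, C) -> torch-style (C, H, W);
        # a batched (N, H, W, C) array would be transposed incorrectly
        image = image.transpose((2, 0, 1))
        return {'image': image, 'landmarks': landmarks}

sample = {'image': np.zeros((4, 4, 3)), 'landmarks': np.zeros((5, 2))}
out = ToTensorLike()(sample)
# out['image'] now has shape (3, 4, 4)
```

A stack of several samples would need either a loop over the transform or a rewritten, batch-aware version of it.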
Relatedly, how does a DataLoader retrieve a batch of samples in parallel and apply the transform, if the transform can only be applied to one sample at a time?
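My current mental model, sketched here without any torch internals (the TinyDataset class and collate helper are hypothetical stand-ins, and the collate function only roughly imitates what torch's default collation does), is that the loader calls __getitem__ once per index and only batches afterwards:

```python
import numpy as np

class TinyDataset:
    """Toy dataset: __getitem__ applies the transform to ONE sample."""
    def __init__(self, data, transform=None):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = {'x': self.data[idx]}
        if self.transform:
            sample = self.transform(sample)   # single-sample call
        return sample

def collate(samples):
    # rough stand-in for the batching step: stack the per-sample
    # fields into one batched array AFTER each transform has run
    return {'x': np.stack([s['x'] for s in samples])}

ds = TinyDataset(np.arange(6, dtype=float).reshape(6, 1),
                 transform=lambda s: {'x': s['x'] * 2})
batch = collate([ds[i] for i in [0, 1, 2]])   # one __getitem__ per index
# batch['x'] has shape (3, 1); the transform never saw a batch
```

If that model is right, the transform never needs to handle a batch, which is what I'd like confirmed.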