
I have two dataloaders and I would like to merge them without redefining the underlying datasets, which in my case are train_dataset and val_dataset.

train_loader = DataLoader(train_dataset, batch_size=512, drop_last=True, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=512, drop_last=False)

Desired result:

train_loader = train_loader + val_loader 
– Johnpiton
  • how is your question different to: https://stackoverflow.com/questions/60840500/pytorch-concatenating-datasets-before-using-dataloader? – Charlie Parker Sep 27 '22 at 01:29
  • easiest solution to what I want is to use this: https://discuss.pytorch.org/t/does-concatenate-datasets-preserve-class-labels-and-indices/62611/12?u=brando_miranda by using learn2learn's union of data sets. – Charlie Parker Sep 27 '22 at 02:00
  • useful: https://stackoverflow.com/questions/69792591/combing-two-torchvision-dataset-objects-into-a-single-dataloader-in-pytorch?noredirect=1#comment130421381_69792591 – Charlie Parker Sep 27 '22 at 02:08

3 Answers


Data loaders are iterables; you can implement a function that returns an iterator which yields the dataloaders' contents, one dataloader after the other.

Given a number of iterators itrs, it would go through each iterator in turn, yielding one batch at a time. A possible implementation is as simple as:

def itr_merge(*itrs):
    # exhaust each iterator in turn, yielding one batch at a time
    for itr in itrs:
        for v in itr:
            yield v

Here is a usage example:

>>> dl1 = DataLoader(TensorDataset(torch.zeros(5, 1)), batch_size=2, drop_last=True)
>>> dl2 = DataLoader(TensorDataset(torch.ones(10, 1)), batch_size=2)

>>> for x in itr_merge(dl1, dl2):
>>>   print(x)
[tensor([[0.], [0.]])]
[tensor([[0.], [0.]])]
[tensor([[1.], [1.]])]
[tensor([[1.], [1.]])]
[tensor([[1.], [1.]])]
[tensor([[1.], [1.]])]
[tensor([[1.], [1.]])]
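
The same merging can also be done with itertools.chain from the standard library, which yields from each iterable in sequence; a minimal sketch using the dl1 and dl2 above:

from itertools import chain

# chain yields every batch of dl1 first, then every batch of dl2,
# preserving each loader's own batch_size and drop_last settings
for x in chain(dl1, dl2):
    print(x)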
– Ivan

There is a ConcatDataset available, documented at https://pytorch.org/docs/stable/_modules/torch/utils/data/dataset.html#ConcatDataset. You could concatenate the datasets before passing them to the DataLoader:

import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

# two datasets with the same sample structure: (features, target)
dsa = TensorDataset(torch.rand(100, 3), torch.rand(100, 1))
dsb = TensorDataset(torch.rand(150, 3), torch.rand(150, 1))

# concatenate them end to end: 250 samples in total
dsab_cat = ConcatDataset([dsa, dsb])
dsab_cat_loader = DataLoader(dsab_cat)

refs: https://www.oreilly.com/library/view/deep-learning-with/9781789534092/5f2cf6d8-4cdf-4e83-8c5b-58fbf722f6b6.xhtml
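
Applied to the question's setup, a minimal sketch (assuming train_dataset and val_dataset return samples with the same structure, which the default collate function needs for batching) could look like:

from torch.utils.data import ConcatDataset, DataLoader

# assumes train_dataset and val_dataset are the datasets from the question;
# the batch settings are carried over from the original train_loader
merged_dataset = ConcatDataset([train_dataset, val_dataset])
merged_loader = DataLoader(merged_dataset, batch_size=512, shuffle=True, drop_last=True)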

– bluesmonk
  • I'd use the index produced by `enumerate` when iterating over the dataset and use that index as a class. But it's hard to tell what your use case is. I'd suggest posting a new question with a self-contained example. – bluesmonk Sep 27 '22 at 14:52
  • that is what I was going to do. – Charlie Parker Sep 27 '22 at 17:55

This returns a list of batches that you can iterate over for training, the same way you would iterate over a DataLoader:

# eagerly pull every batch out of both loaders into one flat list
trainval = [d for dl in [train_loader, val_loader] for d in dl]
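
Note that this materializes every batch in memory up front, which can get expensive for large datasets; the generator-based approach above avoids that. One thing the list form does allow is reshuffling the merged batches between epochs, for example (a sketch using the trainval list above):

import random

# reorder the merged batches so validation batches are interleaved
# with the training batches instead of always coming last
random.shuffle(trainval)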
– fiesaratnu