I have a text dataset that I want to use for a GAN, and it needs to be one-hot encoded. This is how I am creating a custom Dataset for my files:

import torch
import torch.nn.functional as F

class Dataset2(torch.utils.data.Dataset):
    def __init__(self, list_, labels):
        'Initialization'
        self.labels = labels
        self.list_IDs = list_

    def __len__(self):
        'Denotes the total number of samples'
        return len(self.list_IDs)

    def __getitem__(self, index):
        'Generates one sample of data'
        # Select sample
        mylist = self.list_IDs[index]

        # Load data and get label
        X = F.one_hot(mylist, num_classes=len(alphabet))
        y = self.labels[index]

        return X, y

It works well every time I index it directly. The problem is that when I wrap it in a DataLoader, the shape differs from what the dataset itself returns. This is the shape that comes out of the dataset:

x, _ = dataset[1]
x.shape

torch.Size([1274, 22])

and this is the shape that comes out of the DataLoader:

from torch.utils.data import DataLoader

dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

one = []
for epoch in range(epochs):
    for i, (real_data, _) in enumerate(dataloader):
        one.append(real_data)
one[3].shape

torch.Size([4, 1274, 22])

The 4 is the number of samples in my data, but it should not be there. How can I fix this problem?

1 Answer

You confirmed that your dataset contains only four elements. You wrapped it in a data loader with batch_size=64, which is greater than 4, so the data loader outputs a single batch containing all 4 elements.

In turn, this means you append only one element to the list per epoch, so one[3] is the batch from the fourth epoch (the only batch the data loader yields), shaped (4, 1274, 22). The leading 4 is the batch dimension, not an error in the dataset.
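The behaviour described above can be reproduced with a small sketch. `ToyDataset`, `NUM_CLASSES`, `SEQ_LEN`, and the random integer sequences are hypothetical stand-ins for the asker's `Dataset2`, alphabet, and files; only the shapes are taken from the question.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset

NUM_CLASSES = 22  # assumed alphabet size, inferred from the (1274, 22) shape
SEQ_LEN = 1274    # assumed sequence length, inferred from the same shape


class ToyDataset(Dataset):
    """Hypothetical stand-in for Dataset2: one-hot encodes integer sequences."""

    def __init__(self, sequences, labels):
        self.sequences = sequences
        self.labels = labels

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, index):
        x = F.one_hot(self.sequences[index], num_classes=NUM_CLASSES)
        return x, self.labels[index]


# Four random integer sequences, mirroring the four samples in the question.
sequences = torch.randint(0, NUM_CLASSES, (4, SEQ_LEN))
labels = torch.zeros(4, dtype=torch.long)
dataset = ToyDataset(sequences, labels)

# Indexing the dataset returns a single, unbatched sample.
sample_shape = tuple(dataset[1][0].shape)

# batch_size=64 exceeds the dataset size, so one pass yields one batch of 4.
loader = DataLoader(dataset, batch_size=64, shuffle=True)
batch_shapes = [tuple(x.shape) for x, _ in loader]

print(sample_shape)
print(len(loader), batch_shapes)
```

The data loader stacks samples along a new first dimension, so a batch always has one more dimension than a single sample.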

Ivan
  • So if I lower `batch_size` from 64 to, for example, 1, will my problem be solved? And if I want a shape of exactly (1274, 22), what should I do? Thank you for your help. – khashayar ehteshami Oct 10 '21 at 13:27
  • Indeed, `DataLoader(dataset, batch_size=1, shuffle=True)` will output four single-element batches, *i.e.* each with shape `(1, 1274, 22)`. Consider upvoting/accepting the answer as valid (green checkmark on the left) if you found this answer helpful. – Ivan Oct 10 '21 at 13:33
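For the follow-up about getting exactly `(1274, 22)`, here is a minimal sketch of two possible options, neither of which comes from the answer itself: squeeze the batch dimension of a `batch_size=1` loader, or pass `batch_size=None`, which disables automatic batching in `DataLoader`. The zero-filled `TensorDataset` is a hypothetical placeholder for the real one-hot data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data with the shapes from the question (4 samples of 1274 x 22).
data = torch.zeros(4, 1274, 22)
labels = torch.zeros(4, dtype=torch.long)
dataset = TensorDataset(data, labels)

# Option 1: keep batch_size=1 and drop the leading batch dimension manually.
loader_one = DataLoader(dataset, batch_size=1, shuffle=True)
squeezed_shapes = [tuple(x.squeeze(0).shape) for x, _ in loader_one]

# Option 2: batch_size=None turns off automatic batching, so each iteration
# yields one sample exactly as the dataset returns it.
loader_none = DataLoader(dataset, batch_size=None, shuffle=True)
unbatched_shapes = [tuple(x.shape) for x, _ in loader_none]

print(squeezed_shapes)
print(unbatched_shapes)
```

Most layers in `torch.nn` expect a leading batch dimension, so for GAN training it is often simpler to keep the batch dimension and let the model consume `(batch, 1274, 22)` directly.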