
I'm trying to create my own DataLoader from a custom dataset for a CNN. The original DataLoader was created with:

train_loader = torch.utils.data.DataLoader(mnist_data, batch_size=64)

If I check the shape of the above, I get

i1, l1 = next(iter(train_loader))
print(i1.shape)   # torch.Size([64, 1, 28, 28]) 
print(l1.shape)   # torch.Size([64]) 

When I feed this train_loader into my CNN, it works beautifully. However, I have a custom dataset. I have done the following:

mnist_data = datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor())

trainset = mnist_data
testset = mnist_data

x_train = np.array(trainset.data)
y_train = np.array(trainset.targets)

# modify x_train/y_train

Now, how can I take x_train and y_train and turn them into a DataLoader similar to the first one? I have done the following:

train_data = []
for i in range(len(x_train)):
   train_data.append([x_train[i], y_train[i]])

train_loader = torch.utils.data.DataLoader(train_data, batch_size=64)

for i, (images, labels) in enumerate(train_loader):
    images = images.unsqueeze(1)

However, I'm still missing the channel dimension (which should be 1). How would I fix this?

user545642
  • What is the format of your annotations/labels? Is it a single-class classification task where the images are in their respective folders? For example, if it is MNIST, are all images of the digit "1" in a folder named "1"? – Sadra Mar 13 '22 at 00:12
  • No, I don't. It's just the same as the mnist dataset, I just removed a bunch of the ones from x_train and y_train, that's it. So shrunk the size of the input. – user545642 Mar 13 '22 at 00:42
  • I don't think it is standard practice to use an off-the-shelf dataset function ( `datasets.MNIST("data" ...` ) for your own dataset. If you describe how your data is stored (in folders, as NumPy arrays, or some other way), we can probably propose a more standard approach for training on it. If it is open source, point to its repo; if not, describe it. – Sadra Mar 13 '22 at 00:47
  • I'm required to use that. All I want to know is how to convert x_train and y_train into a DataLoader, including the channel dimension. – user545642 Mar 13 '22 at 00:51

1 Answer


I don't have access to your x_train and y_train, but probably this works:

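import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# use x_train and y_train as numpy arrays (after any modification you need)
x_train = np.array(trainset.data)      # shape (N, 28, 28), dtype uint8
y_train = np.array(trainset.targets)   # shape (N,), integer labels

# convert the numpy arrays to tensors:
# scale pixels to [0, 1] like transforms.ToTensor() and add the channel dimension
tensor_x = torch.from_numpy(x_train).float().div(255).unsqueeze(1)  # (N, 1, 28, 28)
# keep labels as integers (int64); torch.Tensor(y_train) would turn them into floats
tensor_y = torch.from_numpy(y_train).long()
# create the dataset
custom_dataset = TensorDataset(tensor_x, tensor_y)
# create your dataloader
my_dataloader = DataLoader(custom_dataset, batch_size=1)

# check that you get the desired shapes
i1, l1 = next(iter(my_dataloader))
print(i1.shape)   # torch.Size([1, 1, 28, 28])
print(l1.shape)   # torch.Size([1])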
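
If you prefer to keep the conversion logic in one place, a small Dataset subclass is another option. The following is only a sketch under assumptions: the class name ArrayImageDataset is hypothetical, the random arrays stand in for your modified x_train / y_train (assumed to have the MNIST layout, uint8 images of shape (N, 28, 28) and integer labels), and dropping every sample labelled 1 is just my guess at the kind of modification you described.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class ArrayImageDataset(Dataset):
    """Hypothetical wrapper around (N, 28, 28) uint8 images and (N,) integer labels."""

    def __init__(self, images, labels):
        # Scale to [0, 1] like transforms.ToTensor() and add the channel axis: (N, 1, H, W).
        self.images = torch.from_numpy(images).float().div(255.0).unsqueeze(1)
        # Store class labels as int64, the dtype nn.CrossEntropyLoss expects for class indices.
        self.labels = torch.from_numpy(labels).long()

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]


# Synthetic stand-in for the modified x_train / y_train (assumption: MNIST-like layout).
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 256, size=(200, 28, 28), dtype=np.uint8)
y_demo = rng.integers(0, 10, size=200, dtype=np.int64)

# Example modification (an assumption): drop every sample labelled 1.
keep = y_demo != 1
demo_loader = DataLoader(
    ArrayImageDataset(x_demo[keep], y_demo[keep]), batch_size=64, shuffle=True
)

images, labels = next(iter(demo_loader))
print(images.shape, images.dtype)
print(labels.shape, labels.dtype)
```

With your real data you would pass x_train and y_train instead of the synthetic arrays. Because the channel axis is added once in the constructor, the training loop no longer needs images.unsqueeze(1).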
Sadra