I have some data in .txt files, where each instance consists of two lines of 100 elements each. The first line defines the problem and the second line defines the solution. Even though it is not a great idea, I tried to use a supervised setting on this data. However, I am facing problems with batching. I have added the code for both the data loader and the main for loop that does the job.

The problem is that if I set batch_size to 5, the preds array has the correct form. However, the labels array has one more dimension: instead of containing 5 integers, it contains 5 complete problem solutions.

I believe the problem is in the data loader, but I couldn't solve it. I am fairly new to the concept; I have been trying to figure this out for over a week without success.

Data Loader:

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import pdb
import numpy as np
from torch.utils.data import Dataset

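class load_dataset(Dataset):
    def __init__(self, data_file='data.txt', transform=None):
        super().__init__()
        # Each instance spans two consecutive lines of 100 values each.
        data = np.loadtxt(data_file)
        data = torch.Tensor(data)
        self.data = data[::2]      # even rows: problem definitions
        self.targets = data[1::2]  # odd rows: the matching solutions

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        # Returns one (problem, solution) pair, each a tensor of 100 values.
        adj, target = self.data[index], self.targets[index]
        return adj, target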

Main Loop:

for inputs, labels in loaders["train"]:
    # view(-1, 100) also handles a final batch smaller than batch_size
    inputs = inputs.view(-1, 100)
    scores = mps(inputs)
    _, preds = torch.max(scores, 1)
    print("preds: ")
    print(preds)
    print("labels: ")
    print(labels)

Output:

preds:
tensor([0, 0, 0, 0, 0])
labels:
tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
         0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
         0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
         0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
         0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
         0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
         0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
         0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
         0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
         0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]])
bidon

1 Answer

You haven't shown how you defined your dataloader, so I am assuming you are wrapping load_dataset with a torch.utils.data.DataLoader and setting batch_size=5.

If you set your batch size to 5, then you will have 5 "problems" and the corresponding 5 "solutions" in a single batch, each having 100 components. This means inputs and labels will both be tensors shaped (batch_size=5, 100).
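To make the shapes concrete, here is a minimal sketch. It uses random tensors as a stand-in for data.txt and a TensorDataset in place of load_dataset, so the data and the number of rows are assumptions for illustration only. The (5, 2) shape of scores is also an assumption, chosen only because preds in the question holds 5 integers.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for data.txt (assumption): 20 problems and
# 20 solutions, each a vector of 100 values.
problems = torch.rand(20, 100)
solutions = (torch.rand(20, 100) > 0.9).float()

dataset = TensorDataset(problems, solutions)
loader = DataLoader(dataset, batch_size=5, shuffle=False)

# The default collate function stacks 5 samples along a new first dimension.
inputs, labels = next(iter(loader))
batch_shapes = (tuple(inputs.shape), tuple(labels.shape))
print(batch_shapes)  # ((5, 100), (5, 100))

# Hypothetical model output with one row of class scores per sample.
scores = torch.rand(5, 2)
_, preds = torch.max(scores, 1)  # reduces dim 1, leaving one index per sample
preds_shape = tuple(preds.shape)
print(preds_shape)  # (5,)
```

So preds has one value per sample, while labels keeps all 100 values per sample. The two can only be compared directly once the model output and the target describe the same thing.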

Ivan
  • Thank you @Ivan. First, your assumptions are correct. However, isn't that what batching is all about in the first place? I thought batching would also split a single instance of data into its pieces. My output dimension is the same as the input (it's 100). – bidon Oct 22 '21 at 09:53
  • Maybe I'm missing something, but you explained that a *problem* is described by 100 components. To me, this 100-feature vector corresponds to your model input, correct? Or does one problem correspond to multiple (100 different) outputs? – Ivan Oct 22 '21 at 10:01