I have been working on this data set lately: https://www.kaggle.com/gti-upm/leapgestrecog. It's a hand gesture dataset, and I am trying to build a classifier. Because the images are spread across different folders, I wrote my own data loader. Here it is:

class DatasetLoader(Dataset):

  def __init__(self,path):
    self.path_list = path
    self.labels = []
    self.to_tensor = transforms.ToTensor()
    self.resize = transforms.Resize((120,320))
    self.gray = transforms.Grayscale(num_output_channels = 1)
    self._init_dataset()

  def _init_dataset(self):
    labels = set()
    for diro in os.listdir("/kaggle/input/leapgestrecog/leapGestRecog"):
      for d in os.listdir(os.path.join("/kaggle/input/leapgestrecog/leapGestRecog",diro)):
        if len(d.split('_'))>2:
          labels.add("_".join(d.split("_")[-2:]))
        else:
          labels.add(d.split("_")[-1])
    labels = list(labels) 
    ## help me on this line with some codes

  def __getitem__(self,idx): 
    if torch.is_tensor(idx): 
      idx = idx.tolist() 
    img_name = self.path_list[idx] 
    img = Image.open(img_name) 
    img = self.resize(img) 
    img = self.gray(img) 
    img = self.to_tensor(img) 
    if len(img_name.split('/')[-2].split('_')) > 2: 
      label = "_".join(img_name.split('/')[-2].split('_')[-2:]) 
    else: 
      label = img_name.split('/')[-2].split('_')[-1] 
    label = ## Here also 
    return img,label

  def __len__(self):
    return len(self.path_list)

I have a problem with the labels I get from this dataset loader. My model takes batches of n images and predicts 10 classes, so I think my labels need to be of size (n, 10) for the loss calculation. I don't know how to produce them. Here is my network design:

class Net(nn.Module):
    def __init__(self):

        super(Net,self).__init__()
        self.conv1 = nn.Conv2d(1,32,5)
        self.pool = nn.MaxPool2d(2,2)
        self.conv2 = nn.Conv2d(32,64,3)
        self.conv3 = nn.Conv2d(64,64,3)
        self.fc1 = nn.Linear(64*38*13,128)
        self.fc2 = nn.Linear(128,10)

    def forward(self,x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1,64*38*13)  # flatten per sample; avoids hard-coding the batch size of 64
        x = F.relu(self.fc1(x))

        return F.log_softmax(self.fc2(x),dim = 1)

Let y be the label of an image. To train the network, we feed the loss function with y and the model output. But the output is of size (64, 10), so I need help producing matching labels in the data loader.

1 Answer


I think there is a misunderstanding about the input dimensions of a multi-class loss function in PyTorch. The most commonly used loss function for classification problems is nn.CrossEntropyLoss(), which expects raw logits of size (n, c) (e.g. (64, 10)) as input, and a target (ground-truth labels given as integer class indices) of size (n) (e.g. (64)).
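As a quick check of those shapes, here is a minimal sketch that uses random tensors in place of the real model output and labels (the names n, c, logits and target are only for illustration). It also shows that applying log_softmax and then nn.NLLLoss gives the same value, and that one-hot labels, if you ever have them, can be turned back into class indices with argmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, c = 64, 10                          # batch size and number of classes

logits = torch.randn(n, c)             # stand-in for the raw fc2 output, shape (64, 10)
target = torch.randint(0, c, (n,))     # integer class indices (int64), shape (64,)

# CrossEntropyLoss takes (n, c) logits and (n,) class indices and returns a scalar.
loss = nn.CrossEntropyLoss()(logits, target)

# Same value: log_softmax over the class dimension followed by NLLLoss.
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

# One-hot labels of shape (n, c) are not required; argmax recovers the indices.
one_hot = F.one_hot(target, num_classes=c)
recovered = one_hot.argmax(dim=1)

print(logits.shape, target.shape, loss.dim())
```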

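So instead of doing return F.log_softmax(self.fc2(x),dim = 1), you can simply return the raw logits with return self.fc2(x) and use CrossEntropyLoss, which applies log-softmax internally. (Alternatively, keep the log_softmax and use nn.NLLLoss.) There is no need to reshape your labels to (n, 10); they only need to be integer class indices of size (n). You can then calculate the loss directly by doing something like this: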

criterion = nn.CrossEntropyLoss()
# Let x be the (64, 10) output from the model
# Let y be the (64,) tensor of integer class labels
loss = criterion(x, y)
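To get integer labels out of the data loader, one option is to map each gesture name to an index once and look it up in __getitem__. The sketch below reuses the folder-name parsing from the question; the helper names gesture_name, build_label_index and label_for_path, and the example folder names and path, are assumptions made for illustration. Sorting the names matters: list(set(...)) has no guaranteed order, so the index assigned to each class could change between runs.

```python
import os

def gesture_name(dir_name):
    """Same parsing as in the question: '01_palm' -> 'palm', '04_fist_moved' -> 'fist_moved'."""
    parts = dir_name.split("_")
    return "_".join(parts[-2:]) if len(parts) > 2 else parts[-1]

def build_label_index(names):
    """Map each distinct gesture name to a stable integer index."""
    # sorted() makes the mapping deterministic across runs
    return {name: i for i, name in enumerate(sorted(set(names)))}

def label_for_path(img_path, class_to_idx):
    """Look up the class index from the image's parent folder name."""
    folder = os.path.basename(os.path.dirname(img_path))
    return class_to_idx[gesture_name(folder)]

# Hypothetical folder names and image path, for illustration only
folders = ["01_palm", "04_fist_moved", "03_fist"]
class_to_idx = build_label_index(gesture_name(f) for f in folders)
label = label_for_path("/data/00/04_fist_moved/frame_0001.png", class_to_idx)

print(class_to_idx)  # {'fist': 0, 'fist_moved': 1, 'palm': 2}
print(label)         # 1
```

In the question's class, this would amount to storing the mapping in _init_dataset (for example as self.class_to_idx) and ending __getitem__ with label = self.class_to_idx[label]. With the default DataLoader collation, a batch of Python int labels is stacked into an int64 tensor of size (n), which is the target format CrossEntropyLoss expects.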
ccl