Greetings,
I am currently trying to develop a scene-text-recognition model with a CNN backbone (which might change to a pre-trained ResNet in the future) and an LSTM for the predictions.
Skipping all the unnecessary info: my targets have shape [16, 30] (batch size × max sequence length) and contain the labels (0-9 and a-z) encoded in the interval [1, 36], with 37 reserved for the END token, which I insert at the end of each sequence right before the padding starts. The rest of each target row, which does not contain any characters, is padded up to max_length (30) with 38s. Preds are the argmax indices, i.e. the result of:
```python
_, preds = torch.max(outputs, 2)
```
where outputs are the LSTM outputs, with shape [16, 30, 38]. I am using CTCLoss, so the padded part of each target is ignored automatically (the loss only looks at the first target_lengths entries of each target). The question is: how can I do the same for accuracy?
Additional info
- batch_size = 16
- max_seq_len = 30
- vocabulary_size = 38 (with END token and CTCLoss 'blank' character)
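For reference, here is a minimal sketch of how these shapes feed into nn.CTCLoss and how the padding is skipped via target_lengths. The blank index, the random tensors, and the concrete lengths below are placeholders, not my actual pipeline:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, max_seq_len, vocab_size = 16, 30, 38  # shapes from above

# LSTM outputs: [batch, time, classes]; CTCLoss wants [time, batch, classes]
outputs = torch.randn(batch_size, max_seq_len, vocab_size)
log_probs = outputs.log_softmax(2).permute(1, 0, 2)

# padded targets [batch, max_target_len]; labels in [1, 36], blank assumed at 0
targets = torch.randint(1, 37, (batch_size, max_seq_len))
target_lengths = torch.randint(5, 15, (batch_size,))   # true unpadded lengths
input_lengths = torch.full((batch_size,), max_seq_len, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
# only the first target_lengths[i] entries of row i are read; the 38-padding
# beyond that point never reaches the loss
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```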
What I have tried is to create an ignore-mask, as seen here (changed appropriately to work with PyTorch):
```python
import torch

class alt_accuracy:
    def __init__(self, pad_token=38):
        self.pad_token = pad_token

    def ignore_pad_accuracy(self, preds, targets):
        # 1 wherever the prediction is NOT the pad token, 0 elsewhere
        ignore_mask = (~torch.eq(preds, self.pad_token)).type(torch.IntTensor)
        # 1 wherever prediction == target AND the position is not masked out
        matches = torch.eq(targets, preds).type(torch.IntTensor) * ignore_mask
        if torch.sum(matches) == 0:
            return torch.tensor(0)
        return torch.sum(matches) / torch.sum(ignore_mask)
```
The problem is that this mask (if I understand it correctly) expects the model to predict the pads, which mine cannot do, since CTCLoss by design does not want blank/pad labels in the targets (as seen here). As a result, the ignore_mask is always a tensor filled with 1s from start to finish, i.e. it never finds any padding.
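To make the failure concrete: the model only outputs 38 classes, so preds always lies in [0, 37] and can never equal the pad value 38, which is why the mask is all ones. Building the mask from the targets instead (which do contain the pads) is one possible direction; the tensors below are random placeholders:

```python
import torch

torch.manual_seed(0)
pad_token = 38

outputs = torch.randn(16, 30, 38)   # [batch, max_seq_len, vocab_size]
_, preds = torch.max(outputs, 2)    # argmax indices lie in [0, 37]

# the prediction-based mask can never fire: preds never equals 38
ignore_mask = (~torch.eq(preds, pad_token)).int()
print(ignore_mask.sum().item())     # 480 == 16 * 30, i.e. all ones

# possible alternative: mask on the *targets*, which do contain the pads
targets = torch.randint(1, 37, (16, 30))
targets[:, 20:] = pad_token         # pretend the last 10 steps are padding
mask = targets != pad_token
accuracy = ((preds == targets) & mask).sum() / mask.sum()
```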
I would appreciate any help with the problem.