I have a sequence labeling task.

As input, I have a batch of sequences with shape [batch_size, sequence_length], where each element of a sequence should be assigned to some class.

As the loss function for training the neural net, I use cross-entropy.

How should I use it correctly? My variable `target_predictions` has shape [batch_size, sequence_length, number_of_classes] and `target` has shape [batch_size, sequence_length].

The documentation says:

[Screenshot of the torch.nn.CrossEntropyLoss shape documentation: Input: (N, C) where C = number of classes, or (N, C, d_1, d_2, ..., d_K) with K ≥ 1 in the case of K-dimensional loss. Target: (N), or (N, d_1, d_2, ..., d_K) with K ≥ 1 in the case of K-dimensional loss.]

I know that if I use `nn.CrossEntropyLoss()(target_predictions.permute(0, 2, 1), target)`, everything will work fine. But I am concerned that torch is interpreting my `sequence_length` as the `d_1` variable from the documentation and will treat this as a multi-dimensional loss, which is not the case.
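For concreteness, here is a minimal reproduction of my setup (the concrete sizes below are made up):

    import torch
    import torch.nn as nn

    batch_size, sequence_length, number_of_classes = 4, 10, 5  # made-up sizes

    # Logits for every sequence position: [batch_size, sequence_length, number_of_classes]
    target_predictions = torch.randn(batch_size, sequence_length, number_of_classes)
    # Ground-truth class indices: [batch_size, sequence_length]
    target = torch.randint(0, number_of_classes, (batch_size, sequence_length))

    criterion = nn.CrossEntropyLoss()
    # CrossEntropyLoss expects the class dimension second: [N, C, d_1]
    loss = criterion(target_predictions.permute(0, 2, 1), target)
    print(loss)  # scalar, mean over all batch and sequence positions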

How should I correctly do it?

Kenenbek Arzymatov

1 Answer

Using CrossEntropyLoss will give you the loss, not the labels. By default the mean reduction is taken, which is probably what you are after, and the snippet with permute will be fine (using this loss you can train your nn via backward). And yes, torch will treat sequence_length as the d_1 dimension, but that is exactly what you want here: each (batch, position) pair is scored independently and the results are averaged, which is what per-element sequence labeling needs.
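If you want to convince yourself that the K-dimensional path does the right thing for sequences, here is a quick sanity check (shapes are made up) comparing it against flattening the batch and sequence dimensions:

    import torch
    import torch.nn.functional as F

    batch_size, sequence_length, number_of_classes = 4, 10, 5  # made-up sizes
    preds = torch.randn(batch_size, sequence_length, number_of_classes)
    target = torch.randint(0, number_of_classes, (batch_size, sequence_length))

    # K-dimensional form: class dimension moved to position 1
    loss_kd = F.cross_entropy(preds.permute(0, 2, 1), target)
    # Flattened form: each (batch, position) pair becomes an independent sample
    loss_flat = F.cross_entropy(preds.reshape(-1, number_of_classes), target.reshape(-1))

    print(torch.allclose(loss_kd, loss_flat))  # True

Both give the same scalar because the default mean reduction averages over every (batch, position) element either way.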

To get the predicted classes, just take the argmax across the class dimension; in the case without permutation that would be:

labels = torch.argmax(target_predictions, dim=-1)

This will give you a [batch_size, sequence_length] tensor containing the predicted class indices.
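If you kept the permuted tensor around from the loss computation, the class dimension is dim 1 instead; both routes give the same labels. A small sketch, reusing `target_predictions` from above:

    permuted = target_predictions.permute(0, 2, 1)  # [batch_size, number_of_classes, sequence_length]
    labels_from_permuted = torch.argmax(permuted, dim=1)
    assert torch.equal(labels_from_permuted, torch.argmax(target_predictions, dim=-1))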

Szymon Maszke
  • Yes, I use CE for the loss. So you think that permuting the axes is enough, and pytorch will not get confused by the `d1` variable, because there is no multi-dimensional loss here? – Kenenbek Arzymatov Sep 28 '21 at 20:17
  • It won't get confused; it works fine for multi-dimensional cases (e.g. segmentation) as long as you stick to the appropriate dimensions. – Szymon Maszke Sep 28 '21 at 20:24