
I want to compute the reconstruction accuracy of my autoencoder using CrossEntropyLoss:

import torch.nn as nn

ae_criterion = nn.CrossEntropyLoss()
ae_loss = ae_criterion(X, Y)

where X is the autoencoder's reconstruction and Y is the target (since it is an autoencoder, Y is the same as the original input X). Both X and Y have shape [42, 32, 130] = [batch_size, timesteps, number_of_classes]. When I run the code above I get the following error:

ValueError: Expected target size (42, 130), got torch.Size([42, 32, 130])

After looking at the docs, I'm still unsure how I should call nn.CrossEntropyLoss() appropriately. It seems that I should change Y to be of shape [42, 32, 1], with each element being a scalar in the interval [0, 129] (or [1, 130]), am I right?

Is there a way to avoid this? Since X and Y are between 0 and 1, could I just use binary cross-entropy loss element-wise in an equivalent way?

asked by miditower, edited by Stack Danny
  • `CrossEntropyLoss` is commonly used for classification problems. You should probably use an `MSELoss` or similar. – iacolippo Apr 12 '19 at 16:05
  • @iacolippo no, my dataset is composed of time series of discrete events, so I'm in fact doing classification. – miditower Apr 12 '19 at 16:54
  • Oh, my bad, misread the question. Is there a reason why you don't want to take the argmax over the last dimension? (Have idx of the class instead of one-hot vectors). Otherwise you can use `BCELoss` as suggested in the answer. – iacolippo Apr 13 '19 at 17:07
  • Mainly it's just for computational efficiency; I wouldn't want to do that computation at every training iteration. But it seems the only way here. Since my third dimension is the output of a softmax layer, I don't think `BCELoss` is appropriate here. – miditower Apr 15 '19 at 08:16
  • if you look at how much time it takes to perform the argmax, I'm pretty sure it's negligible compared to all the rest :-) – iacolippo Apr 15 '19 at 08:40

1 Answer


For CrossEntropyLoss, the target Y must have shape (42, 32), with each element a Long scalar in the interval [0, 129] (e.g. the argmax over the last dimension if Y is one-hot). The input X must then have shape (42, 130, 32), i.e. the class dimension moved to position 1, and it should contain raw logits rather than softmax outputs, since CrossEntropyLoss applies log-softmax internally.
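A minimal sketch of that reshaping, with random tensors standing in for the real data and assuming X holds raw (pre-softmax) logits and Y is one-hot along the last dimension:

```python
import torch
import torch.nn as nn

batch_size, timesteps, num_classes = 42, 32, 130

# Stand-ins for the real data: X as raw logits, Y as one-hot targets.
X = torch.randn(batch_size, timesteps, num_classes)
Y = torch.eye(num_classes)[torch.randint(0, num_classes, (batch_size, timesteps))]

criterion = nn.CrossEntropyLoss()

# Target: class indices of shape (42, 32), Long values in [0, 129].
Y_idx = Y.argmax(dim=-1)

# Input: the class dimension must come second for K-dimensional inputs,
# so permute (42, 32, 130) -> (42, 130, 32).
loss = criterion(X.permute(0, 2, 1), Y_idx)
```

With the default reduction, the loss is averaged over all 42 × 32 timestep positions.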

You may want to use BCELoss (on probabilities) or BCEWithLogitsLoss (on raw logits) for your problem.
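If X is instead the output of a softmax (values in [0, 1], as in the question), an element-wise BCELoss call could look like the sketch below. Note, as the comments point out, that this treats the 130 classes as independent binary targets, which is not mathematically identical to categorical cross-entropy over a softmax:

```python
import torch
import torch.nn as nn

batch_size, timesteps, num_classes = 42, 32, 130

# Stand-ins for the real data: probs is a softmax output, Y is one-hot.
probs = torch.softmax(torch.randn(batch_size, timesteps, num_classes), dim=-1)
Y = torch.eye(num_classes)[torch.randint(0, num_classes, (batch_size, timesteps))]

# BCELoss expects probabilities; input and target shapes just need to match.
# With raw logits, nn.BCEWithLogitsLoss is the numerically stabler choice.
loss = nn.BCELoss()(probs, Y)
```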

answered by Sergii Dymchenko
  • Are you sure that using `BCELoss` will be equivalent to `CrossEntropyLoss` in this case? My last dimension is the output of a softmax layer over 130 classes, not sure it's the same as using an element-wise `BCELoss`. – miditower Apr 15 '19 at 08:12