
I want to compute the reconstruction accuracy of my autoencoder using CrossEntropyLoss:

import torch.nn as nn

ae_criterion = nn.CrossEntropyLoss()
ae_loss = ae_criterion(X, Y)

where X is the autoencoder's reconstruction and Y is the target (since it is an autoencoder, Y is the same as the original input X). Both X and Y have shape [42, 32, 130] = [batch_size, timesteps, number_of_classes]. When I run the code above I get the following error:

ValueError: Expected target size (42, 130), got torch.Size([42, 32, 130])

After looking at the docs, I'm still unsure how I should call nn.CrossEntropyLoss() appropriately. It seems that I should change Y to be of shape [42, 32, 1], with each element being a scalar in the interval [0, 129] (or [1, 130]), am I right?

Is there a way to avoid this? Since X and Y are between 0 and 1, could I just use binary cross-entropy loss element-wise in an equivalent way?

asked by miditower, edited by Stack Danny
  • `CrossEntropyLoss` is commonly used for classification problems. You should probably use an `MSELoss` or similar. – iacolippo Apr 12 '19 at 16:05
  • @iacolippo no, my dataset is composed of time series of discrete events, so I'm in fact doing classification. – miditower Apr 12 '19 at 16:54
  • Oh, my bad, misread the question. Is there a reason why you don't want to take the argmax over the last dimension? (Have idx of the class instead of one-hot vectors). Otherwise you can use `BCELoss` as suggested in the answer. – iacolippo Apr 13 '19 at 17:07
  • Mainly it's just for computational efficiency; I wouldn't want to do that computation at every training iteration. But it seems the only way here. Since my third dimension is the output of a softmax layer, I don't think `BCELoss` is appropriate here. – miditower Apr 15 '19 at 08:16
  • if you look at how much time it takes to perform the argmax, I'm pretty sure it's negligible compared to all the rest :-) – iacolippo Apr 15 '19 at 08:40

1 Answer


For CrossEntropyLoss, the target Y must have shape (42, 32), with each element a Long scalar in the interval [0, 129] (e.g. the argmax over the last dimension if Y is one-hot). The input X must then have shape (42, 130, 32), i.e. the class dimension moved to position 1, and it should contain raw logits rather than softmax outputs, since CrossEntropyLoss applies log-softmax internally.
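A minimal sketch of that reshaping, with random tensors standing in for the real data and assuming X holds raw (pre-softmax) logits and Y is one-hot along the last dimension:

```python
import torch
import torch.nn as nn

batch_size, timesteps, num_classes = 42, 32, 130

# Stand-ins for the real data: X as raw logits, Y as one-hot targets.
X = torch.randn(batch_size, timesteps, num_classes)
Y = torch.eye(num_classes)[torch.randint(0, num_classes, (batch_size, timesteps))]

criterion = nn.CrossEntropyLoss()

# Target: class indices of shape (42, 32), Long values in [0, 129].
Y_idx = Y.argmax(dim=-1)

# Input: the class dimension must come second for K-dimensional inputs,
# so permute (42, 32, 130) -> (42, 130, 32).
loss = criterion(X.permute(0, 2, 1), Y_idx)
```

With the default reduction, the loss is averaged over all 42 × 32 timestep positions.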

You may want to use BCELoss (on probabilities) or BCEWithLogitsLoss (on raw logits) for your problem.
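If X is instead the output of a softmax (values in [0, 1], as in the question), an element-wise BCELoss call could look like the sketch below. Note, as the comments point out, that this treats the 130 classes as independent binary targets, which is not mathematically identical to categorical cross-entropy over a softmax:

```python
import torch
import torch.nn as nn

batch_size, timesteps, num_classes = 42, 32, 130

# Stand-ins for the real data: probs is a softmax output, Y is one-hot.
probs = torch.softmax(torch.randn(batch_size, timesteps, num_classes), dim=-1)
Y = torch.eye(num_classes)[torch.randint(0, num_classes, (batch_size, timesteps))]

# BCELoss expects probabilities; input and target shapes just need to match.
# With raw logits, nn.BCEWithLogitsLoss is the numerically stabler choice.
loss = nn.BCELoss()(probs, Y)
```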

answered by Sergii Dymchenko
  • Are you sure that using `BCELoss` will be equivalent to `CrossEntropyLoss` in this case? My last dimension is the output of a softmax layer over 130 classes, not sure it's the same as using an element-wise `BCELoss`. – miditower Apr 15 '19 at 08:12