
I know there's no need to use an nn.Softmax() function in the output layer of a neural network when using nn.CrossEntropyLoss as the loss function.

However, I do need to, so is there a way to suppress the softmax that nn.CrossEntropyLoss applies internally and instead use nn.Softmax() on the output layer of the neural network itself?

Motivation: I am using the shap package to analyze the feature influences afterwards, and I can only feed my trained model as an input. The outputs don't make any sense then, because I am looking at unbounded values instead of probabilities.

Example: instead of -69.36 as an output value for one class of my model, I want something between 0 and 1 that sums to 1 across all classes. As I can't alter the model afterwards, the outputs need to look like this already during training.



2 Answers


You can use nn.NLLLoss(). nn.CrossEntropyLoss first computes the log softmax of the input scores and then the negative log-likelihood loss. If your model already produces log probabilities, you can just use nn.NLLLoss() on its own.

Here is an example from PyTorch's documentation:

import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# a minibatch of 3 samples with raw scores for 5 classes
input = torch.randn(3, 5, requires_grad=True)
# the target class index for each sample
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
Nebiyou Yismaw
  • Thanks for the input, I am kind of looking for the same approach but using `nn.Softmax` instead of `nn.LogSoftmax`. I want the output to be bounded to [0, 1], but LogSoftmax unfortunately isn't. – Quastiat Sep 26 '19 at 19:38

The documentation of nn.CrossEntropyLoss says,

This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

I suggest you stick with CrossEntropyLoss as the loss criterion. However, you can convert the output of your model into probability values by applying the softmax function.

Please note, you can always play with the output values of your model, you do not need to change the loss criterion for that.
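
For instance, here is a minimal sketch of that idea, assuming a model called trained_model that was trained with CrossEntropyLoss on raw logits (the wrapper class and the names are only illustrative, not part of the shap API):

import torch.nn as nn
import torch.nn.functional as F

class ProbWrapper(nn.Module):
    # hypothetical wrapper: leaves the trained model untouched and only
    # converts its raw logits into probabilities in [0, 1] that sum to 1
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        return F.softmax(self.model(x), dim=1)

# feed ProbWrapper(trained_model) to shap instead of trained_model

This way the loss criterion stays CrossEntropyLoss during training, and only the object handed to shap produces probabilities.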

But if you still want to use Softmax() in your network, then you can use NLLLoss() as the loss criterion; just apply log() to the model's output before feeding it to the criterion. Similarly, if you instead use LogSoftmax in your network, you can apply exp() to get the probability values.

Update:

To use log() on the Softmax output, please do:

torch.log(prob_scores + 1e-20)

By adding a very small number (1e-20) to prob_scores, we can avoid the log(0) issue.
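
Putting the pieces together, a rough sketch of that setup could look like this (the layer sizes, batch size, and variable names are made up for illustration): the network itself ends in nn.Softmax, and the log with the small epsilon is applied only inside the loss computation:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 5),
    nn.Softmax(dim=1),  # the model's outputs are now probabilities in [0, 1]
)
criterion = nn.NLLLoss()

x = torch.randn(3, 10)
target = torch.tensor([1, 0, 4])

prob_scores = model(x)  # bounded to [0, 1] and summing to 1 per row
loss = criterion(torch.log(prob_scores + 1e-20), target)  # avoid log(0)
loss.backward()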

Wasi Ahmad
  • Thanks for the clarification, I mixed up Softmax and LogSoftmax. Your last comment is actually an as-obvious-as-genius idea, I think that's exactly what I was looking for. Thank you! – Quastiat Sep 26 '19 at 19:49
  • Edit: what I didn't think of: applying log results in a lot of NaN values due to -inf from log(0). – Quastiat Sep 26 '19 at 19:55
  • Yeah, that solves the issue; it probably was only happening due to a very rare rounding issue. That works as expected, thanks a lot again! – Quastiat Sep 26 '19 at 20:09