Pytorch Softmax giving nans and negative values as output

Question

I am using softmax at the end of my model.

However after some training softmax is giving negative probability.In some situations I have encountered nans as probability as well.

one solution i found on searching is to use normalized softmax…however I can not find any pytorch imlpementaion for this.

Can someone please help to let know if there is a normalized softmax available or how to achieve this so that forward and backward propagations are smooth.

Please note that I am already using torch.nn.utils.clip_grad_norm_(model.parameters(), 40) to avoid exploding gradients

I am using pytorch 1.6.0

Are you by any chance using the log_softmax? "Normalized softmax" doesn't make much sense, as SoftMax itself already provides a form of normalization. If you get NaN values this is probably caused at an earlier stage in your network, using a debugger in an IDE might help in that case. — jumelet, Sep 09 '20 at 11:59
Hi, yes I am using log_softmax and also softmax..I am implementing A3C...I am now trying to take sigmoid before taking softmax and will see the results — Granth, Sep 09 '20 at 12:46
@user2783767 what implementation of `softmax` are you using? are you using pytorch's implementation or your own? — Shai, Sep 09 '20 at 12:51

score 4 · Accepted Answer · answered Sep 09 '20 at 14:41

Softmax will always return positive results, but it will keep track of other results:

m = nn.Softmax(dim=1)
input = torch.randn(2, 3)
print(input)
output = m(input)
output

Out:

tensor([[ 0.0983,  0.4150, -1.1342],
        [ 0.3411,  0.5553,  0.0182]])

tensor([[0.3754, 0.5152, 0.1094],
        [0.3375, 0.4181, 0.2444]])

You are tracking the rows. Note how for

0.0983, 0.4150, -1.1342 You will get 0.3411, 0.5553, 0.0182

Saying that 0.4150 is the biggest value.

The hard max (as we know this is max()) will just return the maximum value.

So, if you have negative results for softmax this is not possible, you may have hit some implementation failure.

Pytorch Softmax giving nans and negative values as output

1 Answers1