The issue seems to be that when the input to your sigmoid implementation is negative, the argument to `torch.exp` (i.e. `-1e5*x`) becomes a very large positive number, causing an overflow. Using `torch.autograd.set_detect_anomaly(True)` as suggested here, you can see the error:

RuntimeError: Function 'ExpBackward' returned nan values in its 0th output.
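For reference, here is a minimal sketch of how the error can be reproduced (the input value `-1.0` is just an arbitrary negative example, and the exact op name in the message can differ between PyTorch versions):

```python
import torch

torch.autograd.set_detect_anomaly(True)  # report which backward op produced the NaN

def sigmoid(x):
    return 1. / (1 + torch.exp(-1e5 * x))  # original, overflow-prone form

x = torch.tensor(-1.0, requires_grad=True)  # negative input -> torch.exp(1e5) overflows to inf
y = sigmoid(x)
y.backward()  # raises the RuntimeError above during the backward pass
```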
If you really need to use the function you have defined, a possible workaround could be to put a conditional check on the argument (though I am not sure whether it would be stable, so I cannot comment on its usefulness):
```python
def sigmoid(x):
    # assumes x is a 0-dim (scalar) tensor, since `if x >= 0` evaluates a single boolean
    if x >= 0:
        return 1. / (1 + torch.exp(-1e5 * x))
    else:
        # algebraically equivalent form whose exponent stays non-positive
        return torch.exp(1e5 * x) / (1 + torch.exp(1e5 * x))
```
Here, the expression in the `else` branch is equivalent to the original function: it is obtained by multiplying both the numerator and the denominator by `torch.exp(1e5*x)`. This ensures that the argument to `torch.exp` is always negative or close to zero.
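As a quick sanity check (the test values are arbitrary), both forms agree where neither overflows, while the rewritten branch avoids putting an `inf` into the graph for negative inputs:

```python
import torch

x = torch.tensor(-2e-5)                                   # -1e5*x = 2.0, no overflow yet
naive  = 1. / (1 + torch.exp(-1e5 * x))                   # tensor(0.1192)
stable = torch.exp(1e5 * x) / (1 + torch.exp(1e5 * x))    # tensor(0.1192), same value

x = torch.tensor(-1.0)                                    # naive form now computes exp(1e5) = inf
naive  = 1. / (1 + torch.exp(-1e5 * x))                   # forward value is 0, but the inf poisons backward
stable = torch.exp(1e5 * x) / (1 + torch.exp(1e5 * x))    # also 0, with no inf anywhere in the graph
```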
As noted by trialNerror, the exponent value is so high that, except for inputs extremely close to zero, your gradient will evaluate to zero everywhere, since the actual slope is too small to be resolved by the floating-point type. So if you plan to use it in a network, you will likely find it very difficult to learn anything, since gradients will almost always be zero. It might be better to select a smaller exponent, depending on your use case, as the sketch below illustrates.
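Here is a small sketch comparing the gradient for your exponent against a smaller one (the values `1e5`, `10` and the probe point `0.01` are arbitrary choices for illustration):

```python
import torch

def scaled_sigmoid(x, k):
    # same piecewise form as above, with the exponent k as a parameter
    if x >= 0:
        return 1. / (1 + torch.exp(-k * x))
    else:
        return torch.exp(k * x) / (1 + torch.exp(k * x))

for k in (1e5, 10.0):
    x = torch.tensor(0.01, requires_grad=True)
    scaled_sigmoid(x, k).backward()
    print(k, x.grad)  # k=1e5: grad is 0 (slope too steep to resolve); k=10: grad ~ 2.49
```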