I am working on a Keras model whose final layer is a Mixture Density Network, trained with a custom loss: the negative log likelihood of the data under a mixture of Gaussians.
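For context, my loss is essentially the standard MDN negative log likelihood. A simplified sketch of its shape (the parameter names, splitting scheme, and softmax/softplus activations here are illustrative, not my exact code):

```python
import numpy as np
import tensorflow as tf

def mdn_nll(y_true, y_pred):
    # Split the MDN layer's output into mixture weights, means, and scales.
    pi, mu, sigma = tf.split(y_pred, 3, axis=-1)
    pi = tf.nn.softmax(pi, axis=-1)
    sigma = tf.nn.softplus(sigma)  # keep the scales positive

    # Gaussian density of each component evaluated at y_true.
    norm = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)
    density = norm * tf.exp(-0.5 * tf.square((y_true - mu) / sigma))

    # Mixture likelihood per sample, then mean negative log likelihood.
    likelihood = tf.reduce_sum(pi * density, axis=-1)
    return -tf.reduce_mean(tf.math.log(likelihood))
```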
What confuses me is that the loss will sometimes hit an epoch where it comes back as -inf, and then on the next epoch it is a finite number again (e.g. -2.1). Sometimes it bounces between -inf and a finite value every other epoch.
A negative loss is evidently to be expected with an NLL loss, but this fluctuation confuses me. What explains this behavior within Keras? My understanding is that the -inf loss is caused by numerical underflow somewhere (perhaps a component's scale collapsing far enough that the density blows up), but I'm not sure how the model can recover from this and re-establish numerical stability afterwards.
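To make my underflow suspicion concrete, here is a toy float32 calculation (not my model code, just an illustration of the mechanism I suspect): if a scale underflows toward zero, the Gaussian density's normalizer overflows to inf, and the negative log of that is exactly -inf.

```python
import numpy as np

sigma = np.float32(1e-39)  # a collapsed scale, deep in float32 subnormal range
coef = np.float32(1.0) / (np.float32(np.sqrt(2.0 * np.pi)) * sigma)
print(coef)           # inf: the density normalizer overflows float32
print(-np.log(coef))  # -inf: the loss value I'm seeing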
Does anyone know what is going on here? I'd be very grateful for any suggestions.