I'm using an Inception-V4 model for multi-label classification with TensorFlow. The model has an output dimension of 51, and my labels are either one or zero for each of these 51 classes. I did not modify the Inception-V4 architecture any further.
To learn the multi-label classification, I use a sigmoid cross-entropy loss by applying tf.losses.sigmoid_cross_entropy. My logits are unscaled (no activation function applied), and my labels also look fine. The overall loss is computed via tf.losses.get_total_loss().
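For reference, the loss wiring boils down to this (a minimal sketch, not my exact code; the placeholders stand in for my input pipeline):

```python
import tensorflow as tf

# Stand-ins for my input pipeline; the shapes are the ones I use
labels = tf.placeholder(tf.float32, [None, 51])  # 0/1 multi-hot labels
logits = tf.placeholder(tf.float32, [None, 51])  # raw, unscaled model outputs

# Adds the sigmoid cross-entropy to the tf.GraphKeys.LOSSES collection
tf.losses.sigmoid_cross_entropy(multi_class_labels=labels, logits=logits)

# Sums everything in the LOSSES collection, plus regularization
# losses by default
total_loss = tf.losses.get_total_loss()
```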
The whole system trains for many epochs without a problem. However, after about 15 to 20 epochs, it sometimes produces negative loss values. At that point the loss is around 0.01, but it sometimes drops to -0.15 or even lower (I once saw -1). Furthermore, my model's accuracy also drops after these steps, so I think this is actually hurting training.
Does anyone have an idea what I could have done wrong? Has anyone experienced this before? What could I do?
I haven't included my full code yet, because I think it's just a simple Estimator pipeline: I directly take the logits and AuxLogits of the inception_v4 and attach such a sigmoid cross-entropy loss to each of them. A trimmed sketch of that part is below; if more code would help, I can try shrinking the rest and adding it here.
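Roughly, the relevant part of the model_fn looks like this (a trimmed sketch, not my exact code: the nets.inception import assumes the TF-Slim models repo, and the optimizer and the 0.4 aux-loss weight are placeholders):

```python
import tensorflow as tf
from nets import inception  # TF-Slim models repo (tensorflow/models/research/slim)

def model_fn(features, labels, mode):
    """Trimmed to the TRAIN path; EVAL/PREDICT handling omitted."""
    is_training = (mode == tf.estimator.ModeKeys.TRAIN)
    with tf.contrib.slim.arg_scope(inception.inception_v4_arg_scope()):
        logits, end_points = inception.inception_v4(
            features, num_classes=51, is_training=is_training)

    # Both calls add their loss to the tf.GraphKeys.LOSSES collection,
    # which tf.losses.get_total_loss() sums up below.
    tf.losses.sigmoid_cross_entropy(
        multi_class_labels=labels, logits=logits)
    tf.losses.sigmoid_cross_entropy(
        multi_class_labels=labels, logits=end_points['AuxLogits'],
        weights=0.4)  # placeholder weight for the auxiliary head

    total_loss = tf.losses.get_total_loss()
    train_op = tf.train.AdamOptimizer().minimize(  # placeholder optimizer
        total_loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=total_loss, train_op=train_op)
```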
This is what the loss on the logits looks like: