I'm training a network that has multiple losses, and I'm both creating and feeding the data into the network using a generator.
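
Roughly, the setup looks like this (a stripped-down sketch with placeholder shapes and layer names, not my actual model):

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

# Toy multi-output model: one input, three softmax heads, one loss per head.
inp = Input(shape=(100,))
h = Dense(64, activation='relu')(inp)
out1 = Dense(10, activation='softmax', name='out1')(h)
out2 = Dense(10, activation='softmax', name='out2')(h)
out3 = Dense(10, activation='softmax', name='out3')(h)

model = Model(inputs=inp, outputs=[out1, out2, out3])
model.compile(optimizer='adam',
              loss={'out1': 'categorical_crossentropy',
                    'out2': 'categorical_crossentropy',
                    'out3': 'categorical_crossentropy'})

def batch_generator(batch_size=32):
    # Placeholder random batches; the real generator builds them from my dataset.
    while True:
        x = np.random.rand(batch_size, 100)
        y = np.eye(10)[np.random.randint(0, 10, batch_size)]
        yield x, [y, y, y]

model.fit_generator(batch_generator(), steps_per_epoch=300, epochs=10)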

I've checked the structure of the data and it generally looks fine, and the network also trains pretty much as expected most of the time. However, at a random epoch (almost every run), the training loss for every prediction suddenly jumps from, say,

# End of epoch 3
loss: 2.8845 

to

# Beginning of epoch 4
loss: 1.1921e-07

I thought it could be the data; however, from what I can tell the data is generally fine. It's even more suspicious because this happens at a random epoch (could it be caused by a random data point chosen during SGD?) but then persists for the rest of training. That is, if the training loss drops to 1.1921e-07 at epoch 3, it stays that way through epoch 4, epoch 5, etc.

However, there are times when it reaches epoch 5 without this happening, and then it happens at epoch 6 or 7.

Is there any plausible reason, outside of the data, that could cause this? Could a few fudgy data points even cause this so quickly?

Thanks

EDIT:

Results:

300/300 [==============================] - 339s - loss: 3.2912 - loss_1: 1.8683 - loss_2: 9.1352 - loss_3: 5.9845 - 
val_loss: 1.1921e-07 - val_loss_1: 1.1921e-07 - val_loss_2: 1.1921e-07 - val_loss_3: 1.1921e-07

The next epochs after this all have training loss 1.1921e-07.
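
For what it's worth, 1.1921e-07 is exactly the machine epsilon of a 32-bit float, so the reported loss may simply have bottomed out at the smallest representable relative value rather than being a meaningful number (that's only a guess on my part):

import numpy as np
print(np.finfo(np.float32).eps)  # 1.1920929e-07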

tryingtolearn
  • Keras displays the loss as an average over the current epoch. It means that if the optimizer finds a "cliff" in the loss landscape, it won't be evident until the beginning of the next epoch. It could also be that at some point your model becomes very good at predicting the first batches of an epoch. You could try to shuffle the data every epoch. Without seeing some code, it's difficult to say whether anything is right or wrong. In any case I would encourage you to use a validation set, if you are not using one yet. – Manolo Santos Jul 26 '17 at 09:35
  • @ManoloSantos I see, thanks for that insight. So you're saying that it's possible that if it just keeps finding "bad data" e.g. empty data (which I feel like it would have to be in order for it to produce zero loss across all predictions) then the loss could just shoot off. I'm currently generating the data randomly from the dataset so the order of prediction shouldn't be a problem? I will do further tests and let you know. – tryingtolearn Jul 26 '17 at 09:54
  • A loss of near 0 doesn't mean that you have "bad data"; it means that the model is very confident predicting your data. (It could mean that it is overfitting and memorizing it; that's why I recommend you use a validation set, to rule out this possibility.) – Manolo Santos Jul 26 '17 at 10:06
  • @ManoloSantos Ah yes of course, I simply said "bad data" because from experience on this particular project a loss of zero would mean a rather incredible discovery... – tryingtolearn Jul 26 '17 at 10:12
  • @ManoloSantos Hi Manolo. I've used a validation set to check and the results are in the edit. Weirdly, the validation loss is at zero before the training loss is, which seems very strange. Also, after this epoch, all training losses suddenly shot to the same value as the validation loss. – tryingtolearn Jul 26 '17 at 11:40
  • Mmmmh. It's strange, indeed. In order to help you, you should add more detail to your question: code, NN topology, loss function, dataset, etc. – Manolo Santos Jul 26 '17 at 12:52
  • @ManoloSantos Would it be possible to move this into a private chat, as there is a lot of code to go with it? When an answer is found I'm more than happy to post a detailed response. – tryingtolearn Jul 26 '17 at 14:34
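
For reference, the per-epoch shuffling suggested in the comments could look roughly like this (a sketch with placeholder names, not the actual generator):

import numpy as np

def shuffling_generator(x_data, y_data, batch_size=32):
    # Reshuffle the sample order at the start of every pass through the dataset.
    n = len(x_data)
    while True:
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x_data[idx], [y[idx] for y in y_data]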

1 Answer

Not entirely sure how satisfactory this is as an answer, but my findings seem to show that using multiple categorical_crossentropy losses together results in a very unstable network. Swapping them out for other loss functions fixes the problem, with the data remaining unchanged.
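
For illustration, the change is confined to the compile step (reusing the hypothetical output names from the sketch in the question; mean_squared_error is just an example, not necessarily the loss I ended up with):

model.compile(optimizer='adam',
              loss={'out1': 'mean_squared_error',   # was 'categorical_crossentropy'
                    'out2': 'mean_squared_error',
                    'out3': 'mean_squared_error'})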

tryingtolearn
  • In hindsight, this could have been due to the likelihood values being so small or so large that they produced NaNs, so clipping the values more suitably may have helped. – tryingtolearn May 02 '18 at 10:26
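
A rough sketch of that clipping idea (not the code actually used): a categorical crossentropy that clips predictions away from 0 and 1 before taking the log, so extreme probabilities can't produce inf/NaN.

from keras import backend as K

def clipped_categorical_crossentropy(y_true, y_pred, eps=1e-6):
    # Keep predictions strictly inside (0, 1) so log() stays finite.
    y_pred = K.clip(y_pred, eps, 1.0 - eps)
    return -K.sum(y_true * K.log(y_pred), axis=-1)

# Used like any other Keras loss:
# model.compile(optimizer='adam', loss=clipped_categorical_crossentropy)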