I implemented an LSTM with attention in Keras to reproduce this paper. The strange behavior is easy to describe: I use MSE as the loss function, with MAPE and MAE as metrics. During training, the MAPE explodes while the MSE and MAE appear to train normally:
Epoch 1/20
275/275 [==============================] - 191s 693ms/step - loss: 0.1005 - mape: 15794.8682 - mae: 0.2382 - val_loss: 0.0334 - val_mape: 24.9470 - val_mae: 0.1607
Epoch 2/20
275/275 [==============================] - 184s 669ms/step - loss: 0.0099 - mape: 6385.5464 - mae: 0.0725 - val_loss: 0.0078 - val_mape: 11.3268 - val_mae: 0.0803
Epoch 3/20
275/275 [==============================] - 186s 676ms/step - loss: 0.0025 - mape: 5909.3735 - mae: 0.0369 - val_loss: 0.0131 - val_mape: 14.9827 - val_mae: 0.1061
Epoch 4/20
275/275 [==============================] - 187s 678ms/step - loss: 0.0015 - mape: 4746.2788 - mae: 0.0278 - val_loss: 0.0142 - val_mape: 16.1894 - val_mae: 0.1122
Epoch 5/20
30/275 [==>...........................] - ETA: 2:38 - loss: 0.0012 - mape: 9.3647 - mae: 0.0246
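For context, the loss and metrics are configured as below; this is a minimal sketch with a placeholder model (my real network is the LSTM with attention, and the shapes and optimizer here are stand-ins, not my exact values):

```python
import tensorflow as tf

# Placeholder model just to show the compile configuration; the input shape,
# layer sizes, and optimizer are made up -- only the loss/metrics lines
# correspond to the log above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, 8)),  # (timesteps, features) placeholder
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),
])

model.compile(
    optimizer="adam",
    loss="mse",               # trained on mean squared error
    metrics=["mape", "mae"],  # MAPE and MAE are tracked as metrics only
)
```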
As shown above, the running MAPE has blown up by the end of each epoch, yet it drops back to a sane value at the start of the next one (see the partial epoch 5 line). What could cause this specific pattern?
The MAPE is still decreasing from epoch to epoch, so is this actually a problem, given that it does not appear to hinder the training process?
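For reference, my understanding is that Keras computes MAPE as 100 * |y_true - y_pred| / max(|y_true|, eps) with a tiny eps (around 1e-7 by default, if I read the source correctly), so even a handful of targets close to zero could inflate the batch average enormously. A minimal standalone check with made-up values:

```python
import numpy as np
import tensorflow as tf

# Made-up values: two ordinary targets plus one target very close to zero.
y_true = np.array([[0.5, 0.2, 1e-6]], dtype="float32")
y_pred = np.array([[0.45, 0.22, 0.05]], dtype="float32")

mape = tf.keras.metrics.mean_absolute_percentage_error(y_true, y_pred)
print(mape.numpy())  # roughly [1.67e6] -- the single near-zero target dominates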