I'm training an LSTM model for time series forecasting. This is the training loss plot.
This is a one-step-ahead forecasting case, so I'm training the model on a rolling window. Here we have 26 forecasting steps (at every step, I retrain the model from scratch, as in the sketch below). As you can see, around epoch 25–27 the training loss suddenly becomes very noisy. Why do we see this behaviour?
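For concreteness, the retraining loop looks roughly like this (a minimal sketch, not my exact code; the data, window sizes, and the `make_supervised` helper are hypothetical placeholders, and `build_model` is sketched after the P.S. below):

```python
import numpy as np

def make_supervised(series, lookback):
    # Hypothetical helper: turn a 1-D series into (samples, lookback, 1)
    # inputs paired with the next value as the target.
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., np.newaxis], y

series = np.sin(np.linspace(0.0, 20.0, 200))  # placeholder data
lookback, train_len = 24, 150                 # hypothetical sizes

for step in range(26):                        # 26 forecasting steps
    train = series[step:step + train_len]     # window slides one step per iteration
    X, y = make_supervised(train, lookback)
    model = build_model()                     # fresh model each step (see sketch below)
    model.fit(X, y, epochs=50, verbose=0)
    x_last = train[-lookback:].reshape(1, lookback, 1)
    y_hat = model.predict(x_last, verbose=0)  # one-step-ahead forecast
```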
P.S. I'm using an `LSTM` with `tanh` activation. I also tried `L1` and `L2` regularization, but the behaviour is the same. The layer after the `LSTM` is a `Dense` layer with `linear` activation, a `MinMaxScaler` is applied to the input data, and the optimizer is `Adam`. I also see the same behaviour on the validation dataset.
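And the model itself, roughly (a minimal sketch assuming Keras; the layer width, regularization strengths, and look-back length are hypothetical placeholders, not my exact values):

```python
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.regularizers import L1L2

LOOKBACK = 24  # hypothetical look-back window length

def build_model():
    # LSTM with tanh activation and L1 + L2 regularization, followed by
    # a Dense layer with linear activation for the one-step-ahead output.
    model = Sequential([
        LSTM(32, activation="tanh",
             kernel_regularizer=L1L2(l1=1e-4, l2=1e-4),
             input_shape=(LOOKBACK, 1)),
        Dense(1, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")  # Adam optimizer
    return model

# MinMaxScaler is applied to the input data before the windows above are
# built (a sketch, not my exact preprocessing):
scaler = MinMaxScaler()
# series_scaled = scaler.fit_transform(series.reshape(-1, 1)).ravel()
```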