I have read some of the other questions about this issue, but I don't understand how to apply the fixes in my case, since the loss functions used there are usually a lot more complex than mine. I believe the answer involves adding a small value like 1e-10 somewhere in the loss function, but I don't know where. My data consists of 6 time series that I transformed to log returns. I set up an encoder-decoder model that takes input windows of 60 time steps and predicts the following 30 values for all 6 series.
I have tried adding 1e-10 to various parts of the loss function, but when I run the model the loss is still NaN. Also, in case it matters, I plan to use MinMaxScaler to transform the data in my next attempt rather than log returns; I'm not sure whether that changes how to fix this problem, but I'm mentioning it just in case.
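For example, one of the placements of the epsilon that I tried looks like this (the exact spots varied between attempts):

def custom_loss_function(y_actual, y_hat):
    # epsilon added to both numerator and denominator
    return kb.mean(kb.abs(kb.log((y_hat + 1e-10) / (y_actual + 1e-10))))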
I have also tried changing various other aspects: using the original non-transformed data, using tanh activations for the LSTM layers, and trying different optimisers. No matter what, the loss still comes out as NaN.
If I change the loss to 'mse', it does seem to work, and the loss is around 1.5e-6 for both training and validation.
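That is, compiling with the built-in loss instead of my custom one:

model_LSTM.compile(optimizer='adam', loss='mse')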
Any ideas? Thanks!
Here is the transformation I applied to the data just in case:
import numpy as np
import pandas as pd

# Convert each series to log returns: log(x_t / x_{t-1})
data_lr = pd.DataFrame()
for column in data.columns:
    data_lr['log_return_' + column] = np.log(data[column] / data[column].shift(1))

data_lr.dropna(inplace=True)  # drop the first row (NaN from the shift)
data_lr.head()
And here are the first rows of the data, to show what the values look like:
log_return_sample_0 log_return_sample_1 log_return_sample_2 log_return_sample_3 log_return_sample_4 log_return_sample_5
day.minute
0.1 -0.001386 -0.001578 -0.001115 -0.000758 -0.000910 0.000223
0.2 0.001386 0.000226 -0.002514 -0.002847 -0.003647 0.000669
0.3 0.000346 0.001353 -0.000839 -0.000951 -0.001600 0.001336
0.4 0.000692 0.000676 0.000839 0.000380 0.000457 0.000667
0.5 0.000000 -0.000450 0.000000 0.000570 0.000914 0.002443
Here is the loss function:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
import tensorflow.keras.backend as kb

# Mean absolute log of the prediction/target ratio
def custom_loss_function(y_actual, y_hat):
    custom_loss_value = kb.mean(kb.abs(kb.log(y_hat / y_actual)))
    return custom_loss_value
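For reference, here is the kind of standalone check one can run on the loss, using values in the same range as the log returns above (note that both targets and predictions can be negative or exactly zero):

import tensorflow as tf

# Spot-check of the loss outside the model, on made-up values taken from
# the range of the log-return data shown above
y_true = tf.constant([[-0.001386, 0.000226, 0.000000]])
y_pred = tf.constant([[0.001200, -0.000300, 0.000500]])
print(custom_loss_function(y_true, y_pred).numpy())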
Here is the model:
# Encoder-decoder LSTM: 60-step input window in, 30-step forecast out
model_LSTM = Sequential()
model_LSTM.add(LSTM(200, activation='relu', input_shape=(n_steps_in, num_features)))  # encoder
model_LSTM.add(RepeatVector(n_steps_out))  # repeat the encoding for each output step
model_LSTM.add(LSTM(200, activation='relu', return_sequences=True))  # decoder
model_LSTM.add(TimeDistributed(Dense(num_features)))  # one output per series per step
model_LSTM.compile(optimizer='adam', loss=custom_loss_function)
Here's the model summary:
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_3 (LSTM) (None, 200) 165600
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 30, 200) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 30, 200) 320800
_________________________________________________________________
time_distributed_2 (TimeDist (None, 30, 6) 1206
=================================================================
Total params: 487,606
Trainable params: 487,606
Non-trainable params: 0
_________________________________________________________________
None
And finally what I ran:
model_LSTM.fit(X, y, epochs=10, verbose=2, validation_split=0.1)
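For context, X has shape (samples, 60, 6) and y has shape (samples, 30, 6); they are built by sliding windows over data_lr. The helper below is an illustrative sketch of that step, not my exact code:

import numpy as np

# Illustrative sketch only: windowing X and y from the log-return frame,
# with n_steps_in = 60, n_steps_out = 30, num_features = 6
def make_windows(values, n_steps_in, n_steps_out):
    X, y = [], []
    for i in range(len(values) - n_steps_in - n_steps_out + 1):
        X.append(values[i : i + n_steps_in])
        y.append(values[i + n_steps_in : i + n_steps_in + n_steps_out])
    return np.array(X), np.array(y)

X, y = make_windows(data_lr.values, 60, 30)  # X: (samples, 60, 6), y: (samples, 30, 6)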
EDIT: Just in case, here are the first few rows of the original, untransformed values:
sample_0 sample_1 sample_2 sample_3 sample_4 sample_5
day.minute
0.0 28.88 44.39 35.89 52.81 43.99 44.84
0.1 28.84 44.32 35.85 52.77 43.95 44.85
0.2 28.88 44.33 35.76 52.62 43.79 44.88
0.3 28.89 44.39 35.73 52.57 43.72 44.94
0.4 28.91 44.42 35.76 52.59 43.74 44.97