Keras GRU model predicts only [-0., -0., -0., -0., -0.]

Question

I'm trying to predict the prices of a cryptocurrency for the next 5 periods based on the previous 50 inputs.

>>> X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
((291314, 50, 8), (72829, 50, 8), (291314, 5), (72829, 5))

Each input sample consists of the 50 previous time steps x 8 features, and each output is the prices for the next 5 periods.

I built the model with this code:

from tensorflow.keras.layers import GRU
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation

model = Sequential()
model.add(GRU(units=50, input_shape=X_train.shape[1:], return_sequences=False))
model.add(Activation('tanh'))
model.add(Dropout(0.2))
model.add(Dense(NFS))
model.add(Activation('relu'))
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, Y_train, batch_size=50, validation_data=(X_test, Y_test), epochs=2)

That gave me output:

Train on 291314 samples, validate on 72829 samples
Epoch 1/2
291314/291314 [==============================] - 487s 2ms/step - loss: 0.0107 - val_loss: 0.2502
Epoch 2/2
291314/291314 [==============================] - 463s 2ms/step - loss: 0.0103 - val_loss: 0.2502

After this step I tried to predict outputs for X_test, but instead of predictions I got a matrix with the correct shape that is full of zeros:

>>> model.predict(X_test)
array([[-0., -0., -0., -0., -0.],
       [-0., -0., -0., -0., -0.],
       [-0., -0., -0., -0., -0.],
       ...,
       [-0., -0., -0., -0., -0.],
       [-0., -0., -0., -0., -0.],
       [-0., -0., -0., -0., -0.]], dtype=float32)

Why am I getting such a bad result? And am I going about this the right way?

UPD: Here is the full notebook.

Whoops. Based on the title, I'm guessing the debt collectors are on their way. — Mad Physicist, Aug 22 '18 at 16:22
Also, have you tried changing the optimizer or its parameters (e.g. learning rate) or increase the number of epochs (of course, assuming you have normalized your data properly)? — today, Aug 22 '18 at 16:30
By the way, if our comments helped you to train a model which predicts the prices accurately could you please share it with us as a thank you gift? :)) I'm just kidding! — today, Aug 22 '18 at 16:33
Added link to full notebook to the question's bottom. I've used sklearn.MinMaxScaler. And I have no idea which parameter I should adjust. I'm just following this tutorial: https://medium.com/@huangkh19951228/predicting-cryptocurrency-price-with-tensorflow-and-keras-e1674b0dc58a — Vassily, Aug 22 '18 at 16:33
Are you sure it's _full_ of zeros? The output shows that part of the matrix was not shown, so maybe there are some non-zero values? — ForceBru, Aug 22 '18 at 16:35
I lied. It was 5 epochs before, but the losses did not change in the 3rd, 4th and 5th epochs, so I reduced the number of epochs to 2 — Vassily, Aug 22 '18 at 16:40
Terry, I'm not sure. I've relaunched the notebook and will give you a precise answer about whether there are non-zeros in the prediction in 10 minutes — Vassily, Aug 22 '18 at 16:42
Now I'm sure that the prediction is all zeros. The data file has been added to the repository — Vassily, Aug 22 '18 at 17:30

MBT · Answer 1 · 2018-08-30T14:52:25.560

First you need to scale your test (X_test) input. You did indeed scale your training data (X_train), but not the test set.

So you need to scale it like you did with X_train:

X_test = preprocessing.MinMaxScaler().fit_transform(X_test.reshape(-1, 50*8)).reshape(-1, 50, 8)

Furthermore, using 'ReLU' as the activation of the output layer is problematic: even if the last layer's weights produce a negative value, ReLU clips it, so the output is always zero or positive.

The problem is that the weights producing the negative value hardly get updated, because the loss is comparatively low.

Imagine your weights lead to an output of -23435235 while your target is 0.9. With 'ReLU' on the output, -23435235 is mapped to 0, which results in a low loss. But a low loss means little change, whereas a high loss leads to a large change in your weights. On top of that, the gradient of ReLU is zero for negative inputs, so no error signal reaches those weights at all.

So you want a high loss in order to get a strong correction of your weights, because -23435235 is not what you want.

So don't use 'ReLU' in the last layer; I changed it to 'linear' here.
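To see why a ReLU output can get stuck at zero, here is a minimal NumPy sketch. It is not taken from the notebook: the function name and the sample values are made up for illustration. It computes the gradient of a squared-error loss with respect to the value that goes into the output activation, once for a ReLU output and once for a linear output.

```python
import numpy as np

def loss_grad_wrt_preactivation(z, target, activation):
    """Gradient of (activation(z) - target)**2 with respect to z."""
    z = np.asarray(z, dtype=float)
    if activation == "relu":
        out = np.maximum(z, 0.0)
        d_out = (z > 0).astype(float)  # derivative of ReLU: 1 for z > 0, else 0
    elif activation == "linear":
        out = z
        d_out = np.ones_like(z)        # derivative of the identity is 1 everywhere
    else:
        raise ValueError(f"unknown activation: {activation}")
    return 2.0 * (out - target) * d_out

z = np.array([-3.0, -0.5, 0.2])  # hypothetical pre-activation values
target = 0.9

relu_grad = loss_grad_wrt_preactivation(z, target, "relu")
linear_grad = loss_grad_wrt_preactivation(z, target, "linear")
print("relu  :", relu_grad)
print("linear:", linear_grad)
```

For the negative pre-activations the ReLU gradient is exactly zero, so gradient descent has nothing to push those weights back with, while the linear output still produces a non-zero gradient.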

With that said (I also changed 'tanh' to 'ReLU', by the way), here is the code:

from sklearn import preprocessing
from tensorflow.keras.layers import GRU, Dense, Dropout, Activation
from tensorflow.keras.models import Sequential

# somewhere before this point you need to normalize your `X_test`
X_test = preprocessing.MinMaxScaler().fit_transform(X_test.reshape(-1, 50*8)).reshape(-1, 50, 8)

model = Sequential()
model.add(GRU(units=50, input_shape=X_train.shape[1:], return_sequences=False))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(NFS))  # NFS comes from the notebook; it must equal Y_train.shape[1], i.e. 5
model.add(Activation('linear'))  # linear output instead of 'relu'
model.compile(loss='mse', optimizer='adam')
model.fit(X_train, Y_train, batch_size=4000, validation_data=(X_test, Y_test), epochs=15)

Output:

Train on 291314 samples, validate on 72829 samples
Epoch 1/15
291314/291314 [==============================] - 22s 75us/step - loss: 0.1523 - val_loss: 0.2442
Epoch 2/15
291314/291314 [==============================] - 16s 56us/step - loss: 0.0652 - val_loss: 0.2375
Epoch 3/15
291314/291314 [==============================] - 16s 56us/step - loss: 0.0420 - val_loss: 0.2316
Epoch 4/15
291314/291314 [==============================] - 16s 56us/step - loss: 0.0337 - val_loss: 0.2262
Epoch 5/15
291314/291314 [==============================] - 16s 56us/step - loss: 0.0271 - val_loss: 0.2272
Epoch 6/15
291314/291314 [==============================] - 16s 56us/step - loss: 0.0219 - val_loss: 0.2256
Epoch 7/15
291314/291314 [==============================] - 16s 56us/step - loss: 0.0179 - val_loss: 0.2245
Epoch 8/15
291314/291314 [==============================] - 16s 56us/step - loss: 0.0149 - val_loss: 0.2246
Epoch 9/15
291314/291314 [==============================] - 16s 56us/step - loss: 0.0125 - val_loss: 0.2244
Epoch 10/15
291314/291314 [==============================] - 16s 57us/step - loss: 0.0108 - val_loss: 0.2213
Epoch 11/15
291314/291314 [==============================] - 16s 57us/step - loss: 0.0096 - val_loss: 0.2197
Epoch 12/15
291314/291314 [==============================] - 16s 56us/step - loss: 0.0087 - val_loss: 0.2189
Epoch 13/15
291314/291314 [==============================] - 16s 57us/step - loss: 0.0080 - val_loss: 0.2178
Epoch 14/15
291314/291314 [==============================] - 16s 56us/step - loss: 0.0075 - val_loss: 0.2148
Epoch 15/15
291314/291314 [==============================] - 16s 57us/step - loss: 0.0072 - val_loss: 0.2129
<tensorflow.python.keras.callbacks.History at 0x7f8a93637b70>

And here are the predictions for X_test:

Code:

prediction = model.predict(X_test[:10])
prediction

Output:

array([[0.03562379, 0.06016447, 0.0987532 , 0.01986726, 0.0336756 ],
       [0.03518523, 0.06041833, 0.0983481 , 0.01864071, 0.03437094],
       [0.03487844, 0.06067847, 0.09811568, 0.0175517 , 0.03480709],
       [0.03491565, 0.05986937, 0.09927133, 0.02029082, 0.03347992],
       [0.03466946, 0.06018706, 0.09859383, 0.01869587, 0.03432   ],
       [0.03459518, 0.06030918, 0.09850594, 0.01805007, 0.03444977],
       [0.03448001, 0.06019764, 0.09864715, 0.01818896, 0.034256  ],
       [0.03450274, 0.05936757, 0.10001318, 0.02131432, 0.03305689],
       [0.03424717, 0.05954869, 0.09983289, 0.0208826 , 0.03378636],
       [0.03426195, 0.05959999, 0.09991242, 0.02090426, 0.03394405]],
      dtype=float32)

I used your notebook and data to train the model as described above.

As you can see, the validation loss is still decreasing at epoch 15, and the test output now looks quite close to the target.

One more note: I haven't gone through all the preprocessing code in the notebook, but it seems to me that you are using absolute values.

If this is the case, you should consider using percentage changes instead (e.g. from the current time point to the predicted points in the future). This also does the scaling for you (a 10% change = 0.1).

Furthermore, absolute values change too much. If the price was ~5.4324 ten months ago and today the price is ~50.5534, then that old data is useless to you, while relative patterns of the price change may still be valid.
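As a rough illustration of the percentage-change idea, here is a small NumPy sketch. The helper name `relative_targets` and the toy prices are made up; this is not the notebook's preprocessing. For every time point it expresses the next `horizon` prices as a relative change from the current price.

```python
import numpy as np

def relative_targets(prices, horizon=5):
    """For each time t, return prices[t+1 : t+1+horizon] / prices[t] - 1."""
    prices = np.asarray(prices, dtype=float)
    n = len(prices) - horizon              # number of time points with a full horizon
    current = prices[:n, None]             # shape (n, 1), broadcasts over the horizon
    future = np.stack([prices[i + 1:i + 1 + horizon] for i in range(n)])
    return future / current - 1.0          # e.g. 0.1 means a 10% increase

prices = [100.0, 110.0, 99.0, 121.0]       # toy data, not real prices
targets = relative_targets(prices, horizon=2)
print(targets)
```

The same kind of transformation could be applied to the input windows, so that both inputs and targets are relative to the current price rather than absolute.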

This is just a side note. I hope it helps.

@VassiliyVorobyov And another side note: if you are running the training on a GPU then you can consider using [`CuDNNGRU`](https://keras.io/layers/recurrent/#cudnngru) instead of `GRU` (or `CuDNNLSTM` instead of `LSTM`) since it is specifically optimized for a GPU and speeds up the training process. — today, Aug 30 '18 at 17:10

today · Accepted Answer · 2018-08-30T17:43:14.760

Well, I think the normalization scheme suggested in @blue-phoenox's answer is flawed. That's because you should NEVER EVER normalize the test data independently (i.e. with different statistics). Rather, you should use the statistics computed during the normalization of the training data to normalize the test data. So it must be like this:

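from sklearn import preprocessing

mms = preprocessing.MinMaxScaler()
# MinMaxScaler expects 2D input, so flatten each (50, 8) window and restore the shape afterwards
X_train = mms.fit_transform(X_train.reshape(-1, 50*8)).reshape(-1, 50, 8)
X_test = mms.transform(X_test.reshape(-1, 50*8)).reshape(-1, 50, 8)  # you should not use fit_transform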

This makes sense if you consider the following scenario: you have trained your model and deployed it into production for real use. Now a user feeds it one new sample. You need to normalize this new sample first, but how? You can't scale its values independently since it is only one sample (i.e. with a min-max scaler its minimum equals its maximum, so every value would collapse to the same constant). Rather, you would use (in the case of a min-max scaler) the "min" and "max" values computed over the training data to normalize this new test data.
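Here is a minimal NumPy sketch of that scenario. The data is random and the helper name `scale_with_train_stats` is made up; it only illustrates the idea and is not how `MinMaxScaler` is implemented internally. It scales a single new sample with the training min/max, and then shows what happens if the sample is scaled with its own statistics instead.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.uniform(10.0, 20.0, size=(100, 8))  # made-up 2D training data

# Statistics are computed once, on the training data only
train_min = X_train.min(axis=0)
train_max = X_train.max(axis=0)

def scale_with_train_stats(x):
    """Min-max scale x using the min/max of the training data."""
    return (x - train_min) / (train_max - train_min + 1e-8)

new_sample = np.full((1, 8), 15.0)                # one new sample arriving in production
scaled = scale_with_train_stats(new_sample)

# Scaling the sample by its own statistics: min == max, so all information is lost
own_min = new_sample.min(axis=0)
own_max = new_sample.max(axis=0)
collapsed = (new_sample - own_min) / (own_max - own_min + 1e-8)

print(scaled)
print(collapsed)
```

With the training statistics the sample lands at a meaningful position inside the [0, 1] range, while scaling it on its own turns every feature into zero.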

This is very common in image models, like this:

X_train /= 255.
X_test /= 255.

Note that we divide both training and test data by the same number (i.e. 255). Or a more sophisticated normalization scheme:

X_mean = X_train.mean(axis=0)
X_std = X_train.std(axis=0)
X_train -= X_mean
X_train /= X_std + 1e-8   # add a small constant to prevent division by zero

# Now to normalize test data we use the same X_mean and X_std already computed
X_test -= X_mean
X_test /= X_std + 1e-8

Side note (as I mentioned in my comment): if you are running the training on a GPU then you can consider using CuDNNGRU instead of GRU (or CuDNNLSTM instead of LSTM) since it is specifically optimized for a GPU and speeds up the training process.

score 0 · Answer 3 · answered Jun 10 '20 at 17:16

For anyone who gets the same output and finds this question later: to expand a bit on the excellent answer provided by @MBT, you can also try Leaky ReLU as the activation.

Just change model.add(Activation("relu")) to model.add(LeakyReLU(alpha=[enter alpha, default is 0.3])) and make sure to add from keras.layers.advanced_activations import LeakyReLU.
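For intuition, here is a small NumPy sketch of what a leaky ReLU computes, as I understand it. This is a hand-written illustration, not the Keras layer itself; the function names are made up and `alpha=0.3` simply mirrors the default mentioned above. Unlike plain ReLU, negative inputs keep a small slope, so their gradient is not zero.

```python
import numpy as np

def leaky_relu(z, alpha=0.3):
    """Return z for positive inputs and alpha * z for the rest."""
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.3):
    """Derivative of leaky_relu: 1 for positive inputs, alpha otherwise."""
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.1, 0.5])  # hypothetical pre-activation values
out = leaky_relu(z)
grad = leaky_relu_grad(z)
print(out)
print(grad)
```

Because the gradient for negative inputs is `alpha` rather than 0, the weights behind a negative pre-activation can still be corrected during training.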

I found this solution here: https://github.com/keras-team/keras/issues/3687
