Context
I am currently running some experiments with LSTMs / GRUs in Keras. However, the questions below also concern the general behavior of these networks, so an answer does not have to be Keras-specific.
For my experiments I chose to predict a linearly growing time series in the form of range(10, 110, 5), so that I would obviously get good results. My generator for this data follows this tutorial (essentially an implementation of Keras' TimeseriesGenerator).
[[[10. 15.]
[20. 25.]]] => [[30. 35.]]
...
[[[80. 85.]
[90. 95.]]] => [[100. 105.]]
This results in 8 steps_per_epoch and samples with an overall shape of (8, 1, 2, 2).
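For reference, the batches above can be reproduced with Keras' TimeseriesGenerator roughly like this (a simplified sketch, not my exact code; the series values are reconstructed from the samples printed above):
import numpy as np
from keras.preprocessing.sequence import TimeseriesGenerator

# Two parallel series stacked column-wise; values reconstructed from the
# printed samples above (10..100 and 15..105).
in_seq1 = np.arange(10, 110, 10, dtype='float32').reshape(-1, 1)
in_seq2 = np.arange(15, 115, 10, dtype='float32').reshape(-1, 1)
dataset = np.hstack((in_seq1, in_seq2))  # shape (10, 2)

n_input = 2     # timesteps per sample
n_features = 2  # features per timestep

# length=2 -> inputs of shape (1, 2, 2), targets of shape (1, 2);
# consecutive windows advance by one full timestep, giving 8 batches in total
generator = TimeseriesGenerator(dataset, dataset, length=n_input, batch_size=1)

for i in range(len(generator)):
    x, y = generator[i]
    print(x, '=>', y)  # e.g. [[[10. 15.] [20. 25.]]] => [[30. 35.]]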
I then set up a simple network in Keras and trained it for 500 epochs:
from keras.models import Sequential
from keras.layers import GRU, Dense

model = Sequential()
model.add(GRU(100, activation='relu', input_shape=(n_input, n_features), batch_size=1))  # could also be an LSTM layer
model.add(Dense(2))  # matches the 2-value target shape
model.compile(optimizer='adam', loss='mse')
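The training itself is a plain generator fit (simplified sketch, assuming the generator shown above):
# 8 batches of size 1, trained for 500 epochs
model.fit_generator(generator, steps_per_epoch=8, epochs=500, verbose=0)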
If I predict some data afterwards like this...
x_input = np.array([[90, 95], [100, 105]]).reshape((1, n_input, n_features))
yhat = model.predict(x_input, verbose=0)
... the result/prediction is [[111.1233 116.97075]] (good enough for the experiment; the correct value would be [[110.0 115.0]]).
My questions
Obviously 500 epochs is much more than needed for this amount of data.
In order to get more training data without increasing the actual data (which would not be possible in a real scenario either), I came up with the idea of using overlapping sliding windows (the batches shown above are non-overlapping).
The batches then look like this:
[[[10. 15.]
[20. 25.]]] => [[30. 35.]]
[[[15. 20.]
[25. 30.]]] => [[35. 40.]]
[[[20. 25.]
[30. 35.]]] => [[40. 45.]]
...
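Roughly, this overlapping windowing can be produced by sliding over the flat series in steps of a single value instead of a full two-value timestep (again a simplified sketch, not my exact generator code):
import numpy as np

# Slide a 4-value window over the flat series, one value at a time,
# and reshape each window into (n_input, n_features).
series = np.arange(10, 110, 5, dtype='float32')  # 10, 15, ..., 105

n_input, n_features = 2, 2
window = n_input * n_features  # 4 values per input window

X, y = [], []
for start in range(len(series) - window - n_features + 1):
    X.append(series[start:start + window].reshape(n_input, n_features))
    y.append(series[start + window:start + window + n_features])
X, y = np.array(X), np.array(y)

print(X[0], '=>', y[0])  # [[10. 15.] [20. 25.]] => [30. 35.]
print(X[1], '=>', y[1])  # [[15. 20.] [25. 30.]] => [35. 40.]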
In theory this gives me far more batches, and I expected the training quality to improve accordingly.
However, feeding this data to the same network results in the following prediction: [[121.1334 134.70979]]. These predictions are far worse.
My questions now are:
- Is this expected behavior for LSTMs / GRUs? Why are overlapping windows a problem for them?
- Is there a way to augment my data like this without ruining the prediction quality?