Context
I am currently running some experiments with LSTMs / GRUs in Keras. However, the questions below also concern the general behavior of these networks, so an answer does not have to be Keras-specific.
For my experiments I chose to predict a linearly growing time series in the form of range(10, 110, 5), so that I would obviously get good results. My generator for this data follows this tutorial (essentially an implementation of Keras' TimeseriesGenerator).
[[[10. 15.]
[20. 25.]]] => [[30. 35.]]
...
[[[80. 85.]
[90. 95.]]] => [[100. 105.]]
This results in 8 steps_per_epoch and samples with an overall shape of (8, 1, 2, 2).
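For reference, the batches above can be reproduced with Keras' TimeseriesGenerator roughly like this (a simplified sketch, not my exact code; the series values are reconstructed from the samples printed above):
import numpy as np
from keras.preprocessing.sequence import TimeseriesGenerator

# Two parallel series stacked column-wise; values reconstructed from the
# printed samples above (10..100 and 15..105).
in_seq1 = np.arange(10, 110, 10, dtype='float32').reshape(-1, 1)
in_seq2 = np.arange(15, 115, 10, dtype='float32').reshape(-1, 1)
dataset = np.hstack((in_seq1, in_seq2))  # shape (10, 2)

n_input = 2     # timesteps per sample
n_features = 2  # features per timestep

# length=2 -> inputs of shape (1, 2, 2), targets of shape (1, 2);
# consecutive windows advance by one full timestep, giving 8 batches in total
generator = TimeseriesGenerator(dataset, dataset, length=n_input, batch_size=1)

for i in range(len(generator)):
    x, y = generator[i]
    print(x, '=>', y)  # e.g. [[[10. 15.] [20. 25.]]] => [[30. 35.]]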
I then set up a simple network in Keras and trained it for 500 epochs:
from keras.models import Sequential
from keras.layers import GRU, Dense

model = Sequential()
model.add(GRU(100, activation='relu', input_shape=(n_input, n_features), batch_size=1))  # could also be an LSTM layer
model.add(Dense(2))  # matches the 2-value target shape
model.compile(optimizer='adam', loss='mse')
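The training itself is a plain generator fit (simplified sketch, assuming the generator shown above):
# 8 batches of size 1, trained for 500 epochs
model.fit_generator(generator, steps_per_epoch=8, epochs=500, verbose=0)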
If I predict some data afterwards like this...
x_input = np.array([[90, 95], [100, 105]]).reshape((1, n_input, n_features))
yhat = model.predict(x_input, verbose=0)
... the result/prediction is [[111.1233 116.97075]] (good enough for the experiment; the correct value would be [[110.0 115.0]]).
My questions
Obviously 500 epochs is much more than needed for this amount of data.
In order to get more training data without increasing the actual data (which would not be possible in a real scenario either), I came up with the idea of using overlapping sliding windows (the batches shown above are non-overlapping).
The batches then look like this:
[[[10. 15.]
[20. 25.]]] => [[30. 35.]]
[[[15. 20.]
[25. 30.]]] => [[35. 40.]]
[[[20. 25.]
[30. 35.]]] => [[40. 45.]]
...
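Roughly, this overlapping windowing can be produced by sliding over the flat series in steps of a single value instead of a full two-value timestep (again a simplified sketch, not my exact generator code):
import numpy as np

# Slide a 4-value window over the flat series, one value at a time,
# and reshape each window into (n_input, n_features).
series = np.arange(10, 110, 5, dtype='float32')  # 10, 15, ..., 105

n_input, n_features = 2, 2
window = n_input * n_features  # 4 values per input window

X, y = [], []
for start in range(len(series) - window - n_features + 1):
    X.append(series[start:start + window].reshape(n_input, n_features))
    y.append(series[start + window:start + window + n_features])
X, y = np.array(X), np.array(y)

print(X[0], '=>', y[0])  # [[10. 15.] [20. 25.]] => [30. 35.]
print(X[1], '=>', y[1])  # [[15. 20.] [25. 30.]] => [35. 40.]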
In theory this gives me far more batches, and I expected the training quality to improve accordingly.
However, feeding this data to the same network results in the following prediction: [[121.1334 134.70979]]. These predictions are far worse.
My questions now are:
- Is this expected behavior for LSTMs / GRUs? Why are overlapping windows a problem for them?
- Is there a way to augment my data like this without ruining the prediction quality?