
I have been working on a time series analysis using an LSTM model implemented in TensorFlow 2.0. Now, it is clear that this is accomplished by creating windows of data. For instance, the first 30 values of the time series form a window, i.e. the input, and the next value becomes the target.

I came across the following function for creating these windows,

def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    series = tf.expand_dims(series, axis=-1)                       # add a channel dimension
    ds = tf.data.Dataset.from_tensor_slices(series)
    ds = ds.window(window_size + 1, shift=1, drop_remainder=True)  # sliding windows
    ds = ds.flat_map(lambda w: w.batch(window_size + 1))           # each window -> one tensor
    ds = ds.shuffle(shuffle_buffer)
    ds = ds.map(lambda w: (w[:-1], w[1:]))                         # (inputs, shifted targets)
    return ds.batch(batch_size).prefetch(1)
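Conceptually (ignoring the shuffling, the batching, and the extra channel dimension), the windowing can be sketched in plain Python. Note that `windows_and_targets` is a hypothetical helper written for illustration, not part of the original code:

```python
# Plain-Python sketch of what the tf.data pipeline above produces,
# ignoring shuffling, batching and the extra channel dimension.
# `windows_and_targets` is a hypothetical name used only for this example.
def windows_and_targets(series, window_size):
    pairs = []
    for start in range(len(series) - window_size):
        chunk = series[start:start + window_size + 1]  # window_size + 1 values
        x = chunk[:-1]  # first window_size values -> input
        y = chunk[1:]   # same values shifted by one -> target sequence
        pairs.append((x, y))
    return pairs

for x, y in windows_and_targets(list(range(10)), window_size=5):
    print(x, y)
# first line printed: [0, 1, 2, 3, 4] [1, 2, 3, 4, 5]
```

This also makes visible that the targets are not a single next value but the whole window shifted by one step, which matches the `(w[:-1], w[1:])` mapping in the function.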

This returns a tf.data.Dataset object when a series is passed in along with the other arguments like so,

window_size = 30
batch_size = 32
shuffle_buffer_size = 1000
series_dataset = windowed_dataset(series_train, window_size, batch_size=128, shuffle_buffer=shuffle_buffer_size)

On examination of this object, I found that each element is a batch of 128 windows and each window contains 30 values (as defined by the arguments passed).

This is all well and good, but what confuses me is that after defining a model like so,

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv1D(filters=32, kernel_size=3,
                      strides=1, padding="causal",
                      activation="relu",
                      input_shape=[None, 1]),
  tf.keras.layers.LSTM(32, return_sequences=True),
  tf.keras.layers.LSTM(32, return_sequences=True),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 200)
])

optimizer = tf.keras.optimizers.SGD(lr=1e-7, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

this dataset can be passed directly to the model.fit() method,

history = model.fit(series_dataset, epochs=500)

How is it possible that the target does not need to be defined here? You would normally need to do something like model.fit(x=inputs, y=targets, epochs=num_epochs). How come this is not necessary?
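For reference, the same mechanism can be mimicked without TensorFlow at all. The following is a hypothetical toy sketch, not Keras internals: an iterable whose elements are (inputs, targets) tuples already carries its own labels, so the consuming loop never needs a separate `y` argument.

```python
# Hypothetical toy analogy (not Keras internals): an iterable that
# yields (inputs, targets) pairs carries its labels with it, so the
# consuming loop needs no separate `y` argument.
def toy_dataset():
    yield [1.0, 2.0], 3.0
    yield [2.0, 3.0], 4.0

def toy_fit(dataset):
    # Keras similarly unpacks each element of a tf.data.Dataset
    # into inputs and targets before computing the loss.
    losses = []
    for x, y in dataset:
        prediction = sum(x)  # stand-in for a model's forward pass
        losses.append(abs(prediction - y))
    return losses

print(toy_fit(toy_dataset()))  # -> [0.0, 1.0]
```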

Minura Punchihewa
    Your dataset has both inputs and targets (the line, `ds = ds.map(lambda w: (w[:-1], w[1:]))`). So when passing a `tf.data.Dataset`, Keras will take care of separating `x` and `y`. – thushv89 Mar 19 '21 at 00:28
  • Can you please explain a little more about how this dataset object works? I am a little confused with it. The main concern I have here is that I want to incorporate a few other features to this analysis, therefore, I believe that I will need to separate out the target and the inputs. I hope to build the model using the Functional API. – Minura Punchihewa Mar 19 '21 at 03:34
  • You can refer to the `DNN for Time Series` [section](https://charon.me/posts/keras/keras4/); the explanation is: first we create a simple data set containing `10` elements from `0` to `9`. Next we `window` the data into chunks of `5` items, shifting by `1` each time. To get only full chunks of five records, we set `drop_remainder=True`. Next we split into `x`'s and `y`'s using a `lambda`. Next we shuffle the data, which rearranges it so as not to accidentally introduce a sequence bias. By setting a batch size of `2`, our data gets batched into two `x`'s and two `y`'s at a time. –  Apr 01 '21 at 12:48
  • Is it not possible to do this without using this Dataset object? – Minura Punchihewa Apr 02 '21 at 12:14
