
I want to add a time-step dimension to my batch generation.

Currently I am doing:

train_ds = tf.data.Dataset.\
    from_tensor_slices((x_train, y_train)).\
    shuffle(10000).\
    batch(32)

and getting batches of shape (32, feature_vector_length).

I want to add a time-step dimension so that each batch has shape (batch_size, time_steps, feature_vector_length).

How can this be done using tf.data?

Nicolas Gervais
Night Walker

1 Answer


You can use the .window() method of tf.data.Dataset to build sliding windows over your inputs, then combine them with your targets using tf.data.Dataset.zip().

import tensorflow as tf
import numpy as np

x_train = np.random.rand(1000, 5)
y_train = np.sum(x_train, axis=1)

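# Slide a window of 3 consecutive rows over the inputs (moving 1 row at a time),
# then batch each window into a single (3, 5) tensor.
x = tf.data.Dataset.from_tensor_slices(x_train).\
    window(size=3, shift=1, stride=1, drop_remainder=True).\
    flat_map(lambda l: l.batch(3))

# One target per input row.
y = tf.data.Dataset.from_tensor_slices(y_train)

# Pair each window with a target, then group the pairs into batches of 2.
ds = tf.data.Dataset.zip((x, y)).batch(2, drop_remainder=True)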

for xx, yy in ds:
    print(xx, yy)
    break
tf.Tensor(
[[[0.85339111 0.00937855 0.6432005  0.31875691 0.83835893]
  [0.91914805 0.13469408 0.40381527 0.80296816 0.4389627 ]
  [0.40326491 0.28575999 0.86602507 0.40515333 0.35390637]]
 [[0.91914805 0.13469408 0.40381527 0.80296816 0.4389627 ]
  [0.40326491 0.28575999 0.86602507 0.40515333 0.35390637]
  [0.00197349 0.46558597 0.66426367 0.00787106 0.07879078]]], shape=(2, 3, 5), 
    dtype=float64) tf.Tensor([2.663086   2.69958826], shape=(2,), dtype=float64)

Each batch holds 2 windows of 3 time steps with 5 features each, i.e. shape (batch_size, time_steps, features) = (2, 3, 5), plus one target per window.

Nicolas Gervais
  • Great example. I noticed, however, that the Y targets correspond to the first of the 3 X inputs in the window. In most cases you'll probably want them to be the targets for the last one instead. But that's easy to fix with y = y.skip(2) before the zip – royalstream Jul 20 '21 at 00:41
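The alignment fix suggested in the comment above could look roughly like the sketch below. It is only an illustration: the TIME_STEPS constant and the deterministic toy data (row i of x_train filled with the value i) are assumptions made here so the pairing is easy to verify, not part of the original answer.

```python
import numpy as np
import tensorflow as tf

TIME_STEPS = 3  # window length (hypothetical name used only in this sketch)

# Deterministic toy data so the alignment is easy to check:
# row i of x_train is [i, i, i, i, i], so y_train[i] == 5 * i.
x_train = np.repeat(np.arange(10, dtype=np.float64)[:, None], 5, axis=1)
y_train = x_train.sum(axis=1)

# Sliding windows of TIME_STEPS consecutive rows, each turned into one tensor.
x = (
    tf.data.Dataset.from_tensor_slices(x_train)
    .window(size=TIME_STEPS, shift=1, stride=1, drop_remainder=True)
    .flat_map(lambda w: w.batch(TIME_STEPS, drop_remainder=True))
)

# Skip the first TIME_STEPS - 1 targets so that each window is paired
# with the target of its last time step rather than its first.
y = tf.data.Dataset.from_tensor_slices(y_train).skip(TIME_STEPS - 1)

ds = tf.data.Dataset.zip((x, y)).batch(2, drop_remainder=True)

xx, yy = next(iter(ds))
print(xx.shape)    # (2, 3, 5)
print(yy.numpy())  # [10. 15.]
```

The first window covers rows 0 to 2 and is paired with the target of row 2 (5 * 2 = 10), and the second window covers rows 1 to 3 and is paired with the target of row 3 (15). If shuffling is needed, as in the question, it is probably safest to apply .shuffle() after the zip and before .batch(), so that each window stays attached to its target.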