
I have two TensorFlow datasets, which I process separately to get different windows for features and targets:

import numpy as np
import tensorflow as tf

window_size_x = 3
window_size_y = 2
shift_size = 1

x = np.arange(10)
y = x * 10

x = x[:-window_size_y]
y = y[window_size_x:]

ds_x = tf.data.Dataset.from_tensor_slices(x).window(window_size_x, shift=shift_size, drop_remainder=True)
ds_y = tf.data.Dataset.from_tensor_slices(y).window(window_size_y, shift=shift_size, drop_remainder=True)

for i, j in zip(ds_x, ds_y):
  print(list(i.as_numpy_iterator()), list(j.as_numpy_iterator()))

Output:

[0, 1, 2] [30, 40]
[1, 2, 3] [40, 50]
[2, 3, 4] [50, 60]
[3, 4, 5] [60, 70]
[4, 5, 6] [70, 80]
[5, 6, 7] [80, 90]
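Each element produced by `Dataset.window()` is itself a small dataset rather than a tensor, which is why it has to be iterated with `as_numpy_iterator()` above. A quick sketch that confirms this, reusing the question's preprocessing of `x` (the names `first_window` and `values` are only illustrative):

```python
import numpy as np
import tensorflow as tf

window_size_x = 3
window_size_y = 2
shift_size = 1

x = np.arange(10)
x = x[:-window_size_y]

ds_x = tf.data.Dataset.from_tensor_slices(x).window(
    window_size_x, shift=shift_size, drop_remainder=True)

# The element spec describes nested datasets, not tensors
print(ds_x.element_spec)

# Take the first window and check what kind of object it is
first_window = next(iter(ds_x))
print(isinstance(first_window, tf.data.Dataset))

# Its values have to be pulled out by iterating the nested dataset
values = [int(v) for v in first_window.as_numpy_iterator()]
print(values)
```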

When I finally feed these two datasets into the model using model.fit(ds_x, ds_y) I get the following error:

ValueError: `y` argument is not supported when using dataset as input.
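When a `tf.data.Dataset` is passed to `model.fit`, Keras takes the labels from the dataset itself, so each element should be a `(features, labels)` tuple and no separate `y` can be given. A minimal sketch with random placeholder data and an arbitrary one-layer model (both are assumptions for illustration only):

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 8 samples with 3 features and 2 targets each
features = np.random.rand(8, 3).astype("float32")
labels = np.random.rand(8, 2).astype("float32")

# One dataset whose elements are (features, labels) tuples, batched for Keras
ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(2),
])
model.compile(optimizer="adam", loss="mse")

# Labels come from the dataset, so no y argument is passed
history = model.fit(ds, epochs=1, verbose=0)
```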

When I try to combine both datasets like in this answer, I get another error:

ds_all = tf.data.Dataset.from_tensor_slices((ds_x, ds_y))

Error:

ValueError: Slicing dataset elements is not supported for rank 0.

What is the proper way to combine two datasets?
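`from_tensor_slices` slices tensors or NumPy arrays along their first dimension, while `ds_x` and `ds_y` are datasets, which it cannot slice this way. One alternative (not from the original post) is to build the windows as NumPy arrays first, here with `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later), and then slice the pair of arrays:

```python
import numpy as np
import tensorflow as tf
from numpy.lib.stride_tricks import sliding_window_view

window_size_x = 3
window_size_y = 2
shift_size = 1

x = np.arange(10)
y = x * 10
x = x[:-window_size_y]
y = y[window_size_x:]

# One row per window; taking every shift_size-th row applies the shift
X = sliding_window_view(x, window_size_x)[::shift_size]
Y = sliding_window_view(y, window_size_y)[::shift_size]

# Both arrays have the same number of rows, so they can be sliced together
ds_all = tf.data.Dataset.from_tensor_slices((X, Y))
for features, target in ds_all.take(2):
    print(features.numpy(), target.numpy())
```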

Mykola Zotko

2 Answers


Use tf.data.Dataset.zip to combine features and labels.

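# BATCH_SIZE is a value you define yourself. Dataset.zip already returns a
# dataset, so it does not need to be wrapped in from_tensor_slices.
# Note: elements that come straight from .window() are nested datasets;
# flatten them first, e.g. with .flat_map(lambda w: w.batch(window_size)).
ds_all = tf.data.Dataset.zip((ds_x.batch(BATCH_SIZE),
                              ds_y.batch(BATCH_SIZE)))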
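For reference, `Dataset.zip` takes a tuple of datasets and yields tuples of their elements, pairing them by position and stopping when the shortest input runs out. A small sketch with plain integer datasets (not the windowed ones from the question):

```python
import tensorflow as tf

a = tf.data.Dataset.range(3)       # 0, 1, 2
b = tf.data.Dataset.range(10, 14)  # 10, 11, 12, 13

# zip yields (a_element, b_element) tuples and stops at the shorter dataset
zipped = tf.data.Dataset.zip((a, b))
pairs = [(int(i), int(j)) for i, j in zipped]
print(pairs)  # [(0, 10), (1, 11), (2, 12)]
```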
rafaelc
  • Thank you for your answer. Is `BATCH_SIZE` only used during dataset zipping, or will it also be used during model training? If it's only for zipping, can I use `BATCH_SIZE = len(ds_x)`? – Mykola Zotko Jul 01 '22 at 06:48
  • I get the following error: `TypeError: Inputs to a layer should be tensors. Got: <_VariantDataset element_spec=TensorSpec(shape=(24,), dtype=tf.float32, name=None)>` as I feed my data to model. Do you have any idea why? – Mykola Zotko Jul 01 '22 at 09:14

Maybe try something like this:

import tensorflow as tf
import numpy as np

window_size_x = 3
window_size_y = 2
shift_size = 1

x = np.arange(10)
y = x * 10

x = x[:-window_size_y]
y = y[window_size_x:]

# flat_map turns each nested window dataset into a single tensor
ds_x = tf.data.Dataset.from_tensor_slices(x).window(window_size_x, shift=shift_size, drop_remainder=True).flat_map(lambda w: w.batch(window_size_x))
ds_y = tf.data.Dataset.from_tensor_slices(y).window(window_size_y, shift=shift_size, drop_remainder=True).flat_map(lambda w: w.batch(window_size_y))
# zip pairs each feature window with its target window as an (x, y) tuple
dataset = tf.data.Dataset.zip((ds_x, ds_y))
for i, j in dataset:
  print(i, j)

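You can then feed `dataset` directly to `model.fit()`. Keras expects a dataset to yield batches, so you will usually batch it first, e.g. `model.fit(dataset.batch(32))`.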
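A minimal end-to-end sketch of this approach with `model.fit`, assuming float inputs, a hypothetical `BATCH_SIZE` of 2 and a single `Dense` layer as a stand-in model (none of these come from the original answer). Using `drop_remainder=True` in the inner `batch` is safe because every window already has exactly the requested length, and it gives the tensors a static shape:

```python
import numpy as np
import tensorflow as tf

window_size_x = 3
window_size_y = 2
shift_size = 1

x = np.arange(10, dtype="float32")
y = x * 10
x = x[:-window_size_y]
y = y[window_size_x:]

# Flatten each nested window into a fixed-length tensor
ds_x = (tf.data.Dataset.from_tensor_slices(x)
        .window(window_size_x, shift=shift_size, drop_remainder=True)
        .flat_map(lambda w: w.batch(window_size_x, drop_remainder=True)))
ds_y = (tf.data.Dataset.from_tensor_slices(y)
        .window(window_size_y, shift=shift_size, drop_remainder=True)
        .flat_map(lambda w: w.batch(window_size_y, drop_remainder=True)))

# Hypothetical batch size; this is the batch size Keras trains with
BATCH_SIZE = 2
dataset = tf.data.Dataset.zip((ds_x, ds_y)).batch(BATCH_SIZE)

# Stand-in model: maps a window of 3 features to a window of 2 targets
model = tf.keras.Sequential([
    tf.keras.Input(shape=(window_size_x,)),
    tf.keras.layers.Dense(window_size_y),
])
model.compile(optimizer="adam", loss="mse")
history = model.fit(dataset, epochs=2, verbose=0)
```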

AloneTogether