TensorFlow dataset with variable number of elements

Question

I need a dataset structured to handle a variable number of input images (a set of images) to regress against an integer target variable.

The code I am using to source the images is like this:

import tensorflow as tf
from tensorflow import convert_to_tensor


def read_image_tf(path: str) -> tf.Tensor:
    image = tf.keras.utils.load_img(path)
    return tf.keras.utils.img_to_array(image)

def read_image_list(x, y):
    return tf.map_fn(read_image_tf, x), y


paths_list = [['image_1', 'image_2', 'image_3'], ['image_6'], ['image_4', 'image_5', 'image_8', 'image_19']]

x = tf.ragged.constant(paths_list)
y = tf.constant([1,2,3])

dataset = tf.data.Dataset.from_tensor_slices((x, y))
dataset = dataset.map(lambda x,y: read_image_list(x,y))

This code fails with `TypeError: path should be path-like or io.BytesIO, not <class 'tensorflow.python.framework.ops.Tensor'>`. It seems that the map operation is not extracting the paths correctly from the original RaggedTensor. I have also tried a generator, with similar results. Any help would be much appreciated.

AloneTogether · Accepted Answer · 2022-11-30T12:38:36.163


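`tf.keras.utils.load_img` expects a Python path, but inside `Dataset.map` the function receives a symbolic string tensor, so the file has to be read and decoded with `tf.io` ops instead. Maybe something like this: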

import tensorflow as tf

def read_image_tf(path: str) -> tf.Tensor:
    img = tf.io.read_file(path)
    return tf.io.decode_png(img, channels=3) # more generic: tf.io.decode_image

def read_image_list(x, y):
    # fn_output_signature replaces the deprecated dtype argument of tf.map_fn
    return tf.map_fn(read_image_tf, x, fn_output_signature=tf.uint8), y

paths_list = [['/content/image1.png', '/content/image1.png', '/content/image1.png'], ['/content/image1.png'], ['/content/image1.png', '/content/image1.png', '/content/image1.png', '/content/image1.png']]

x = tf.ragged.constant(paths_list)
y = tf.constant([1,2,3])

dataset = tf.data.Dataset.from_tensor_slices((x, y))
dataset = dataset.map(lambda x, y: read_image_list(x, y))

for x, y in dataset:
  print(x.shape, y)

(3, 100, 100, 3) tf.Tensor(1, shape=(), dtype=int32)
(1, 100, 100, 3) tf.Tensor(2, shape=(), dtype=int32)
(4, 100, 100, 3) tf.Tensor(3, shape=(), dtype=int32)

You can also convert x back to a ragged tensor if you want.
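A minimal sketch of that conversion, assuming one element has already been decoded into a dense tensor of shape (n, height, width, 3). The `images` tensor below is a stand-in made of zeros, not real image data. A leading axis of size 1 is added first, so that the ragged dimension is the image count rather than the image height:

```python
import tensorflow as tf

# Stand-in for one decoded element: 3 images of 4x4 pixels with 3 channels.
images = tf.zeros([3, 4, 4, 3], dtype=tf.uint8)

# Add a leading axis of size 1; from_tensor (default ragged_rank=1) then
# treats axis 1, the number of images, as the ragged dimension.
ragged = tf.RaggedTensor.from_tensor(images[None, ...])

print(type(ragged).__name__)
print(ragged.row_lengths().numpy())
```

Without the extra axis, `from_tensor` would make the height axis ragged instead, which is probably not what is wanted here.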

edited Nov 30 '22 at 12:38

answered Nov 30 '22 at 12:16

AloneTogether

Right. I was then wondering how to handle batching (PyTorch has a custom collate function, but I could not find an equivalent for TF), and I end up with an error like this: `InvalidArgumentError: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [2,600,600,3], [batch]: [1,600,600,3] [Op:IteratorGetNext]` – Alberto Nov 30 '22 at 13:38
Can you show exactly where your error is coming from? Is my suggestion not working? Maybe un-accept then? – AloneTogether Nov 30 '22 at 13:40
Sorry, your solution works fine for the problem at hand. Now I am trying to obtain batches, and that is where the problem is: `ds_batch = dataset.batch(2).prefetch(tf.data.AUTOTUNE)` followed by `ds_batch.take(2)`. If I try to iterate over the batches, I get the error reported above. – Alberto Nov 30 '22 at 13:44
Change `read_image_list` to `return tf.RaggedTensor.from_tensor(tf.map_fn(read_image_tf, x, dtype=tf.uint8)[None, ...]), y` and it should work. – AloneTogether Nov 30 '22 at 13:57
cool thanks, just tested that – Alberto Nov 30 '22 at 13:59
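
Putting the comment thread together, the batched pipeline might look roughly like the sketch below. It is not the exact code that was tested in the thread. The tiny 8x8 PNG files and the `make_png` helper are hypothetical, added only so the example runs without real images, and `fn_output_signature` is used with an explicit `tf.TensorSpec` in place of the deprecated `dtype` argument.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical test data: write tiny black PNGs into a temporary directory.
tmp_dir = tempfile.mkdtemp()

def make_png(name):
    path = os.path.join(tmp_dir, name)
    tf.io.write_file(path, tf.io.encode_png(tf.zeros([8, 8, 3], tf.uint8)))
    return path

paths_list = [
    [make_png('a.png'), make_png('b.png'), make_png('c.png')],
    [make_png('d.png')],
    [make_png('e.png'), make_png('f.png')],
]

def read_image_tf(path):
    return tf.io.decode_png(tf.io.read_file(path), channels=3)

def read_image_list(x, y):
    images = tf.map_fn(
        read_image_tf, x,
        fn_output_signature=tf.TensorSpec([None, None, 3], tf.uint8))
    # Wrap each variable-length stack in a RaggedTensor so batch() can
    # combine elements that hold different numbers of images.
    return tf.RaggedTensor.from_tensor(images[None, ...]), y

x = tf.ragged.constant(paths_list)
y = tf.constant([1, 2, 3])

dataset = tf.data.Dataset.from_tensor_slices((x, y)).map(read_image_list)
ds_batch = dataset.batch(2).prefetch(tf.data.AUTOTUNE)

batch_labels = []   # labels contained in each batch
image_counts = []   # total number of images contained in each batch
for images, labels in ds_batch:
    batch_labels.append(labels.numpy().tolist())
    image_counts.append(int(images.flat_values.shape[0]))

print(batch_labels, image_counts)
```

Each batch is itself a RaggedTensor, so a model consuming it has to accept ragged input or convert it first, for example with `to_tensor()`, which pads the shorter sets.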
