I've been trying to load image data from a directory using the tf.data.Dataset.from_tensor_slices method.

The data frame I use looks something like this:

    path                    label
0   '../some_dir/0000.jpg'   0
1   '../some_dir/0001.jpg'   0
2   '../some_dir/0002.jpg'   1
...

My sample code looks as follows:

import tensorflow as tf
image_paths = tf.convert_to_tensor(tr_df['path'].values, dtype=tf.string)  # this doesn't change anything.
labels = tf.convert_to_tensor(tr_df['label'].values)  # Just leaving it to demonstrate what I've tried.
tr_data = tf.data.Dataset.from_tensor_slices((image_paths,labels))

However, this method seems to return just an empty Dataset:

print(tr_data)
>>>> <TensorSliceDataset shapes: ((), ()), types: (tf.string, tf.int64)>

What exactly has gone wrong here? I have already double-checked that the paths actually point to existing files.

I have tried to use tf.keras.preprocessing.image_dataset_from_directory instead, but that is unfortunately not possible, both because of the directory structure at hand, which I can't change, and because of the issue described here.

Nicolas Gervais
emilaz

1 Answer

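It's not empty. The printed shapes describe a single element, not the whole dataset: slicing 1D data yields scalar elements, and the shape of a scalar is (). See: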

import tensorflow as tf

x = ['hi', 'hello', 'greetings']
y = [0, 1, 0]

ds = tf.data.Dataset.from_tensor_slices((x, y))

ds
<TensorSliceDataset shapes: ((), ()), types: (tf.string, tf.int32)>

It looks empty, but if you iterate through it, you get the elements:

next(iter(ds))
(<tf.Tensor: shape=(), dtype=string, numpy=b'hi'>,
 <tf.Tensor: shape=(), dtype=int32, numpy=0>)
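
If you want a more direct confirmation than pulling one element, here is a small sketch on the same toy lists. It assumes TensorFlow 2.3 or newer, where Dataset.cardinality() is available as a method; the variable names n_elements and rows are only illustrative.

```python
import tensorflow as tf

x = ['hi', 'hello', 'greetings']
y = [0, 1, 0]
ds = tf.data.Dataset.from_tensor_slices((x, y))

# element_spec describes ONE element (two scalars), which is why the
# repr shows the shape () twice instead of the dataset length.
print(ds.element_spec)

# cardinality() reports how many elements the dataset holds.
n_elements = int(ds.cardinality().numpy())
print(n_elements)

# as_numpy_iterator() yields plain NumPy/Python values, handy for inspection.
rows = list(ds.as_numpy_iterator())
print(rows)
```

The element count and the per-element shape are separate pieces of information, and the repr only shows the latter.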

If you really want to see a non-empty shape, you can make the data 2D:

x = [['hi'], ['hello'], ['greetings']]
y = [[0], [1], [0]]

ds = tf.data.Dataset.from_tensor_slices((x, y))

ds
<TensorSliceDataset shapes: ((1,), (1,)), types: (tf.string, tf.int32)>

By the way, I don't think you need tf.convert_to_tensor: from_tensor_slices converts lists and arrays to tensors itself.
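
To get from the path/label pairs to actual image tensors, one common approach is to map a loading function over the dataset. The following is only a sketch under assumptions: the tiny JPEG files written to a temporary directory, the plain lists standing in for tr_df['path'] and tr_df['label'], and the helper name load_image are all hypothetical, chosen so the example runs on its own.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in data: three tiny black JPEGs in a temp directory.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    path = os.path.join(tmp_dir, f"{i:04d}.jpg")
    pixels = tf.zeros([8, 8, 3], dtype=tf.uint8)
    tf.io.write_file(path, tf.io.encode_jpeg(pixels))
    paths.append(path)
labels = [0, 0, 1]  # plays the role of tr_df['label']

# Plain lists go in directly, without tf.convert_to_tensor.
tr_data = tf.data.Dataset.from_tensor_slices((paths, labels))


def load_image(path, label):
    """Read and decode one JPEG file; called once per dataset element."""
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return image, label


# map() is lazy: files are only read when the dataset is iterated.
image_ds = tr_data.map(load_image)

first_image, first_label = next(iter(image_ds))
print(first_image.shape, int(first_label))
```

With real data you would likely add a resize step inside load_image and then batch the result, since images of different sizes cannot be batched together.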

Nicolas Gervais