
I have a very large database of images stored locally, organized so that each folder contains the images of one class.

I would like to use the TensorFlow Dataset API to obtain batches of data without having all the images loaded in memory.

I have tried something like this:

import tensorflow as tf

def _parse_function(filename, label):
    image_string = tf.read_file(filename, "file_reader")
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.cast(image_decoded, tf.float32)
    return image, label

image_list, label_list, label_map_dict = read_data()

dataset = tf.data.Dataset.from_tensor_slices((tf.constant(image_list), tf.constant(label_list)))
dataset = dataset.shuffle(len(image_list))
dataset = dataset.repeat(epochs).batch(batch_size)

dataset = dataset.map(_parse_function)

iterator = dataset.make_one_shot_iterator()

image_list is a list containing the path (and name) of each image, and label_list is a list containing the class of each image in the same order.

But the _parse_function does not work; the error that I receive is:

ValueError: Shape must be rank 0 but is rank 1 for 'file_reader' (op: 'ReadFile') with input shapes: [?].

I have googled the error, but nothing I found works for me.

If I do not use the map function, I just receive the paths of the images (which are stored in image_list), so I think that I need the map function to read the images, but I am not able to make it work.

Thank you in advance.

EDIT:

    def read_data():
        image_list = []
        label_list = []
        label_map_dict = {}
        count_label = 0

        # base_path is the root directory containing one subfolder per class
        for class_name in os.listdir(base_path):
            class_path = os.path.join(base_path, class_name)
            label_map_dict[class_name] = count_label

            for image_name in os.listdir(class_path):
                image_path = os.path.join(class_path, image_name)

                label_list.append(count_label)
                image_list.append(image_path)

            count_label += 1

        return image_list, label_list, label_map_dict
Laaa
  • How does your read_data function work? Your pipeline and parse function look OK; there's obviously a type mismatch. tf.read_file accepts a Python string with a filename. – Sharky Feb 08 '19 at 17:30
  • Hi @Sharky, thank you for your interest! I have edited the question and added the function that reads the data. – Laaa Feb 11 '19 at 12:08

1 Answer


The error is in this line: dataset = dataset.repeat(epochs).batch(batch_size). Batching before map adds batch_size as a leading dimension to the input, so _parse_function receives a rank-1 tensor of filenames instead of a single scalar filename, which is what tf.read_file expects.

You need to batch your dataset after the map function, like this:

    dataset = tf.data.Dataset.from_tensor_slices((tf.constant(image_list), tf.constant(label_list)))
    dataset = dataset.shuffle(len(image_list))
    dataset = dataset.repeat(epochs)
    dataset = dataset.map(_parse_function).batch(batch_size)
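
As a usage note, here is a minimal sketch of how the corrected pipeline could be consumed with the TF 1.x session API; the iterator line is taken from the question, and the loop body is illustrative:

    iterator = dataset.make_one_shot_iterator()
    next_images, next_labels = iterator.get_next()

    with tf.Session() as sess:
        # Each sess.run fetches one batch of decoded images and their labels
        batch_images, batch_labels = sess.run([next_images, next_labels])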
Sharky
  • Thank you very much! It works! Now I have a problem with the image sizes because they are not all the same; do you know how to add the resize step in the _parse_function? – Laaa Feb 20 '19 at 11:27
  • You can use `tf.image.resize_images` or `tf.image.resize_image_with_crop_or_pad` to change image size. Or `tf.reshape` to change the shape of a tensor – Sharky Feb 20 '19 at 15:20
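
For illustration, a minimal sketch of a _parse_function with resizing added, assuming a hypothetical target size of 224x224 (TF 1.x API):

    def _parse_function(filename, label):
        image_string = tf.read_file(filename)
        image_decoded = tf.image.decode_jpeg(image_string, channels=3)
        # Resize every image to a common size; 224x224 is an assumed example value
        image_resized = tf.image.resize_images(image_decoded, [224, 224])
        image = tf.cast(image_resized, tf.float32)
        return image, label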