I have a very huge database of images locally, with the data distribution like each folder cointains the images of one class.
I would like to use the tensorflow dataset API to obtain batches de data without having all the images loaded in memory.
I have tried something like this:
def _parse_function(filename, label):
image_string = tf.read_file(filename, "file_reader")
image_decoded = tf.image.decode_jpeg(image_string, channels=3)
image = tf.cast(image_decoded, tf.float32)
return image, label
image_list, label_list, label_map_dict = read_data()
dataset = tf.data.Dataset.from_tensor_slices((tf.constant(image_list), tf.constant(label_list)))
dataset = dataset.shuffle(len(image_list))
dataset = dataset.repeat(epochs).batch(batch_size)
dataset = dataset.map(_parse_function)
iterator = dataset.make_one_shot_iterator()
image_list is a list where the path (and name) of the images have been appended and label_list is a list where the class of each image has been appended in the same order.
But the _parse_function does not work, the error that I recibe is:
ValueError: Shape must be rank 0 but is rank 1 for 'file_reader' (op: 'ReadFile') with input shapes: [?].
I have googled the error, but nothing works for me.
If I do not use the map function, I just recibe the path of the images (which are store in image_list), so I think that I need the map function to read the images, but I am not able to make it works.
Thank you in advance.
EDIT:
def read_data():
image_list = []
label_list = []
label_map_dict = {}
count_label = 0
for class_name in os.listdir(base_path):
class_path = os.path.join(base_path, class_name)
label_map_dict[class_name]=count_label
for image_name in os.listdir(class_path):
image_path = os.path.join(class_path, image_name)
label_list.append(count_label)
image_list.append(image_path)
count_label += 1