I'm trying to build an input dataset for my TF model from a CSV file that I have. The file has the following schema:

image_name, label
XXXXXXX.png, some_integer_value
XXXXXXX.png, some_integer_value

I did a bit of research and found that the tf.data.Dataset API seems well suited to this task, so I am trying to use tf.data.experimental.make_csv_dataset. The problem is that I'm not sure how to load the images into the dataset. I currently have the following setup:

csv_dataset = tf.data.experimental.make_csv_dataset(
  PATH_TO_DATA_CSV, 
  batch_size = 5, 
  select_columns = ['image_name', 'label'],
  label_name = 'label', 
  num_epochs = 1, 
  ignore_errors = True
)

My original idea was to map a function over the dataset that reads each file, something like:

def process_data(image_name, label):
  image_name = image_name.numpy().decode('utf-8')
  img = tf.io.read_file(DATA_PATH + '/' + image_name)
  img = decode_img(img)
  return img, label

csv_dataset = csv_dataset.map(process_data)

But this throws the following error:

`File "", line 4, in process_data * image_name = image_name.numpy().decode('utf-8')`

`AttributeError: 'collections.OrderedDict' object has no attribute 'numpy'`

Should I be approaching the problem this way (and if so, how can I fix my error)? If not, what is the best way to approach it?

1 Answer

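The error occurs because make_csv_dataset yields (features, label) pairs in which features is an OrderedDict mapping column names to batched tensors, so process_data receives a dict rather than a string tensor. Instead, you can use tf.data.Dataset.from_tensor_slices in conjunction with Pandas (to build all_image_paths and all_image_labels from the CSV), something like: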

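def load_and_preprocess_image(path):
  # path is a scalar tf.string tensor; tf.io.read_file accepts it directly,
  # so no .numpy() or string conversion is needed inside the mapped function
  image_string = tf.io.read_file(path)
  img = tf.io.decode_png(image_string, channels=3)
  return tf.image.resize(img, [1000, 1000])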

def load_and_preprocess_from_path_labels(path, label):
  return load_and_preprocess_image(path), label

ds = tf.data.Dataset.from_tensor_slices((all_image_paths, all_image_labels))
csv_dataset = ds.map(load_and_preprocess_from_path_labels, num_parallel_calls=tf.data.AUTOTUNE)
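The snippet above leaves out how all_image_paths and all_image_labels are built. Here is a minimal sketch of that step together with the rest of the pipeline, assuming the CSV has the image_name and label columns from the question and that all images are PNGs in a single directory. The helper names (read_paths_and_labels, load_image, build_dataset) and the IMG_SIZE value are made up for illustration, not part of any library.

```python
import os

import pandas as pd
import tensorflow as tf

IMG_SIZE = (224, 224)  # assumed target size; the snippet above uses [1000, 1000]


def read_paths_and_labels(csv_path, image_dir):
    """Return (paths, labels) lists from a CSV with image_name and label columns."""
    # skipinitialspace drops the blank after each comma in "name, label" rows,
    # so the second column is named "label" rather than " label".
    df = pd.read_csv(csv_path, skipinitialspace=True)
    paths = [os.path.join(image_dir, name) for name in df["image_name"]]
    labels = df["label"].astype("int64").tolist()
    return paths, labels


def load_image(path, label):
    # path arrives as a scalar tf.string tensor, so it can be passed
    # straight to tf.io.read_file without calling .numpy().
    img = tf.io.decode_png(tf.io.read_file(path), channels=3)
    return tf.image.resize(img, IMG_SIZE), label


def build_dataset(csv_path, image_dir, batch_size=5):
    paths, labels = read_paths_and_labels(csv_path, image_dir)
    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    ds = ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    # Batch after mapping so load_image sees one path at a time.
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

With the names from the question, this would be called as build_dataset(PATH_TO_DATA_CSV, DATA_PATH). Batching after the map is deliberate: make_csv_dataset batches before your function runs, which is part of why mapping a per-file reader over it is awkward.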