I'm trying to build an input dataset for my TF model from a CSV file that I have. The CSV has the following schema:
image_name, label
XXXXXXX.png, some_integer_value
XXXXXXX.png, some_integer_value
I did a bit of research and found that the `tf.data.Dataset` API seems well suited to this task, so I am trying to use `tf.data.experimental.make_csv_dataset`. My problem is that I'm not sure how to load the images into the dataset. I currently have the following setup:
csv_dataset = tf.data.experimental.make_csv_dataset(
    PATH_TO_DATA_CSV,
    batch_size=5,
    select_columns=['image_name', 'label'],
    label_name='label',
    num_epochs=1,
    ignore_errors=True
)
My original idea was to map a function over the dataset that reads each image file, something like:
def process_data(image_name, label):
    image_name = image_name.numpy().decode('utf-8')
    img = tf.io.read_file(DATA_PATH + '/' + image_name)
    img = decode_img(img)
    return img, label

csv_dataset = csv_dataset.map(process_data)
But this throws the following error:
`File "", line 4, in process_data * image_name = image_name.numpy().decode('utf-8')
AttributeError: 'collections.OrderedDict' object has no attribute 'numpy'`
Should I be approaching the problem this way (and if so, how can I fix my error)? If not, what is the best way to approach this?
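
For reference, here is a minimal sketch of one direction that might work, not a definitive fix. It relies on the fact that, with `label_name` set, `make_csv_dataset` yields `(features, label)` pairs where `features` is a dictionary of batched column tensors. That is why the first argument to the mapped function is an `OrderedDict` rather than a string. The temporary directory, the generated 4x4 PNG files, and `shuffle=False` are assumptions added only so the snippet runs on its own, and `tf.io.decode_png` stands in for the `decode_img` helper, whose definition is not shown above.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-ins for DATA_PATH and PATH_TO_DATA_CSV from the question.
DATA_PATH = tempfile.mkdtemp()
PATH_TO_DATA_CSV = os.path.join(DATA_PATH, "data.csv")

# Write two tiny PNG files and a matching CSV so the sketch is self-contained.
with open(PATH_TO_DATA_CSV, "w") as f:
    f.write("image_name,label\n")
    for i in range(2):
        name = f"img_{i}.png"
        png = tf.io.encode_png(tf.zeros([4, 4, 3], dtype=tf.uint8))
        tf.io.write_file(os.path.join(DATA_PATH, name), png)
        f.write(f"{name},{i}\n")


def process_data(features, label):
    # `features` maps column names to tensors; it is not a string.
    # Tensors inside `map` are symbolic, so `.numpy()` is not available;
    # the path is built with TensorFlow string ops instead.
    path = tf.strings.join([DATA_PATH, features["image_name"]], separator="/")
    img = tf.io.read_file(path)
    # Assumption: decode_png replaces the question's undefined decode_img.
    img = tf.io.decode_png(img, channels=3)
    return img, label


csv_dataset = tf.data.experimental.make_csv_dataset(
    PATH_TO_DATA_CSV,
    batch_size=5,
    select_columns=["image_name", "label"],
    label_name="label",
    num_epochs=1,
    shuffle=False,  # assumption: keeps row order stable for this demo
    ignore_errors=True,
)

image_dataset = (
    csv_dataset
    .unbatch()  # one CSV row per element, so read_file receives a scalar path
    .map(process_data, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(5)   # re-batch after decoding; images must share a shape here
)
```

The `unbatch()` step is there because `tf.io.read_file` expects a single filename, while `make_csv_dataset` always emits batches. If the real images differ in size, a resize inside `process_data` would probably be needed before re-batching.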