Put simply, I'd like to train an autoencoder on a Keras dataset created from a local image directory. To clarify, an autoencoder here is a model that approximates the identity function for images: ideally, the output is exactly equal to the input.
The dataset is too large to fit in memory, so converting it to a NumPy array with np.concatenate will not help me here.
In other words, I'd like an identity image dataset, where the label for each image is the image itself.
Here's my (non-working) sample code:
train_ds, validate_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    labels=None,
    validation_split=0.1,
    subset="both",
    shuffle=True,
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size,
    crop_to_aspect_ratio=True)
history = autoencoder.fit(
    x=train_ds,
    y=train_ds,
    validation_data=(validate_ds, validate_ds),
    epochs=epochs,
    batch_size=16
)
The image_dataset_from_directory function gives me a dataset of images with no labels. So far so good.
The second call, autoencoder.fit, fails with the error message:
ValueError: `y` argument is not supported when using dataset as input.
On the other hand, if I omit the y argument, I get this error:
ValueError: Target data is missing. Your model was compiled with loss=binary_crossentropy, and therefore expects target data to be provided in `fit()`.
That is not at all surprising, because there are no labels: I requested none. Yet fit() won't let me pass the dataset itself as the labels, which is what I need to do.
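The closest idea I have is to turn each batch into an (input, target) pair inside the dataset itself, so that fit() never needs a separate y. Below is a minimal sketch of that idea, not a confirmed solution. The helper names to_identity_pair and as_autoencoder_dataset are hypothetical, ds is assumed to be the tf.data.Dataset returned by image_dataset_from_directory with labels=None, and the division by 255 is an assumption that the model ends in a sigmoid, since binary_crossentropy expects targets in [0, 1].

```python
def to_identity_pair(images):
    """Return (input, target) where the target is the input itself.

    Assumption: pixel values arrive as floats in [0, 255], and the model
    is trained with binary_crossentropy, so both are scaled to [0, 1].
    """
    scaled = images / 255.0
    return scaled, scaled


def as_autoencoder_dataset(ds):
    """Map a dataset of image batches to (image, image) pairs.

    `ds` is assumed to be a tf.data.Dataset yielding image batches only
    (no labels). The mapping is applied lazily, batch by batch, so the
    full dataset never has to fit in memory.
    """
    return ds.map(to_identity_pair)
```

If this works as I hope, the training call would become autoencoder.fit(as_autoencoder_dataset(train_ds), validation_data=as_autoencoder_dataset(validate_ds), epochs=epochs), with no y and no batch_size, since the dataset is already batched.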
Any help would be appreciated.