My data does not fit into memory, so I need the equivalent of the ImageDataGenerator class's flow_from_directory, but one that works with tf.data. I found image_dataset_from_directory, a Keras utility function that generates a tf.data.Dataset from image files in a directory. So I loaded my data (images & masks) as follows:
import tensorflow as tf

BATCH_SIZE = None
IMG_HEIGHT = 256
IMG_WIDTH = 256
IMG_CHANNELS = 1
seed = 42

# shared keyword arguments for image_dataset_from_directory
tf_Dataset_args = dict(labels=None,
                       label_mode=None,
                       validation_split=0.2,
                       batch_size=BATCH_SIZE,
                       image_size=(IMG_HEIGHT, IMG_WIDTH),
                       seed=seed,
                       color_mode="grayscale"
                       )
#---------- train images, split train/val
# image_dataset_from_directory is a Keras utility that generates a tf.data.Dataset from image files in a directory.
# A tf.data.Dataset represents a potentially large set of elements.
train_image_ds = tf.keras.utils.image_dataset_from_directory(train_images_path,
                                                              subset="training",
                                                              **tf_Dataset_args
                                                              )
validation_image_ds = tf.keras.utils.image_dataset_from_directory(train_images_path,
                                                                   subset="validation",
                                                                   **tf_Dataset_args
                                                                   )
#----------- train masks, split train/val
train_masks_ds = tf.keras.utils.image_dataset_from_directory(train_masks_path,
                                                              subset="training",
                                                              **tf_Dataset_args
                                                              )
validation_masks_ds = tf.keras.utils.image_dataset_from_directory(train_masks_path,
                                                                   subset="validation",
                                                                   **tf_Dataset_args
                                                                   )
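As a sanity check (a sketch; since batch_size=None the datasets are unbatched, so each element should be a single grayscale image):

print(train_image_ds.element_spec)  # TensorSpec(shape=(256, 256, 1), dtype=tf.float32, name=None)
print(train_masks_ds.element_spec)  # same spec; the shared seed should keep the two splits aligned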
Then I combined images and masks to create a tf.data.Dataset:
# The simplest way to create a dataset is from a Python list: a nested structure of images and masks
train_set = list(zip(train_image_ds, train_masks_ds))
validation_set = list(zip(validation_image_ds, validation_masks_ds))
training_data = tf.data.Dataset.from_tensor_slices(train_set)  # represents a potentially large set of elements
validation_data = tf.data.Dataset.from_tensor_slices(validation_set)  # I tried zip inside but it did not work
My training and validation datasets have shape (nb_images, 2, 256, 256, 1), or (nb_images/batch_size, 2, batch_size, 256, 256, 1) when batch_size is not None.
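This matches the element spec I see when I inspect the combined dataset (unbatched case):

print(training_data.element_spec)
# TensorSpec(shape=(2, 256, 256, 1), dtype=tf.float32, name=None)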
I then add the data-augmentation block below and pass the dataset through it:
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.Rescaling(1./255),
    tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.5),
    # tf.keras.layers.experimental.preprocessing.RandomTranslation(0.3),
    # tf.keras.layers.experimental.preprocessing.RandomHeight(0.1),
    # tf.keras.layers.experimental.preprocessing.RandomWidth(0.1)
])
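Roughly, the failing call looks like this (a sketch reconstructed from the traceback below; the Sequential receives the whole TensorSliceDataset object instead of image tensors):

augmented = data_augmentation(training_data)  # training_data is a tf.data.Dataset, not a Tensor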
I get:
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor. Received: inputs=<TensorSliceDataset element_spec=TensorSpec(shape=(2, 256, 256, 1), dtype=tf.float32, name=None)>. Consider rewriting this model with the Functional API.

ValueError: Exception encountered when calling layer "rescaling" (type Rescaling).
Attempt to convert a value (<TensorSliceDataset element_spec=TensorSpec(shape=(2, 256, 256, 1), dtype=tf.float32, name=None)>) with an unsupported type (<class 'tensorflow.python.data.ops.dataset_ops.TensorSliceDataset'>) to a Tensor.
Call arguments received:
  • inputs=<TensorSliceDataset element_spec=TensorSpec(shape=(2, 256, 256, 1), dtype=tf.float32, name=None)>
I also have a problem passing the tf.data.Dataset training_data to the .fit method, because its element shape is inconsistent with the model's input shape (None, 256, 256, 1).
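For context, a sketch of that call (model is hypothetical shorthand for my segmentation network, whose input shape is (None, 256, 256, 1)):

model.fit(training_data, validation_data=validation_data, epochs=10)
# fails: each element has shape (2, 256, 256, 1), i.e. image and mask stacked
# on one axis, not an (image, mask) tuple matching the (None, 256, 256, 1) input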