I'm new to TensorFlow/Keras, and I have a file structure with 3000 folders containing 200 images each that I want to load as data. I know that `keras.preprocessing.image_dataset_from_directory` lets me load the data and split it into training and validation sets, as below:
```python
import tensorflow as tf

val_data = tf.keras.preprocessing.image_dataset_from_directory(
    'etlcdb/ETL9G_IMG/',
    image_size=(128, 127),
    validation_split=0.3,
    subset='validation',
    seed=1,
    color_mode='grayscale',
    shuffle=True)
```
which reports:

    Found 607200 files belonging to 3036 classes. Using 182160 files for validation.
But then I'm not sure how to further split my validation set into validation and test splits while keeping the class proportions intact. From what I can tell (reading the GitHub source code), the `take` method simply returns the first x elements of the dataset, and `skip` drops them. I'm unsure whether splitting this way preserves the stratification of the data, and I'm also not sure how to get the labels back out of the dataset so I can check.
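To illustrate what I mean, here is a minimal sketch of the `take`/`skip` split and of pulling labels back out of a dataset to inspect the class counts. It uses a small synthetic dataset as a stand-in for the real one (the shapes, label values, and batch size here are made up for the example, not taken from ETL9G), since `image_dataset_from_directory` batches its output and `take`/`skip` would therefore operate on whole batches:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the real dataset: 100 (image, label) pairs over 5
# classes, batched like image_dataset_from_directory's output would be.
images = tf.zeros((100, 128, 127, 1))
labels = tf.constant([i % 5 for i in range(100)])
val_data = tf.data.Dataset.from_tensor_slices((images, labels)).batch(10)

# Split in half with take/skip. Note this operates on batches, not
# individual examples, and nothing here stratifies by class.
n_batches = tf.data.experimental.cardinality(val_data).numpy()
test_data = val_data.take(n_batches // 2)
val_data = val_data.skip(n_batches // 2)

# Pull the labels back out of the test split to inspect its class counts.
test_labels = np.concatenate([y.numpy() for _, y in test_data])
print(np.bincount(test_labels))
```

This shows how I could check the class distribution after the split, but it doesn't answer whether there is a proper way to make the split stratified in the first place.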
Any help would be appreciated.