I am a little confused. I just spent the last hour reading about how to split my dataset into train/test sets in TensorFlow. I was following this tutorial to import my images: https://www.tensorflow.org/tutorials/load_data/images. Apparently one can split into train/test with sklearn's model_selection.train_test_split.
But my question is: when do I split my dataset into train/test? I have already done the loading step with my dataset (see below); now what? How do I split it? Do I have to do it before loading the files as a tf.data.Dataset?
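import numpy as np
import tensorflow as tf

# data_dir (a pathlib.Path) and cwd (a str) are defined earlier in my script
# determine names of classes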
CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])
print(CLASS_NAMES)
# count images
image_count = len(list(data_dir.glob('*/*.png')))
print(image_count)
# load the files as a tf.data.Dataset
list_ds = tf.data.Dataset.list_files(str(cwd + '/train/' + '*/*'))
Also, my directory structure looks like the following. There is no test folder and no val folder, so I would need to hold out 20% of that train set for testing.
train
|__ class 1
|__ class 2
|__ class 3
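
For reference, one possible way to take the 20% hold-out from a layout like the one above is to split the file paths per class folder before building any tf.data.Dataset. This is only a minimal sketch in plain Python: the function name split_file_paths, the test_fraction and seed parameters, and the per-class shuffling are illustrative assumptions, not something taken from the tutorial.

```python
import random
from pathlib import Path


def split_file_paths(train_dir, test_fraction=0.2, seed=42):
    """Hold out test_fraction of the files in each class folder.

    Hypothetical helper: returns (train_paths, test_paths) as lists of strings.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_paths, test_paths = [], []
    # Each sub-folder of train_dir is treated as one class.
    for class_dir in sorted(p for p in Path(train_dir).iterdir() if p.is_dir()):
        files = sorted(str(f) for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_test = int(round(len(files) * test_fraction))
        test_paths.extend(files[:n_test])
        train_paths.extend(files[n_test:])
    return train_paths, test_paths
```

The two lists could then be passed to tf.data.Dataset.from_tensor_slices(train_paths) and tf.data.Dataset.from_tensor_slices(test_paths) in place of the single list_files call, so the rest of the loading pipeline stays the same for both sets.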