I am currently working with a fairly large image dataset, which I loaded using ImageDataGenerator from tensorflow.keras in Python. Because the classes in my data are very imbalanced, I want to do a stratified train-test split, which might help me achieve a higher accuracy.
I know how to do a simple random train-test split using ImageDataGenerator, but I couldn't find any equivalent of the stratified train_test_split you can do in sklearn.
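For reference, this is the kind of stratified split I mean in sklearn. It is only a minimal sketch: the file names, the labels, and the 90/10 class ratio are made up for illustration and are not my real data.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical file names and labels: 90 images of class 0, 10 of class 1.
paths = [f"img_{i}.jpg" for i in range(100)]
labels = [0] * 90 + [1] * 10

# stratify=labels keeps the class ratio the same in both parts of the split.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    paths,
    labels,
    test_size=0.2,
    stratify=labels,
    random_state=0,
)

print(Counter(train_labels))
print(Counter(test_labels))
```

Both parts keep the original 9:1 class ratio, which is what I would like to get for my image data as well.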
Is there any way to do a stratified train-test split of a tensorflow.data.Dataset?
And if not, how do you deal with large imbalanced datasets?
I would very much appreciate your help!
Here is the relevant code:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator()
dataset = datagen.flow_from_directory(
    path_images,
    target_size=(ImageHeight, ImageWidth),
    color_mode='rgb',
    class_mode='sparse',
    batch_size=BatchSize,
    shuffle=True,
    seed=Seed,
)