Stratified train-test splitting a Tensorflow dataset

Question

I am working with a fairly large image dataset, which I load using ImageDataGenerator from tensorflow.keras in Python. Because the class distribution of my data is very imbalanced, I want to do a stratified train-test split to possibly achieve a higher accuracy.

I know how to do a simple random train-test split using ImageDataGenerator, but I couldn't find any equivalent of the stratified train_test_split you can do in sklearn.

Is there any way to do a stratified train-test split on a tensorflow.data.Dataset? And if not, how do you deal with large imbalanced datasets? I would very much appreciate your help!

Here is the relevant code:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator()
dataset = datagen.flow_from_directory(
    path_images, 
    target_size=(ImageHeight, ImageWidth), 
    color_mode='rgb', 
    class_mode='sparse', 
    batch_size=BatchSize, 
    shuffle=True, 
    seed=Seed,
)

score 0 · Answer 1 · answered Jun 25 '22 at 01:56

flow(x, y=None, batch_size=32, shuffle=True, sample_weight=None, seed=None, save_to_dir=None, save_prefix='', save_format='png', ignore_class_split=False, subset=None)
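
The signature by itself does not show how to stratify, so here is a minimal sketch of one possible approach, not a built-in Keras feature. It assumes the images sit in one subdirectory per class (the layout flow_from_directory expects) and splits the file paths class by class, so only paths are held in memory and not the images. The function name stratified_file_split and its parameters are hypothetical, introduced only for illustration.

```python
import os
import random
from collections import defaultdict


def stratified_file_split(root_dir, test_fraction=0.2, seed=0):
    """Split image paths into train/test lists, class by class.

    Hypothetical helper: assumes root_dir holds one subdirectory per class.
    Returns two lists of (path, label) pairs.
    """
    rng = random.Random(seed)

    # Collect the file paths of every class separately.
    by_class = defaultdict(list)
    for label in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, label)
        if not os.path.isdir(class_dir):
            continue
        for name in sorted(os.listdir(class_dir)):
            by_class[label].append(os.path.join(class_dir, name))

    # Take the same fraction out of each class, so both splits keep
    # (approximately) the original class proportions.
    train, test = [], []
    for label, paths in by_class.items():
        rng.shuffle(paths)
        n_test = int(round(len(paths) * test_fraction))
        test += [(path, label) for path in paths[:n_test]]
        train += [(path, label) for path in paths[n_test:]]
    return train, test
```

The two lists could then be put into pandas DataFrames (for example with columns filename and class) and passed to ImageDataGenerator.flow_from_dataframe, one generator per split. If scikit-learn is available, calling train_test_split on the path list with stratify set to the label list should give an equivalent split.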

Pramod Sharma

This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/32101392) – Trenton McKinney Jun 30 '22 at 17:42
