There are 10 directories (one per label), each with 800 images. I'm trying to use transfer learning to train my model. The data is loaded with ImageDataGenerator as shown below:

train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2) # set validation split

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary',
    subset='training') # set as training data

validation_generator = train_datagen.flow_from_directory(
    train_data_dir, # same directory as training data
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary',
    subset='validation') # set as validation data

model.fit_generator(
    train_generator,
    steps_per_epoch = train_generator.samples // batch_size,
    validation_data = validation_generator, 
    validation_steps = validation_generator.samples // batch_size,
    epochs = nb_epochs)

Is it possible to limit the number of images used from each directory to 100 or N images instead of all 800 images using ImageDataGenerator?

Jedi Nerd
  • Duplicate of [Keras flow_from_directory limiting number of examples](https://stackoverflow.com/questions/54152216/keras-flow-from-directory-limiting-number-of-examples) – Will Feb 28 '20 at 09:53
  • @Will the thread you linked shows how to split a directory into training and test data by setting validation_split. That is already done here: of the 800 images in each class, 640 are used for training and 160 for testing. I need a total of 100 images per class, with 80 for training and 20 for testing. Is there a way to use only 100 images in total for each class? – Jedi Nerd Feb 28 '20 at 10:12

1 Answer

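import os
import pandas as pd

def limit_data(data_dir, n=100):
    """Return a DataFrame listing at most n image files per class directory."""
    rows = []
    for label in os.listdir(data_dir):
        # Keep only the first n files of each class (in os.listdir order)
        for filename in os.listdir(os.path.join(data_dir, label))[:n]:
            rows.append((f'{data_dir}/{label}/{filename}', label))
    return pd.DataFrame(rows, columns=['filename', 'class'])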

Then pass the resulting DataFrame to the flow_from_dataframe method of ImageDataGenerator instead of using flow_from_directory.
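For the 80/20 case asked about in the comments, here is a minimal sketch of how the capped file list could feed two generators. The helper names `sample_per_class` and `make_generators`, the default `target_size`, and `class_mode="categorical"` are my own assumptions, not part of the question; pick the class mode that matches your model's loss. The split is done explicitly per class because, as far as I know, `validation_split` in `flow_from_dataframe` slices the frame by row position rather than per class, which would give a skewed split on a frame ordered by label.

```python
import os
import random


def sample_per_class(data_dir, n=100, val_fraction=0.2, seed=0):
    """Pick at most n files per class folder and split them per class.

    Returns two lists of (filename, class) tuples: training rows, validation rows.
    Hypothetical helper, written for illustration only.
    """
    rng = random.Random(seed)
    train_rows, val_rows = [], []
    for label in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, label)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)  # random subset rather than the first n names
        files = files[:n]
        n_val = int(len(files) * val_fraction)
        for i, name in enumerate(files):
            row = (os.path.join(class_dir, name), label)
            (val_rows if i < n_val else train_rows).append(row)
    return train_rows, val_rows


def make_generators(data_dir, n=100, target_size=(224, 224), batch_size=32):
    """Build training and validation generators from the capped file lists."""
    # Imported here so the sampling helper above works without these packages.
    import pandas as pd
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_rows, val_rows = sample_per_class(data_dir, n=n)
    columns = ["filename", "class"]
    train_df = pd.DataFrame(train_rows, columns=columns)
    val_df = pd.DataFrame(val_rows, columns=columns)
    classes = sorted(train_df["class"].unique())  # same label order for both

    train_datagen = ImageDataGenerator(
        rescale=1. / 255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
    val_datagen = ImageDataGenerator(rescale=1. / 255)  # no augmentation

    common = dict(x_col="filename", y_col="class", classes=classes,
                  target_size=target_size, batch_size=batch_size,
                  class_mode="categorical")  # assumption: one-hot labels
    train_gen = train_datagen.flow_from_dataframe(train_df, **common)
    val_gen = val_datagen.flow_from_dataframe(val_df, shuffle=False, **common)
    return train_gen, val_gen
```

Calling `make_generators(train_data_dir, n=100)` should then give two generators that can be passed to `model.fit_generator` in place of `train_generator` and `validation_generator` from the question, with 80 training and 20 validation images per class when each folder holds at least 100 files.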

user3731622
Smart Manoj