Image augmentation with TensorFlow so all classes have the exact same number of images

Question

I want to do multi-class image classification of animals. The problem is that my dataset has a different number of images for each class, and the differences are quite large. For example:

In this example, the dataset contains 320 images across 3 classes. Class A has 125 images, class B has 170 images, and class C has only 25 images. I want to augment those classes so that each one has 200 images, which means 600 images distributed uniformly across the 3 classes.

However, in my case there are 60 classes in my dataset. How can I augment all of them so that every class has exactly the same number of images?

Gerry P · Accepted Answer · 2021-04-07T18:47:42.383

It would take considerable coding, but you can use ImageDataGenerator to produce augmented images and store them in a specified directory. Documentation for the generator is here. Alternatively, you can use modules like cv2 or PIL, which provide functions to transform images. Below is code you can use with cv2. Look up the cv2 documentation to see how to specify the image transforms, as noted in the code comment.

import os
import random
import cv2
file_number = 130  # set this to the number of files you want in each class
sdir = r'C:\Temp\dummydogs\train'  # set this to the main directory that contains your class directories
slist = os.listdir(sdir)
for klass in slist:
    class_path = os.path.join(sdir, klass)
    filelist = os.listdir(class_path)
    file_count = len(filelist)
    if file_count > file_number:
        # delete files from the klass directory because you have more than you need
        delta = file_count - file_number
        for i in range(delta):
            fpath = os.path.join(class_path, filelist[i])
            os.remove(fpath)
    elif file_count > 0:
        # need to add files to this klass, so do augmentation using cv2 image transforms
        label = 'aug'  # set this to a string that will be part of the augmented image's file name
        delta = file_number - file_count
        for i in range(delta):
            # cycle through the originals so delta may be larger than file_count
            file = filelist[i % file_count]
            fname, ext = os.path.splitext(file)
            fnew_name = fname + '-' + str(i) + '-' + label + ext
            fpath = os.path.join(class_path, file)
            img = cv2.imread(fpath)
            if img is None:
                continue  # not a readable image; this class will end up short
            # example transform only (random rotation, an assumption of this edit);
            # look up the cv2 documentation and apply your own image transformation code here
            h, w = img.shape[:2]
            m = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(-20, 20), 1.0)
            img = cv2.warpAffine(img, m, (w, h), borderMode=cv2.BORDER_REFLECT)
            # imread returns BGR and imwrite expects BGR, so no colour conversion is needed
            dest_path = os.path.join(class_path, fnew_name)
            cv2.imwrite(dest_path, img)
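
Because the loop above deletes files, it may be worth checking first what it would do to each class. The following is a minimal sketch and not part of the original answer: plan_balance is a hypothetical helper that takes per-class file counts and a target, and reports how many files would be deleted or added. The counts are the ones from the question.

```python
def plan_balance(counts, target):
    """Map each class to (action, n): how many files to delete or augment."""
    plan = {}
    for klass, count in counts.items():
        if count > target:
            plan[klass] = ('delete', count - target)
        elif count < target:
            plan[klass] = ('augment', target - count)
        else:
            plan[klass] = ('keep', 0)
    return plan


# Counts from the question; in practice they could be collected with
# len(os.listdir(class_path)) for each class directory.
counts = {'A': 125, 'B': 170, 'C': 25}
for klass, (action, n) in plan_balance(counts, 200).items():
    print(klass, action, n)
```

With a target of 200, class C needs 175 new images from only 25 originals, which is why each original has to be reused several times.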

score 0 · Answer 2 · answered Mar 10 '22 at 20:49

import os
import tensorflow as tf
from tensorflow.keras.preprocessing.image import img_to_array, load_img

def dataGenerator(type_, number):
    '''
     type_ :str 
        class directory name, ex 'CAT' or 'DOG'
     number :int 
        number of augmented copies to generate per source image
    '''
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        fill_mode='nearest',
        horizontal_flip=True,
    )
    out_dir = f'generate_data/{type_}'
    os.makedirs(out_dir, exist_ok=True)  # the save_to_dir directory must exist

    for filename in os.listdir(f'train/{type_}/'):
        if filename.endswith('.jpeg'):
            img = load_img(f'train/{type_}/{filename}')
            x = img_to_array(img)
            x = x.reshape((1,) + x.shape)  # add a batch dimension

            i = 0
            # flow() loops forever, so stop after `number` saved images
            for batch in datagen.flow(x, batch_size=1, save_to_dir=out_dir, save_prefix='IMG', save_format='jpeg'):
                i += 1
                if i >= number:
                    break

I use this.
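
Note that number here is the number of copies per source image, so using one value for every class does not by itself make the classes equal. One possible way to choose it is sketched below; copies_per_image is a hypothetical helper, not part of the answer, and it assumes the goal is originals plus augmented copies adding up to a fixed target per class.

```python
def copies_per_image(count, target):
    """Spread the deficit (target - count) as evenly as possible over the source images."""
    if count <= 0:
        raise ValueError('a class needs at least one source image')
    deficit = max(target - count, 0)
    base, extra = divmod(deficit, count)
    # the first `extra` images get one more copy than the rest
    return [base + 1 if i < extra else base for i in range(count)]


# Class C from the question: 25 originals, target of 200 per class.
per_image = copies_per_image(25, 200)
print(per_image[0], sum(per_image) + 25)
```

Each entry is the number of copies to generate for the matching file. dataGenerator as written takes a single number for the whole class, so it would need a small change to accept a per-file value and to skip files whose entry is 0.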
