
The main idea of undersampling is to randomly delete samples from the class that has more than enough observations, so that the ratio between the classes in the data becomes more balanced. So how do I undersample image data in Python? Please help me :(

I took the fundus image data from Kaggle. There are 35126 images in 5 classes: class 0 has 25810 images, class 1 has 2443, class 2 has 5292, class 3 has 873, and class 4 has 708.

I want each class to have 708 images, the same number as class 4. How do I delete the rest of the images in Python?
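For reference, the number of images to delete from each class follows directly from the counts above. This is a small illustrative sketch added here, not part of the original question; the dictionary just restates the reported counts:

```python
# Per-class image counts as reported in the question
class_counts = {0: 25810, 1: 2443, 2: 5292, 3: 873, 4: 708}

# Undersampling target: the size of the smallest class
target = min(class_counts.values())

# Number of images that must be removed from each class to reach the target
to_remove = {label: count - target for label, count in class_counts.items()}

print(target)
print(to_remove)
```

With these numbers the target is 708, and class 0 alone needs 25102 deletions while class 4 needs none.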

hilyap
  • Please provide more information, some sample code or an example. – Aswin Jan 11 '20 at 05:21
  • This is too broad/vague, and probably off-topic IMO. – AMC Jan 11 '20 at 06:52

1 Answer


I know it is an old question, but for the sake of people looking for an answer, the following code deletes a random sample of n images from a folder:

    import os
    import random

    path = r'C:/The_Path'  # Folder containing the images of one class
    n = 2500  # Number of random images to be removed
    img_names = os.listdir(path)  # Get image names in folder
    img_names = random.sample(img_names, n)  # Pick n random images (raises ValueError if n > number of files)
    for image in img_names:  # Go over each image name to be deleted
        f = os.path.join(path, image)  # Build the full path to the image
        os.remove(f)  # Permanently delete the image

As your question states, you want every class to have as many images as class 4, i.e., 708. Simply compute the difference for each class and use it as n. For example, class 3 has 873 images and 873 - 708 = 165, so n = 165 for class 3. Furthermore, you can wrap this in a function to generalise it.
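One possible generalised version is sketched below. It is not from the original answer, and it assumes a hypothetical layout with one sub-folder per class (for example `C:/The_Path/0` through `C:/The_Path/4`), each containing only image files. The names `undersample_folders`, `seed` and `dry_run` are introduced here for illustration only.

```python
import os
import random


def undersample_folders(root, seed=None, dry_run=True):
    """Randomly delete images so every class folder under `root` keeps as
    many images as the smallest class folder.

    Assumes a hypothetical layout of one sub-folder per class, e.g.
    root/0, root/1, ..., root/4. Returns {class_folder: number_removed}.
    With dry_run=True nothing is deleted; only the counts are returned.
    """
    rng = random.Random(seed)  # seeded RNG so the selection is reproducible

    # Collect the files of each class folder (sorted, since listdir order is arbitrary)
    class_files = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):
            class_files[name] = sorted(
                f for f in os.listdir(folder)
                if os.path.isfile(os.path.join(folder, f))
            )

    # Target size is the smallest class (raises ValueError if there are no class folders)
    target = min(len(files) for files in class_files.values())

    removed = {}
    for name, files in class_files.items():
        n = len(files) - target  # the per-class difference described above
        to_delete = rng.sample(files, n)  # n random images from this class
        if not dry_run:
            for image in to_delete:
                os.remove(os.path.join(root, name, image))
        removed[name] = n
    return removed
```

Calling `undersample_folders(r'C:/The_Path', seed=42)` only reports how many images would be removed per class; passing `dry_run=False` actually deletes them. Since deletion is irreversible, keeping a backup copy of the dataset first is a sensible precaution.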

The code was adapted, with edits, from the answer to:

How can i delete multiple images from multiple folders using python

That question was answered by https://stackoverflow.com/users/10512332/vikrant-sharma.

Thank you!

Umair Nasir