Cropping and storing bounding box image regions for a collection of images?

Question

The current code aims to crop and store multiple bounding box image regions for a set of images in a folder. The cropped bounding box image regions are stored in a different folder. There are 100 images in total, and each image has multiple bounding boxes. The CSV file contains the bounding box coordinates for every image. The code is shown below:

import pandas as pd
import cv2
import numpy as np
import glob
import os

filenames = glob.glob("folder/abnormal/*.png")
filenames.sort()
images = [cv2.imread(img) for img in filenames]
print(images)
df = pd.read_csv('abnormal.csv')

for img in images:
    for i in range(len(df)):
        name = df.loc[i]['patientId']
        start_point = (df.loc[i]['x_dis'],df.loc[i]['y_dis'])  
        end_point = (df.loc[i]['x_dis']+df.loc[i]['width_dis'],df.loc[i]['y_dis']+df.loc[i]['height_dis'])  
        crop = img[df.loc[i]['y_dis']:df.loc[i]['y_dis']+df.loc[i]['height_dis'],
                     df.loc[i]['x_dis']:df.loc[i]['x_dis']+df.loc[i]['width_dis']]
        cv2.imwrite("abnormal/crop_{0}.png".format(i), crop)

On running the code above, the loop continues indefinitely. All crops are first taken from image1, then all stored crops are replaced by crops taken from image2, and so on. What is needed is for the bounding box regions of each image to be cropped and stored once. The image names follow the pattern patient*.png (e.g. patient1.png) or patient*.*.png (e.g. patient1_1.png).

So what is the issue? Seems very similar to your previous question. How do you name your images in folder1? You need two "for" loops. One over each image in folder1 and one over every crop from the csv file. — fmw42, May 16 '20 at 02:31
@fmw42 well. need help in running multiple for loops as i am a beginner in python. The images start with patient* (e.g. patient1) or patient*.* (e.g. patient1_1) and are .png files. Thanks. — shiva, May 16 '20 at 23:07
You need to loop over each file name in the list of images you have to load the image. Search Google for "python opencv load folder of images" and you will find many examples. For example, see https://stackoverflow.com/questions/38675389/python-opencv-how-to-load-all-images-from-folder-in-alphabetical-order. That will be your outer (first) loop. Then you indent for your second loop that you have already that crops all the sections you want. It is always a good idea to search Google and/or this forum to find examples to learn before asking questions in this forum. Test loading and viewing. — fmw42, May 16 '20 at 23:49
@fmw42: thanks. i modified the code as shown above. Each image has only one bounding box and there are 200 images. When I run the code above, the loop goes indefinite. The images start with patient* (e.g. patient1) or patient*.* (e.g. patient1_1) and are .png files. Could you please help with corrections in the code? thanks. — shiva, May 18 '20 at 17:10
What is the length of images, i.e. the number of images in the list? If reasonable, then put a counter in your images list and print the count value as the processing is being done. If you have lots of images and do lots of crops, then it will take a long time. — fmw42, May 18 '20 at 17:47
@fmw42: there are only 30 images. I see that the 30 crops are generated (because each image has only one bounding box) but for the same image (say patient1 is repeated 30 times). And then, all the images get transformed to the image 2(say patient2) and the loop continues indefinitely. Hope, I am making sense. — shiva, May 19 '20 at 00:08
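
The behaviour described in the comments follows from the loop structure rather than from OpenCV: every image is paired with every CSV row, and the output name depends only on the row index `i`, so each pass over a new image overwrites the files written for the previous one. A minimal sketch that only counts writes and does no image I/O might look like this (the function name `simulate_nested_loops` is made up for illustration, and the figure of 30 images and 30 rows is taken from the comment above):

```python
from collections import Counter


def simulate_nested_loops(n_images, n_rows):
    """Count how often each output file name is written by the question's loop structure."""
    writes = Counter()
    for image_index in range(n_images):   # outer loop: every loaded image
        for i in range(n_rows):           # inner loop: every CSV row, matching or not
            # The name depends only on i, so later images overwrite earlier crops.
            writes["abnormal/crop_{0}.png".format(i)] += 1
    return writes


writes = simulate_nested_loops(n_images=30, n_rows=30)
print(len(writes))           # number of distinct output files
print(sum(writes.values()))  # total number of writes performed
```

With 30 images and 30 rows this performs 900 writes into only 30 files, which is consistent with the crops appearing to change from one image to the next.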

score 1 · Accepted Answer · answered May 20 '20 at 03:06

The following code snippet should do the job:

filenames = glob.glob("folder/abnormal/*.png")
filenames.sort()
df = pd.read_csv('abnormal.csv')
im_csv_np = df.loc[:,"patientId"].values

for f in filenames:
    img = cv2.imread(f)
    img_name = f.split(os.sep)[-1]
    idx = np.where(im_csv_np == img_name)
    if idx[0].shape[0]: # shape[0] is the number of matching rows; 0 means no match
        for i in idx[0]: # idx is a tuple of arrays, so iterate over its first element
            name = df.loc[i]['patientId']
            # cast to int: slice indices must be integers, and CSV values may be floats
            x, y = int(df.loc[i]['x_dis']), int(df.loc[i]['y_dis'])
            w, h = int(df.loc[i]['width_dis']), int(df.loc[i]['height_dis'])
            start_point = (x, y)
            end_point = (x + w, y + h)
            crop = img[y:y + h, x:x + w]
            cv2.imwrite("abnormal/crop_{0}.png".format(i), crop)

Thanks. However, the code throws an error: File "", line 22, in df.loc[i]['x_dis']:df.loc[i]['x_dis']+df.loc[i]['width_dis']] TypeError: slice indices must be integers or None or have an __index__ method. Also, do we need to use img_name = f.split(os.sep)[-1] because patientId contains images with .png extension. And when we do, idx = np.where(im_csv_np == img_name), it will search for the image names with .png extension, right? — shiva, May 20 '20 at 13:59
Try printing out the i value and type to check the error. We are basically matching the string `img_name` with `patientId` value, if both have *.png then there shouldn't be a problem. Try printing variables, this can help you for easier analysis. — Sdhir, May 20 '20 at 14:33
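
The TypeError reported in the comment typically arises when a slice bound is not a plain integer. Two likely candidates here are a pandas Series (which is what `df.loc[i][...]` returns when `i` is an index array from `np.where`) and a float value read from the CSV. The sketch below is one possible way to avoid both, not the answerer's method: it groups the CSV rows by `patientId` and casts each coordinate to `int` before slicing. It assumes that `patientId` holds the file name including `.png` (as stated in the comment above), that the coordinate columns contain no missing values, and that a per-image output name is acceptable. The names `crop_boxes` and `crop_folder` and the output naming scheme are hypothetical.

```python
import os

import numpy as np
import pandas as pd


def crop_boxes(img, boxes):
    """Return one crop per row of `boxes` (columns x_dis, y_dis, width_dis, height_dis)."""
    crops = []
    for row in boxes.itertuples(index=False):
        # Cast to int: NumPy slicing rejects floats and pandas objects.
        x, y = int(row.x_dis), int(row.y_dis)
        w, h = int(row.width_dis), int(row.height_dis)
        crops.append(img[y:y + h, x:x + w])
    return crops


def crop_folder(image_dir, csv_path, out_dir):
    """Crop every box listed in the CSV from its own image and save each crop once."""
    import cv2  # imported here so crop_boxes can be tried without OpenCV installed

    df = pd.read_csv(csv_path)
    os.makedirs(out_dir, exist_ok=True)
    # groupby yields (file name, rows for that file), so each image is read once.
    for file_name, boxes in df.groupby("patientId"):
        img = cv2.imread(os.path.join(image_dir, file_name))
        if img is None:  # cv2.imread returns None if the file cannot be read
            continue
        stem = os.path.splitext(file_name)[0]
        for k, crop in enumerate(crop_boxes(img, boxes)):
            # Hypothetical naming scheme: image stem plus a per-image crop counter.
            cv2.imwrite(os.path.join(out_dir, f"{stem}_crop_{k}.png"), crop)
```

Under the paths used in the question, a call might look like `crop_folder("folder/abnormal", "abnormal.csv", "abnormal")`. Because the output name includes the image stem, crops from different images no longer overwrite each other.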
