I have a script that uses the MTCNN face detection library to iterate through a fair number of directories, totaling thousands of images. The problem is excessive memory usage while processing all of these images, which eventually causes my MacBook (16 GB of RAM) to run out of memory. What I'd like to do is implement batching on a folder-by-folder basis rather than with a fixed batch size, because no single folder contains enough images to exhaust memory on its own.

    # open up the csv file and write the header row
    with open(csv_path, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['Index', 'Threshhold', 'Path'])

    for path, subdirs, files in os.walk(path):
        for name in files:
            if name == '.DS_Store':
                print("Skipping .DS_Store")
                continue
            else:
                try:
                    image = os.path.join(path, name)
                    pixels = pyplot.imread(image)
                    print("Processing " + image)
                    print("Count: " + str(inc))
                    # calculate the area of the image
                    total_height = pixels.shape[0]
                    total_width = pixels.shape[1]
                    total_area = total_height * total_width
                    # create the detector, using default weights
                    detector = MTCNN()
                    faces = detector.detect_faces(pixels)
                    ax = pyplot.gca()
                    face_total_area = 0
                    if faces == []:
                        print("No faces detected.")
                        # pass in 0 for the threshold because there are no faces
                        #write_to_csv(inc, 0, image)
                        print()
                    else:
                        for face in faces:
                            # get dimensions from the face
                            x, y, width, height = face['box']
                            # calculate the area of the face
                            face_area = width * height
                            face_total_area += face_area
                        # fraction of the image area covered by faces
                        threshold = face_total_area / total_area
                        # write to csv only if the threshold is less than the limit
                        # change back to this eventually ^^^^^^^^^
                        if threshold > threshhold_limit:
                            print("Facial area is over the threshold - writing file path to csv.")
                            write_to_csv(inc, threshold, image)
                        else:
                            print("Image threshold is under the limit - good")
                        print(threshold)
                        print()
                    inc += 1
                except Exception:
                    print("Processing error - skipping image")

Is something like this possible, or should it be done a different way? The idea is that batching by folder would let MTCNN release the memory it is holding once it has finished processing that folder.
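
To make the idea concrete, here is a minimal sketch of what I mean by batching per folder. The names `process_by_folder`, `make_detector`, and `handle_image` are hypothetical helpers of my own, not part of MTCNN. It is also only an assumption on my part that dropping the detector and calling `gc.collect()` between folders frees the memory the library is holding.

```python
import gc
import os


def process_by_folder(root, make_detector, handle_image, skip=(".DS_Store",)):
    """Walk root and treat every directory as one batch.

    make_detector and handle_image are hypothetical callables supplied by
    the caller: make_detector() returns a detector object, and
    handle_image(detector, image_path) does the per-image work.
    Returns the number of images handled without an error.
    """
    processed = 0
    for folder, _subdirs, files in os.walk(root):
        names = [n for n in sorted(files) if n not in skip]
        if not names:
            continue

        # One detector per folder instead of one per image.
        detector = make_detector()
        for name in names:
            image_path = os.path.join(folder, name)
            try:
                handle_image(detector, image_path)
                processed += 1
            except Exception as exc:
                print(f"Processing error - skipping {image_path}: {exc}")

        # Drop our reference and run a collection before the next folder.
        # Assumption: this only helps if nothing else still references
        # the detector or the image data.
        del detector
        gc.collect()
    return processed
```

With the real library I would expect to call this as something like `process_by_folder(root_dir, MTCNN, handle_image)`, where `handle_image` holds the per-image logic from my loop above.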