I'm using the function below to search for images, but it is very slow. Is there a faster way to do this?


import os
import cv2
images = []
def load_images_from_folder(folder):
    global images
    os.chdir(folder)
    for filename in os.listdir(folder):
        if os.path.isdir(os.path.join(folder,filename)):
            try:
                load_images_from_folder(os.path.join(folder, filename))
            except:
                pass
        img = cv2.imread(os.path.join(folder,filename))
        if img is not None:
            images.append(img)
load_images_from_folder("C:\\")
  • Do you have an estimate of the total number of files on your system? I expect that number to be very high, so there may be a faster way, but no fast way. Print every filename that is checked to see how much work the program is actually doing. One thing you can try is using `filename.endswith(("jpg", "jpeg", "png", ...))` to check for image files instead of trying to open them all. That would be faster. – Niklas Mertsch Jul 14 '20 at 10:15
  • Also note that there might be a "much faster" solution, depending on your operating system. macOS already has a "full" file index, and there are various solutions for Linux that give you the same: a complete index of all files. And of course, it is much faster to walk over such an index than to query the file system "yourself". So the answers here are generic, but I am sure that one could get to "better" solutions for specific platforms. – GhostCat Jul 14 '20 at 11:09

2 Answers


As recommended in this answer, you can use the built-in imghdr module to test whether a file is an image:

import imghdr
import os

def get_images(root_directory):
    images = []
    for root, dirs, filenames in os.walk(root_directory):
        for filename in filenames:
            path = os.path.join(root, filename)
            if imghdr.what(path) is not None:
                images.append(path)
    return images

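Note that imghdr detects only a limited set of image types; according to the documentation, it cannot detect SVG. The module was also deprecated in Python 3.11 and removed in Python 3.13, so this approach only works on older interpreters.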
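Since this kind of check only needs the first few bytes of each file, the same idea can be sketched by hand for the formats you care about. The following is a minimal sketch, not imghdr's implementation: `SIGNATURES`, `sniff_image_type` and `get_images_by_signature` are names made up for illustration, and only a handful of common formats are covered.

```python
import os

# Leading "magic" bytes for a few common formats (illustrative, not exhaustive).
# Short signatures such as b"BM" can match non-image files by accident.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
    b"II*\x00": "tiff",   # little-endian TIFF
    b"MM\x00*": "tiff",   # big-endian TIFF
}

def sniff_image_type(path):
    """Return a format name if the file starts with a known signature, else None."""
    try:
        with open(path, "rb") as f:
            header = f.read(8)
    except OSError:
        # Unreadable files (permissions, locks) are treated as non-images.
        return None
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None

def get_images_by_signature(root_directory):
    """Walk the tree and collect paths whose content looks like an image."""
    images = []
    for root, dirs, filenames in os.walk(root_directory):
        for filename in filenames:
            path = os.path.join(root, filename)
            if sniff_image_type(path) is not None:
                images.append(path)
    return images
```

This still opens every file, so it is slower than a pure filename check, but it does not depend on files having accurate extensions.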

As @NiklasMertsch suggested, you can simply check for image file extensions instead, which avoids opening any files:

import os

extensions = [
    '.png',
    '.jpg',
    '.jpeg',
    '.bmp',
    '.gif',
    '.tiff',
    '.svg',
]

def get_images(root_directory):
    images = []
    for root, dirs, filenames in os.walk(root_directory):
        for filename in filenames:
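            # str.endswith accepts a tuple, so one call checks every extension;
            # lower() makes the match case-insensitive (e.g. "PHOTO.JPG").
            if filename.lower().endswith(tuple(extensions)):
                images.append(os.path.join(root, filename))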
    return images
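
When scanning a whole drive, two further savings are possible: skipping directories that are unlikely to hold images, and yielding paths lazily instead of building one large list. Here is a rough sketch of both, assuming the directory names in `SKIP_DIRS` (which are hypothetical examples, not a recommended list) and an illustrative function name `iter_images`:

```python
import os

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.tiff')
# Hypothetical directory names to skip; adjust for your own system.
SKIP_DIRS = {'Windows', '$Recycle.Bin', 'node_modules', '.git'}

def iter_images(root_directory):
    """Yield image paths one at a time instead of building a full list."""
    for root, dirs, filenames in os.walk(root_directory):
        # Editing dirs in place stops os.walk from descending into them
        # (this relies on the default topdown=True).
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
        for filename in filenames:
            if filename.lower().endswith(IMAGE_EXTENSIONS):
                yield os.path.join(root, filename)
```

Each yielded path can then be passed to cv2.imread as in the question, so only files that look like images are ever opened.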
Sergey Shubin

You can use the treeHandler library to achieve this. Install it with pip install treeHandler.

from treeHandler import treeHandler
import os
import cv2
inputFolder='SampleData'

### initialising treeHandler object
th=treeHandler()
### calling getFiles function to get all jpg files, you can add more extensions like ['jpg','png','bmp'] if required
fileTuple=th.getFiles(inputFolder,['jpg'])

Here is what a sample fileTuple looks like:

[('image01.jpg', 'sampleData'),
 ('image02.jpg', 'sampleData'),
 ('image03.jpg', 'sampleData'),
 ('image04.jpg', 'sampleData'),
 ('image11.jpg', 'sampleData/folder1'),
 ('image12.jpg', 'sampleData/folder1'),
 ('image111.jpg', 'sampleData/folder1/folder11'),
 ('image112.jpg', 'sampleData/folder1/folder11'),
 ('image1111.jpg', 'sampleData/folder1/folder11/folder111'),
 ('image1112.jpg', 'sampleData/folder1/folder11/folder111'),
 ('image11111.jpg', 'sampleData/folder1/folder11/folder111/folder1111')....]
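
If you would rather not add a dependency, a list of the same (filename, directory) shape can be sketched with os.walk alone. This is an approximation for comparison, not treeHandler's implementation: the function name `get_file_tuples` is made up for illustration, and the ordering of results may differ.

```python
import os

def get_file_tuples(input_folder, extensions):
    """Return (filename, directory) tuples for files with the given extensions."""
    # Normalise 'jpg' and '.jpg' to '.jpg' so both spellings work.
    suffixes = tuple('.' + ext.lower().lstrip('.') for ext in extensions)
    file_tuples = []
    for root, dirs, filenames in os.walk(input_folder):
        for filename in filenames:
            # Case-insensitive match on the file extension.
            if filename.lower().endswith(suffixes):
                file_tuples.append((filename, root))
    return file_tuples
```

Calling `get_file_tuples('SampleData', ['jpg'])` would then play the same role as the `getFiles` call above.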

I have written a blog post on handling and processing large numbers of files:

https://medium.com/@sreekiranar/directory-and-file-handling-in-python-for-real-world-projects-9bc8baf6ba89

You can refer to it to get a fair idea of how to collect all files from folders and subfolders.

Sreekiran A R