I wrote the following code:

import fnmatch
import os
ll = []
for items in src:
    for file in os.listdir('/Users/swaghccc/Downloads/PulledImages/'):
        if fnmatch.fnmatch(file, items.split('/')[-1]):
            print file
            ll.append(file)

My src list contains paths to images, something like:

/path/to/image.jpg

These images are a subset of the images contained in the directory PulledImages.

Printing the matched images works correctly. But when I try to put those image names into the list ll, it takes forever.

What on earth am I doing wrong?

TheTank
  • That's frankly hard to believe. `print` is a **much** more expensive operation than `list.append()`, by orders of magnitude. Can you provide a self-contained [mcve] which allows others to reproduce your result? – Charles Duffy Mar 07 '18 at 19:14
  • why fnmatch ? shouldnt `if items.endswith('/'+file):` suffice? why item**S** ? you do not use any of fnmatch'es features... – Patrick Artner Mar 07 '18 at 19:16
  • Hmm. I wonder if the OP is trying to avoid appending things more than once? They should be using a data structure that allows O(1) lookups, if so. – Charles Duffy Mar 07 '18 at 19:17
  • Even if that were true, though, it's not the append that would be slow, but the other code they're using to gate it. – Charles Duffy Mar 07 '18 at 19:19
  • (and why in the world would you run `os.listdir()` in a loop, rather than running it once and caching its results in a structure you can refer to with an O(1) lookup?) – Charles Duffy Mar 07 '18 at 19:19

1 Answer

Appending doesn't take forever. Searching through a list, however, takes more time the longer your list is; and os.listdir(), being an operating system call, can be unavoidably slow when running against a large directory.

To avoid that, use a dictionary or set, not a list, to track the names you want to compare against -- and build that set only once, outside your loop.

# run os.listdir only once, storing results in a set for constant-time lookup
import os
files = set(os.listdir('/Users/swaghccc/Downloads/PulledImages/'))

ll = []
for item in src:
    name = item.split('/')[-1]  # filename portion of the path
    if name in files:
        ll.append(name)
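
As a self-contained variant of the snippet above, here is a minimal sketch that wraps the same idea in a function. The name `match_pulled_images` and its parameters are hypothetical, not something from the question. It swaps `split('/')[-1]` for `os.path.basename` and assumes every entry in `src` is a plain path rather than a glob pattern, so none of fnmatch's wildcard features are needed.

```python
import os


def match_pulled_images(src, directory):
    """Return the basenames from src that exist in directory, in src order.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # List the directory once; membership tests on a set are O(1) on average,
    # so the loop below does not rescan the directory for every path.
    available = set(os.listdir(directory))

    matched = []
    for path in src:
        name = os.path.basename(path)  # filename portion of the path
        if name in available:
            matched.append(name)
    return matched
```

With the directory from the question, this could be called as `ll = match_pulled_images(src, '/Users/swaghccc/Downloads/PulledImages/')`. The result is still a list, so order and duplicates from `src` are preserved; only the lookup structure changes.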

Community Wiki because I don't believe this question to be within topic guidelines without an MCVE; thus, not taking rep/credit for this answer.

Charles Duffy
  • Switching from list to set in my case took the time down from 6 minutes to less than 1 second. – Sameh Nov 26 '19 at 20:02