A simple way to solve this is to query the file-system directly instead of caching all the filenames in a dict. This will save a lot of space, and will probably be fast enough if there are only a few hundred directories to search.
Here is a simple function that does that:
import os

def find_directory(filename, directories):
    """Return the first directory that contains filename, or None if no match."""
    for directory in directories:
        path = os.path.join(directory, filename)
        if os.path.exists(path):
            return directory
    return None
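For example (the directory list and filename below are just placeholders):

search_dirs = ["/usr/bin", "/usr/local/bin", "/opt/bin"]
print(find_directory("python3", search_dirs))  # e.g. "/usr/bin", or None if not found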
On my Linux system, when searching around 170 directories, it takes about 0.3 seconds to do the first search, and then only about 0.002 seconds thereafter. This is because the OS does file-caching to speed up repeated searches. But note that if you used a dict to do this caching in Python, you'd still have to pay a similar initial cost.
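If you want to check the numbers on your own system, here is a minimal timing sketch (the directory list and filename are assumptions, and it uses the find_directory function above):

import time

search_dirs = ["/usr/bin", "/usr/local/bin", "/opt/bin"]   # placeholder list

start = time.perf_counter()
find_directory("some_file.txt", search_dirs)               # first, cold search
print("first search:", time.perf_counter() - start, "seconds")

start = time.perf_counter()
find_directory("some_file.txt", search_dirs)               # repeat, now OS-cached
print("second search:", time.perf_counter() - start, "seconds")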
Of course, subsequent dict lookups would be faster than querying the file-system directly. But do you really need that extra speed? To me, two thousandths of a second seems easily "fast enough" for most purposes. And you get the extra benefit of never needing to refresh the file-cache (because the OS does it for you).
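For comparison, a dict-based cache would look roughly like this (a sketch, not code from the question; it trades the refresh problem for faster lookups):

import os

def build_cache(directories):
    """Map each filename to the first directory that contains it."""
    cache = {}
    for directory in directories:
        try:
            for name in os.listdir(directory):
                cache.setdefault(name, directory)
        except OSError:
            pass  # skip unreadable or missing directories
    return cache

# cache = build_cache(search_dirs)   # pays the up-front scanning cost
# cache.get("some_file.txt")         # then lookups are near-instant, until files change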
PS:
I should probably point out that the above timings are worst-case: that is, I dropped all the system file-caches first, and then searched for a filename that was in the last directory.
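For anyone who wants to reproduce the cold-cache case, one way to drop the caches on Linux is shown below (this needs root, and uses the standard kernel interface rather than anything specific to this answer):

import subprocess

# Flush dirty pages to disk, then drop the page, dentry and inode caches.
subprocess.run(["sync"], check=True)
with open("/proc/sys/vm/drop_caches", "w") as f:
    f.write("3\n")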