
I have a flat directory with a large number of files:

myFolder:
| 000001.csv
| 000002.csv
| 000003.csv
...
| 100000.csv

I need to read them in alphanumeric order and process them. The normal way to do it would be:

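import os

files = sorted(os.listdir(json_event_dir))
for file in files:
    # listdir returns bare names, so join them with the directory path
    with open(os.path.join(json_event_dir, file)) as f:
        process(f)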

But I don't want to store the whole list of file names. Is there a way to make a generator that yields the files in a directory in sorted order?
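If the names really are contiguous, zero-padded numbers as in the listing above, one way to avoid listing the directory at all is to generate the expected names in order and check which of them exist. This is a minimal sketch under that assumption only; the helper `padded_csv_files` and its `first`/`last`/`width` parameters are hypothetical, and any file that does not fit the pattern is never seen.

```python
import os
from typing import Iterator


def padded_csv_files(directory: str, first: int, last: int, width: int = 6) -> Iterator[str]:
    """Yield paths of existing files named like 000001.csv in numeric order.

    Hypothetical helper: it assumes every file of interest follows the
    zero-padded '<number>.csv' pattern. Nothing is listed, so memory use
    does not grow with the number of files.
    """
    for i in range(first, last + 1):
        # Build the candidate name, e.g. 1 -> "000001.csv" for width 6.
        path = os.path.join(directory, f"{i:0{width}d}.csv")
        # Skip gaps in the numbering instead of failing on them.
        if os.path.isfile(path):
            yield path
```

For the listing above this would be called as `padded_csv_files("myFolder", 1, 100000)`. It costs one `stat` call per candidate number, including numbers that turn out to be missing.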

Kamil Saitov
  • I don't see how else it could be done. It's not going to be possible to sort them without inspecting the individual names of _every_ file at some point. Unless the directory is utterly gigantic, I'm not sure what you gain from this? – roganjosh Apr 22 '23 at 11:48
  • This would be rather inefficient because the directory would have to be listed multiple times to find the next file in order. – Michael Butscher Apr 22 '23 at 11:51
  • By the way, on Windows the files returned by `listdir` would already be sorted. – Booboo Apr 22 '23 at 11:55
  • Is this for a specific operating system or does it need to be portable? – DarkKnight Apr 22 '23 at 13:40
  • If the objection is to storing a list of files in memory until you are done, even using `ls` (as suggested in multiple answers) requires they be stored in memory *somewhere*, (just not in your script's address space), unless you have some weird `ls` implementation that uses a disk-based sorting algorithm using O(1) memory. – chepner Apr 22 '23 at 15:24
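Michael Butscher's point can be made concrete with a sketch that keeps only one candidate name in memory: each step rescans the whole directory for the smallest name greater than the last one yielded. The function name `sorted_names_by_rescanning` is hypothetical. It shows the trade-off rather than a recommendation: n files need about n full scans, which for 100,000 files means on the order of 10^10 entry reads, and if the directory changes between scans the output reflects whatever it contains at each scan.

```python
import os
from typing import Iterator, Optional


def sorted_names_by_rescanning(directory: str) -> Iterator[str]:
    """Yield entry names in sorted (code point) order, holding one candidate at a time.

    Each iteration rescans the directory, so total work is O(n^2) entry reads
    in exchange for O(1) names kept in memory.
    """
    previous: Optional[str] = None
    while True:
        best: Optional[str] = None
        with os.scandir(directory) as entries:
            for entry in entries:
                name = entry.name
                # Ignore everything already yielded (or equal to it).
                if previous is not None and name <= previous:
                    continue
                # Track the smallest name still waiting to be yielded.
                if best is None or name < best:
                    best = name
        if best is None:
            return  # nothing left that sorts after the last yielded name
        yield best
        previous = best
```

Python compares strings by code point, so the order matches `sorted(os.listdir(directory))`.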

1 Answer


If you don't want to store the list in memory, store it in the file system instead. Then open that file and iterate over it, and you'll get each name lazily.

To write the sorted names to a file on Unix-like operating systems:

ls <path to directory> | sort > sorted_names.txt

And then:

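with open("sorted_names.txt") as f:
    for line in f:
        filename = line.rstrip("\n")  # each line read from the file ends with a newline
        # Do the work here; join `filename` with the directory path before opening it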

After the work is done, you need to decide what to do with this file. You can delete it or, if the directory contents are not going to change, reuse it next time. If it's temporary, you can use the `tempfile` module.
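The whole approach can be combined in Python with the `tempfile` suggestion. This is a minimal sketch, not the answer's exact method: it assumes a Unix-like system with `ls` and `sort` on `PATH` and file names without embedded newlines, and the function name `sorted_names_from_disk` is hypothetical. It runs the equivalent of `ls <path to directory> | sort` into an anonymous temporary file, then yields one name per line.

```python
import os
import subprocess
import tempfile
from typing import Iterator


def sorted_names_from_disk(directory: str) -> Iterator[str]:
    """Yield the names in `directory` in sorted order, read lazily from a temp file.

    Assumes `ls` and `sort` are available and that no name contains a newline.
    """
    # LC_ALL=C makes `sort` compare raw bytes instead of using locale rules.
    env = {**os.environ, "LC_ALL": "C"}
    with tempfile.TemporaryFile(mode="w+") as tmp:
        # Equivalent of: ls <directory> | sort > tmp
        ls_proc = subprocess.Popen(["ls", directory], stdout=subprocess.PIPE, env=env)
        subprocess.run(["sort"], stdin=ls_proc.stdout, stdout=tmp, env=env, check=True)
        ls_proc.stdout.close()
        if ls_proc.wait() != 0:
            raise subprocess.CalledProcessError(ls_proc.returncode, ["ls", directory])
        # `sort` wrote straight to the file descriptor; rewind before reading.
        tmp.seek(0)
        for line in tmp:
            yield line.rstrip("\n")
```

The temporary file stays open only while the generator is being consumed; it is closed (and, being a `TemporaryFile`, discarded) once iteration finishes or the generator is closed.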

S.B