
I'm using Python 3.3.

If I'm manipulating a potentially infinite number of files in a directory (bear with me; just pretend I have a filesystem that supports that), how do I do that without encountering a MemoryError? I only want the name of one file, as a string, to be in memory at a time. I don't want all the names in an iterable, as that would cause a memory error when there are too many.

Will os.walk() work just fine, since it returns a generator? Or do generators not work like that?

Is this possible?

Brōtsyorfuzthrāx
  • Yes, this is exactly why generators are used. – spinlok Apr 02 '14 at 23:53
  • To clarify, is the problem that `os.listdir()` returns a list, which cannot work on a directory with an infinite number of entries? If so, there is no built-in solution that I'm aware of. You need to write a custom C module (I'd recommend using CFFI). – Armin Rigo Apr 03 '14 at 11:21
  • Yes. The problem is that lists can't contain an infinite number of entries. Armin Rigo, are you refuting what spinlok said by saying you need to write a custom C module? Or am I not understanding? Thanks for your help, everyone. – Brōtsyorfuzthrāx Apr 16 '14 at 01:25

1 Answer


If the files are named by a scheme that can be computed, you can do something like the following. It iterates over any number of numbered .txt files (0.txt, 1.txt, ...) with only one name in memory at a time, and it stops at the first number that is missing. You could switch to another computable naming scheme to get shorter filenames for large numbers:

import os

def infinite_files(path):
    num = 0
    while True:
        name = os.path.join(path, str(num) + ".txt")
        if not os.path.exists(name):
            break  # the first missing number ends the loop
        # perform operations on the file `name` here;
        # it is the only filename held in memory
        num += 1
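
As a minimal sketch, the same idea can be written as a generator, so that the caller does the per-file work in an ordinary for loop. The name `numbered_files` and the `ext` parameter are hypothetical, chosen only for this illustration, and the sketch still assumes the files are numbered consecutively from 0:

```python
import os

def numbered_files(path, ext=".txt"):
    """Yield path/0.txt, path/1.txt, ... until a number is missing."""
    num = 0
    while True:
        name = os.path.join(path, str(num) + ext)
        if not os.path.exists(name):
            return  # the first gap in the numbering ends the iteration
        yield name  # only this one filename is produced at a time
        num += 1
```

You would use it as `for name in numbered_files("/home/me/Desktop"): ...`. Because each name is computed on demand, no list of names is ever built.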



[My old inapplicable answer is below]

glob.iglob seems to do exactly what the question asks for. [EDIT: It doesn't. It actually seems less efficient than listdir(), but see my alternative solution above.] From the official documentation:

glob.glob(pathname, *, recursive=False)
Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).


glob.iglob(pathname, *, recursive=False)
Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

iglob returns an "iterator which yields" or, more concisely, a generator.

Since glob.iglob has the same behavior as glob.glob, you can search with wildcard characters:

import glob
for x in glob.iglob("/home/me/Desktop/*.txt"):
    print(x)  # prints every path in that directory that matches *.txt

I don't see a way for iglob itself to differentiate between files and directories; you have to check each result manually. That is certainly possible, however, for example with os.path.isfile().
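
A minimal sketch of that manual check, assuming only regular files are wanted. The helper name `iter_files` is hypothetical. It wraps glob.iglob, so it inherits whatever memory limitations iglob has (see the EDIT note above):

```python
import glob
import os

def iter_files(pattern):
    """Yield only those matches of `pattern` that are regular files."""
    for path in glob.iglob(pattern):
        # os.path.isfile() is False for directories and broken symlinks
        if os.path.isfile(path):
            yield path
```

For example, `for x in iter_files("/home/me/Desktop/*.txt"): print(x)` skips a directory that happens to be named like a .txt file.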

Z4-tier
Brōtsyorfuzthrāx
  • The glob module uses `os.listdir()` under the hood, so it has the same limitations. See http://hg.python.org/cpython/file/c0e311e010fc/Lib/glob.py – Ferdinand Beyer Sep 09 '14 at 06:00
  • After analyzing that code, you seem to be right. I will edit my answer to reflect that. I don't want to all-out delete it, because people need to know that it doesn't work for this, though. – Brōtsyorfuzthrāx Sep 09 '14 at 06:19
  • I added another solution at the top of my answer, seeing as the first one was based on an incorrect assumption. – Brōtsyorfuzthrāx Sep 09 '14 at 06:48