
When I run this Python script on Windows, the process's memory usage grows with no end in sight:

    import os

    for i in xrange(1000000):
        for root, dirs, files in os.walk(r"c:\windows"):
            pass

Am I misunderstanding something? (I'm using Python 2.7.3.)

  • Well, `C:\windows` is a big graph to walk, right? Pretty deep. Very deep. How does the process growing equate to a memory leak? – CppLearner Sep 21 '12 at 03:35
  • The process grows by about 7MB every time through the outer loop. I can understand using a lot of memory for one traversal, but shouldn't it be re-used after that? – user1687699 Sep 21 '12 at 04:02
  • @user1687699 You have to load things into memory. It may be that the garbage collector didn't clean it up right away, because of reference counting: if an object's reference count is not zero, it remains in memory. One possible memory-intensive source is the file system metadata — every file and folder has metadata attached, like mode, user, date, etc. I will actually look into that, because my project does stuff with the file system, so it's great that you raised this question. – CppLearner Sep 21 '12 at 04:04
  • Running the same code, same Python version under Windows 7 64-bit, the process doesn't grow beyond 7MB, reverting to around 4MB each time the outer loop finishes. This is true for both the 32-bit & 64-bit versions of Python 2.7.3. Is that the exact script you're running? – Matthew Trevor Sep 21 '12 at 04:13
  • Yes, this is the exact script. I've tried it on two different machines running 32-bit Windows 7. I'm measuring the usage by the "Memory - private working set" column of task manager. I get the same results as "private bytes" in Process Explorer. – user1687699 Sep 21 '12 at 17:18
  • I've just replicated it on a third machine running 64-bit Windows 7 and 32-bit Python. – user1687699 Sep 21 '12 at 17:32
  • I let it run until it was out of memory:

        C:\Users\Eric\Documents>test.py
        Traceback (most recent call last):
          File "C:\Users\Eric\Documents\test.py", line 4, in <module>
            for root, dirs, files in os.walk(r"c:\windows"):
          File "S:\Python27\lib\os.py", line 294, in walk
            for x in walk(new_path, topdown, onerror, followlinks):
          File "S:\Python27\lib\os.py", line 294, in walk
            for x in walk(new_path, topdown, onerror, followlinks):
          File "S:\Python27\lib\os.py", line 287, in walk
            nondirs.append(name)
        MemoryError

    – user1687699 Sep 21 '12 at 18:04
  • I'm beginning to see what you mean, @user1687699. My earlier assumption about the data might be wrong. I wrote some test scripts; let me try this on Windows. – CppLearner Sep 22 '12 at 03:45
  • ...Why are you walking the OS tree a million times, anyway? – Makoto Sep 30 '12 at 23:49

1 Answer


This is due to a memory leak in os.path.isdir; see Huge memory leak in repeated os.path.isdir calls? You can test this yourself by passing a Unicode path string instead - there should be no leak.
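For example, here is a minimal sketch of that check — the OP's script with only the path literal changed to Unicode (assuming an affected 2.7.3 Windows build):

    import os

    # Same loop as the original script, but with a Unicode path.
    # With the byte-string path the process grows without bound on
    # affected builds; with this Unicode path it should stay flat.
    for i in xrange(1000000):
        for root, dirs, files in os.walk(u"c:\\windows"):
            pass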

os.path.isdir is used in the implementation of os.walk:

    islink, join, isdir = path.islink, path.join, path.isdir
    try:
        names = listdir(top)
    except error, err:
        if onerror is not None:
            onerror(err)
        return

    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):  # called once per entry - this is where the leak accumulates
            dirs.append(name)
        else:
            nondirs.append(name)
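To isolate the leak from os.walk entirely, you can hammer os.path.isdir directly, along the lines of the linked question. This is a sketch, again assuming an affected 2.7.3 Windows build; the commented-out Unicode variant should not leak:

    import os.path

    for i in xrange(1000000):
        os.path.isdir("c:\\windows")     # byte-string path: leaks per call
        # os.path.isdir(u"c:\\windows")  # Unicode path: no leak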
– AAlon