I've been writing a script that scans directories and noticed a severe memory leak when calling os.path.isdir, so I tried the following snippet:
import os

def func():
    if not os.path.isdir('D:\Downloads'):
        return False

while True:
    func()
Within a few seconds, the Python process reached 100MB RAM.
I'm trying to figure out what's going on. The leak seems to occur only when the path is a valid directory (that is, when 'return False' is not executed). It would also be interesting to see whether related calls, such as os.path.isfile, behave the same way.
Thoughts?
Edit: I think I'm onto something. Although isfile and isdir are both implemented in the genericpath module, on Windows isdir is imported from the builtin nt module instead. So I had to download the 2.7.3 source (which I should've done a long time ago...).
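A quick way to check which module actually provides each function is to look at its __module__ attribute. This is only a sketch: defining_module is a helper name I made up for illustration, and the output depends on the platform and Python version.

```python
import os.path
import genericpath


def defining_module(func):
    """Return the name of the module that defines func, or None if unknown."""
    return getattr(func, '__module__', None)


# On the Windows 2.7.3 setup described here, isdir should report 'nt' while
# isfile reports 'genericpath'; other platforms and versions will differ.
print(defining_module(os.path.isdir))
print(defining_module(os.path.isfile))
```

If the two lines print different module names, the two functions take different code paths, which would explain why one leaks and the other may not.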
After a little searching, I found the posix__isdir function in \Modules\posixmodule.c, which I assume is the 'isdir' function imported from nt.
This part of the function (and comment) caught my eye:
    if (PyArg_ParseTuple(args, "U|:_isdir", &po)) {
        Py_UNICODE *wpath = PyUnicode_AS_UNICODE(po);
        attributes = GetFileAttributesW(wpath);
        if (attributes == INVALID_FILE_ATTRIBUTES)
            Py_RETURN_FALSE;
        goto check;
    }
    /* Drop the argument parsing error as narrow strings
       are also valid. */
    PyErr_Clear();
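It seems that it all boils down to a Unicode/ASCII handling bug: a unicode argument is handled entirely by the branch above, while a byte string fails that parse and falls through to the narrow-string code that follows, which is presumably where the leak is.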
I've just tried my snippet above with the path argument as a unicode string (i.e. u'D:\Downloads') - no memory leak whatsoever. haha.
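Based on that observation, a possible workaround is to make sure the path is unicode before it reaches os.path.isdir. This is a minimal sketch, not a verified fix: isdir_unicode is a hypothetical helper name, and the fallback to 'utf-8' when no filesystem encoding is reported is my own assumption.

```python
import os
import sys


def isdir_unicode(path):
    """Call os.path.isdir with a unicode path, decoding byte strings first."""
    if isinstance(path, bytes):  # on Python 2, a plain 'str' is a byte string
        # Assumption: fall back to UTF-8 if no filesystem encoding is reported.
        encoding = sys.getfilesystemencoding() or 'utf-8'
        path = path.decode(encoding)
    return os.path.isdir(path)
```

Replacing os.path.isdir('D:\Downloads') with isdir_unicode('D:\Downloads') in the snippet above should then take the unicode branch of the C function.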