In Python, how to find all the files under a directory, including the files in subdirectories?

Question

Is there a built-in function to find all the files under a particular directory, including files in subdirectories? I have tried this code, but it is not working. Maybe the logic itself is wrong.

def fun(mydir):
    lis=glob.glob(mydir)
    length=len(lis)
    l,i=0,0
    if len(lis):
        while(l+i<length):
            if os.path.isfile(lis[i]):
                final.append(lis[i])
                lis.pop(i)
                l=l+1
                i=i+1
            else:
                i=i+1
            print final
        fun(lis)
    else:
        print final

SilentGhost · Accepted Answer · 2010-05-19T13:15:43.300

14

There is no built-in function, but using os.walk it's trivial to construct it:

import os
def recursive_file_gen(mydir):
    for root, dirs, files in os.walk(mydir):
        for file in files:
            yield os.path.join(root, file)

ETA: the os.walk function walks the directory tree recursively; the recursive_file_gen function is a generator (it uses the yield keyword to produce the next file). To get the resulting list, do:

list(recursive_file_gen(mydir))
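As a quick way to try the generator above, here is a minimal, self-contained sketch. It assumes Python 3 (for `tempfile.TemporaryDirectory`) and builds a throwaway directory tree; the `demo` helper and the file names `a.txt` and `sub/b.txt` are hypothetical and only there for illustration.

```python
import os
import tempfile


def recursive_file_gen(mydir):
    # Same idea as the answer: yield the full path of every file under mydir.
    for root, dirs, files in os.walk(mydir):
        for name in files:
            yield os.path.join(root, name)


def demo():
    # Build a small temporary tree (hypothetical file names) and list it.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "sub"))
        for rel in ("a.txt", os.path.join("sub", "b.txt")):
            with open(os.path.join(tmp, rel), "w") as fh:
                fh.write("x")
        # Return paths relative to tmp, sorted, so the result is stable.
        return sorted(os.path.relpath(p, tmp) for p in recursive_file_gen(tmp))


if __name__ == "__main__":
    print(demo())
```

Because `recursive_file_gen` is a generator, nothing is read from disk until it is iterated; wrapping it in `list(...)` or `sorted(...)` forces the whole walk at once.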

edited May 19 '10 at 13:15

answered May 19 '10 at 12:15


@pythbegin: added explanation, do ask if any specific point is not clear. – SilentGhost May 19 '10 at 13:16
@pyth: there is a [formal definition in Python docs](http://docs.python.org/reference/simple_stmts.html#the-yield-statement). – SilentGhost May 19 '10 at 13:26
ok... I made some changes to your code and it looks like this now:

def listall(parent):
    lis=[]
    for root, dirs, files in os.walk(parent):
        for name in files:
            if os.path.getsize(os.path.join(root,name))>500000:
                lis.append(os.path.join(root,name))
    return lis

My aim is to find all the files with a size greater than 500000, and it is working properly. But when I used this function on the 'Temporary Internet Files' folder in Windows, I got an error. I think it's because of the special characters in the file name. Can you suggest something? – pythBegin May 19 '10 at 13:42
Sorry, I forgot to mention the error:

Traceback (most recent call last):
  File "", line 1, in listall(a)
  File "", line 5, in listall
    if os.path.getsize(os.path.join(root,name))>500000:
  File "C:\Python26\lib\genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Documents and Settings\\khedarnatha\\Local Settings\\Temporary Internet Files\\Content.IE5\\EDS8C2V7\\??????+1[1].jpg'

This is it – pythBegin May 19 '10 at 13:43
@pyth: I suspect it has to do with the encoding of the file name. It's hard to say, since you don't provide a sample of the file names. Clearly, `?` cannot be present in the filename, since it's an invalid character. Try to find out what the actual name of the file was: check what `os.walk` returned and what `os.path.join` returned. I'd suggest you ask a separate question, as this is beyond the scope of this one. – SilentGhost May 19 '10 at 13:57
It's not recommended to use `file` as a variable name, as it shadows the built-in name `file`. However, in this case, it doesn't make any difference. Used your snippet! Thank you. +1 – Qlimax Jul 30 '15 at 13:51
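The size filter discussed in the comments above could be made tolerant of files that cannot be inspected. The following is only a sketch, assuming Python 3: the function name `files_larger_than` and the `skipped` list are hypothetical, and catching `OSError` merely sidesteps the failing `os.path.getsize` call (WindowsError is an `OSError`); it does not address the encoding issue suspected in the comments.

```python
import os


def files_larger_than(parent, min_size=500000):
    """Return (matches, skipped).

    matches: files under parent whose size exceeds min_size bytes.
    skipped: paths whose size could not be read.
    """
    matches, skipped = [], []
    for root, dirs, files in os.walk(parent):
        for name in files:
            full = os.path.join(root, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                # Covers WindowsError as well; record the path and move on
                # instead of aborting the whole walk.
                skipped.append(full)
                continue
            if size > min_size:
                matches.append(full)
    return matches, skipped
```

Returning the skipped paths alongside the matches keeps the problem files visible, so their actual names can be inspected afterwards as the comment suggests.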

score 3 · Answer 2 · answered May 19 '10 at 15:59

I highly recommend this path module, written by Jason Orendorff:

http://pypi.python.org/pypi/path.py/2.2

Unfortunately, his website is down now, but you can still download it from the above link (or through easy_install, if you prefer).

Using this path module, you can perform various operations on paths, including walking files as you requested. Here's an example:

from path import path

my_path = path('.')

for file in my_path.walkfiles():
    print file

for file in my_path.walkfiles('*.pdf'):
    print file

There are also convenience functions for many other things to do with paths:

In [1]: from path import path

In [2]: my_dir = path('my_dir')

In [3]: my_file = path('readme.txt')

In [5]: print my_dir / my_file
my_dir/readme.txt

In [6]: joined_path = my_dir / my_file

In [7]: print joined_path
my_dir/readme.txt

In [8]: print joined_path.parent
my_dir

In [9]: print joined_path.name
readme.txt

In [10]: print joined_path.namebase
readme

In [11]: print joined_path.ext
.txt

In [12]: joined_path.copy('some_output_path.txt')

In [13]: print path('some_output_path.txt').isfile()
True

In [14]: print path('some_output_path.txt').isdir()
False

There are more operations that can be done too, but these are some of the ones that I use most often. Notice that the path class inherits from string, so it can be used wherever a string is used. Also, notice that two or more path objects can easily be joined together by using the overridden / operator.
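For readers who cannot install the third-party module, a roughly comparable sketch can be written with `pathlib` from the standard library (Python 3.4+). This is a swapped-in alternative, not the path module described above: the helper names `walk_files` and `describe` are hypothetical, and unlike the path class, `pathlib.Path` does not inherit from a string type.

```python
from pathlib import Path


def walk_files(top, pattern="*"):
    """Return a sorted list of regular files under top whose names match pattern."""
    # rglob searches top and all of its subdirectories.
    return sorted(p for p in Path(top).rglob(pattern) if p.is_file())


def describe(p):
    """Return (parent, name, stem, suffix) for a path, as strings."""
    p = Path(p)
    return (str(p.parent), p.name, p.stem, p.suffix)


if __name__ == "__main__":
    # Path also overloads the / operator for joining.
    joined = Path("my_dir") / "readme.txt"
    print(describe(joined))
```

Here `walk_files('.', '*.pdf')` plays the role of `walkfiles('*.pdf')`, while `stem` and `suffix` correspond to `namebase` and `ext` in the session above.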

Hope this helps!

Marco Mariani · Answer 3 · 2014-10-16T09:09:37.327

2

os.walk() is what you need.

But for added performance, try the scandir package. It is also part of the standard library as of Python 3.5 (as os.scandir) and is described in PEP 471.
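A recursive file listing built directly on `os.scandir` might look like the sketch below. It assumes Python 3.6 or later (for using `os.scandir` as a context manager); the function name `scandir_file_gen` is hypothetical. Note that since Python 3.5, `os.walk` itself is implemented on top of `os.scandir`, so the gain over plain `os.walk` may be small.

```python
import os


def scandir_file_gen(mydir):
    """Yield the path of every file under mydir, recursing into subdirectories."""
    with os.scandir(mydir) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Recurse into real subdirectories only, not symlinked ones.
                yield from scandir_file_gen(entry.path)
            elif entry.is_file():
                yield entry.path
```

Each `entry` is an `os.DirEntry`; its `is_dir()` and `is_file()` methods can often answer from information gathered while listing the directory, which is where the speed-up described in PEP 471 comes from.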

edited Oct 16 '14 at 09:09

answered May 19 '10 at 12:12

