
Is there a built-in function to find all the files under a particular directory, including files in subdirectories? I have tried this code, but it's not working... maybe the logic itself is wrong.

def fun(mydir):
    lis=glob.glob(mydir)
    length=len(lis)
    l,i=0,0
    if len(lis):
        while(l+i<length):
            if os.path.isfile(lis[i]):
                final.append(lis[i])
                lis.pop(i)
                l=l+1
                i=i+1
            else:
                i=i+1
            print final
        fun(lis)
    else:
        print final
SilentGhost
pythBegin

3 Answers


There is no built-in function, but using os.walk it's trivial to construct it:

import os
def recursive_file_gen(mydir):
    for root, dirs, files in os.walk(mydir):
        for file in files:
            yield os.path.join(root, file)

ETA: the os.walk function walks the directory tree recursively; recursive_file_gen is a generator (it uses the yield keyword to produce each file in turn). To get the result as a list, do:

list(recursive_file_gen(mydir))
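A quick way to check the generator is to build a small throwaway tree with `tempfile` and list it. The layout (`a.txt` and `sub/b.txt`) is made up for the demonstration, and the generator is repeated (Python 3 syntax) so the snippet runs on its own.

```python
import os
import tempfile


def recursive_file_gen(mydir):
    # os.walk yields a (root, dirs, files) tuple for every directory in the tree
    for root, dirs, files in os.walk(mydir):
        for name in files:
            yield os.path.join(root, name)


with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top/a.txt and top/sub/b.txt
    os.mkdir(os.path.join(top, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(top, rel), 'w') as f:
            f.write('x')

    # Relative paths, sorted because os.walk does not guarantee an order
    found = sorted(os.path.relpath(p, top) for p in recursive_file_gen(top))
    print(found)
```

Because it is a generator, you can also loop over `recursive_file_gen(mydir)` directly without building the whole list in memory.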
SilentGhost
  • @pythbegin: added explanation, do ask if any specific point is not clear. – SilentGhost May 19 '10 at 13:16
  • @pyth: there is a [formal definition in Python docs](http://docs.python.org/reference/simple_stmts.html#the-yield-statement). – SilentGhost May 19 '10 at 13:26
  • OK... I made some changes to your code and it now looks like this: `def listall(parent): lis=[] for root, dirs, files in os.walk(parent): for name in files: if os.path.getsize(os.path.join(root,name))>500000: lis.append(os.path.join(root,name)) return lis` My aim is to find all the files larger than 500000 bytes, and it is working properly. But when I use this function on the 'Temporary Internet Files' folder in Windows, I get this error... I think it's because of the special characters in the file name. Can you suggest something? – pythBegin May 19 '10 at 13:42
  • sorry...I forgot to mention the error Traceback (most recent call last): File "", line 1, in listall(a) File "", line 5, in listall if os.path.getsize(os.path.join(root,name))>500000: File "C:\Python26\lib\genericpath.py", line 49, in getsize return os.stat(filename).st_size WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Documents and Settings\\khedarnatha\\Local Settings\\Temporary Internet Files\\Content.IE5\\EDS8C2V7\\??????+1[1].jpg' This is it – pythBegin May 19 '10 at 13:43
  • @pyth: I suspect it has to do with the encoding of the file name. It's hard to say, since you don't provide a sample of the file names. Clearly, `?` cannot be present in the filename, since it's an invalid character there. Try to see what the actual name of the file was: what `os.walk` returned and what `os.path.join` returned. I'd suggest you ask a separate question, as it is beyond the scope of this one. – SilentGhost May 19 '10 at 13:57
  • It's not recommended to use `file` as a variable name, as it shadows the built-in name `file` (a Python 2 built-in). However, in this case, it doesn't make any difference. Used your snippet! Thank you. +1 – Qlimax Jul 30 '15 at 13:51
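A Python 3 sketch of the size filter discussed in the comments. Two assumptions go beyond the original comment: the 500000-byte threshold becomes a `min_size` parameter, and files that cannot be stat'ed (broken links, files deleted mid-walk) are skipped instead of aborting the whole walk. On Windows, Python 3 `str` paths are Unicode, so `os.walk` reports the real file names rather than `?` placeholders; in Python 2, passing a `unicode` directory name to `os.walk` has the same effect.

```python
import os


def listall(parent, min_size=500000):
    """Return the paths of files under parent larger than min_size bytes."""
    result = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Entry vanished or cannot be stat'ed: skip it instead of crashing
                continue
            if size > min_size:
                result.append(path)
    return result
```

For example, `listall('.', min_size=1024)` lists every file over 1 KiB below the current directory.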

I highly recommend this path module, written by Jason Orendorff:

http://pypi.python.org/pypi/path.py/2.2

Unfortunately, his website is down now, but you can still download it from the above link (or through easy_install, if you prefer).

Using this path module, you can perform various actions on paths, including walking files as you requested. Here's an example:

from path import path

my_path = path('.')

# Every file under my_path, recursively
for file in my_path.walkfiles():
    print(file)

# Only files whose names match the glob pattern
for file in my_path.walkfiles('*.pdf'):
    print(file)

There are also convenience functions for many other things to do with paths:

In [1]: from path import path

In [2]: my_dir = path('my_dir')

In [3]: my_file = path('readme.txt')

In [5]: print(my_dir / my_file)
my_dir/readme.txt

In [6]: joined_path = my_dir / my_file

In [7]: print(joined_path)
my_dir/readme.txt

In [8]: print(joined_path.parent)
my_dir

In [9]: print(joined_path.name)
readme.txt

In [10]: print(joined_path.namebase)
readme

In [11]: print(joined_path.ext)
.txt

In [12]: joined_path.copy('some_output_path.txt')

In [13]: print(path('some_output_path.txt').isfile())
True

In [14]: print(path('some_output_path.txt').isdir())
False

Many more operations are available, but these are some of the ones I use most often. Notice that the path class inherits from the built-in string type, so it can be used wherever a string is used. Also, notice that two or more path objects can easily be joined together using the overridden / operator.

Hope this helps!
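For comparison, Python 3.4 and later ship `pathlib` in the standard library, which offers a similar object-oriented API (it is a different module from the path.py package recommended above). A minimal sketch of the equivalent operations; the file names are hypothetical, and `walkfiles` here is an illustrative helper, not a pathlib method.

```python
from pathlib import Path

my_dir = Path('my_dir')
joined_path = my_dir / 'readme.txt'   # "/" joins path components

print(joined_path.parent)   # my_dir
print(joined_path.name)     # readme.txt
print(joined_path.stem)     # readme  (path.py's namebase)
print(joined_path.suffix)   # .txt    (path.py's ext)


def walkfiles(top, pattern='*'):
    # Hypothetical helper, a rough analogue of path.py's walkfiles():
    # rglob matches recursively and also yields directories, so keep files only
    return [p for p in Path(top).rglob(pattern) if p.is_file()]
```

One design difference: `pathlib.Path` does not inherit from `str`, so pass `str(p)` where a plain string is required.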

naitsirhc

os.walk() is what you need.

But for added performance, try the scandir package. It is also part of the standard library as os.scandir in Python 3.5 and later, and is described in PEP 471.
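A minimal sketch of a recursive file generator built directly on `os.scandir` (Python 3.6+ for the `with` form). `DirEntry.is_dir()` and `is_file()` can often answer from information the directory listing already returned, avoiding a separate stat call per name, which is where the speed-up comes from. Assumptions: symlinked directories are not followed, and unreadable directories are silently skipped.

```python
import os


def scandir_files(path):
    """Yield the path of every file under path, recursively."""
    try:
        entries = os.scandir(path)
    except OSError:
        return  # missing or unreadable directory: skip it
    with entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                yield from scandir_files(entry.path)  # descend into subdirectory
            elif entry.is_file():
                yield entry.path
```

Note that since Python 3.5, `os.walk` itself is implemented on top of `os.scandir`, so plain `os.walk` already gets most of this benefit.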

Marco Mariani