What is the Python way to walk a directory tree?

Question

I feel that assigning files and folders and doing the += [item] part is a bit hackish. Any suggestions? I'm using Python 3.2.

from os import *
from os.path import *

def dir_contents(path):
    contents = listdir(path)
    files = []
    folders = []
    for i, item in enumerate(contents):
        if isfile(contents[i]):
            files += [item]
        elif isdir(contents[i]):
            folders += [item]
    return files, folders

Avoid `from x import *`. *That's* one piece of advice for Pythonic style. — Chris Morgan, Jul 10 '11 at 05:47
This way of adding items to a list is hackish too. Add a single item with `files.append(item)` or multiple items with `files.extend([item1, item2, ...])` — Ronan Paixão, Apr 26 '21 at 18:45

monkut · Answer 1 · 2022-11-09T02:15:06.917

51

os.walk and os.scandir are great options. However, I've been using pathlib more and more, and with pathlib you can use the .glob() or .rglob() (recursive glob) methods:

from pathlib import Path

root_directory = Path(".")
for path_object in root_directory.rglob('*'):
    if path_object.is_file():
        print(f"hi, I'm a file: {path_object}")
    elif path_object.is_dir():
        print(f"hi, I'm a dir: {path_object}")

edited Nov 09 '22 at 02:15

answered Sep 09 '20 at 02:23

monkut


6

However, os.walk separates the files and the dirs for you already. Also, just remembered: with os.walk, if I set topdown True (default), I can manipulate the subdirs list, and, for example, skip whole subtrees. See the note about ** in large trees in the docs. I wish os.walk could return Path objects. (Stupid 5 minute edit limit) – Jürgen A. Erhard Oct 21 '20 at 05:37
1

You can replace `glob('**/*')` with `rglob('*')` which looks nicer. – Paul Aug 28 '22 at 13:59
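To illustrate the comment above about manipulating the subdirs list, here is a minimal sketch of skipping whole subtrees with os.walk. The function name walk_skipping and the default skip_names are made up for this example; the in-place edit of dirnames under topdown=True is the documented os.walk behaviour.

```python
import os

def walk_skipping(top, skip_names=(".git", "__pycache__")):
    """Yield file paths under `top`, without descending into
    directories whose name is in `skip_names` (hypothetical defaults)."""
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        # Slice assignment edits the list in place, which is what os.walk
        # looks at; rebinding the name dirnames would have no effect.
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Something like `for path in walk_skipping("."): print(path)` would then list every file except those inside the skipped directories.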

score 44 · Accepted Answer · edited Jul 10 '11 at 05:46

44

Take a look at the os.walk function which returns the path along with the directories and files it contains. That should considerably shorten your solution.
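As a rough sketch of how that could look for the question's one-level listing: os.walk yields the top directory first, so taking only the first tuple gives the same files/folders split. The name dir_contents is reused from the question; the empty-list fallback for a missing path is my own assumption about what would be convenient.

```python
import os

def dir_contents(path):
    """Return (files, folders) for the top level of `path` only."""
    # os.walk yields (dirpath, dirnames, filenames), starting with `path` itself.
    # If `path` does not exist, os.walk yields nothing, so fall back to empty lists.
    _, folders, files = next(os.walk(path), (path, [], []))
    return files, folders
```

Dropping the `next(...)` and looping over os.walk(path) instead would visit every subdirectory as well.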

edited Jul 10 '11 at 05:46

johnsyweb


answered Jul 10 '11 at 05:34

Sanjay T. Sharma


1

Wow, that's perfect, can't believe I missed it. Thank you. – Mike Jul 10 '11 at 05:40
3

but `os.walk` isn't limited to one directory level like the OP's code is. – Dan D. Jul 10 '11 at 05:47

score 37 · Answer 3 · edited Mar 04 '22 at 20:05

37

For anyone looking for a solution using pathlib (python >= 3.4)

from pathlib import Path

def walk(path): 
    for p in Path(path).iterdir(): 
        if p.is_dir(): 
            yield from walk(p)
            continue
        yield p.resolve()

# recursively traverse all files from current directory
for p in walk(Path('.')): 
    print(p)

# the function returns a generator so if you need a list you need to build one
all_files = list(walk(Path('.')))

However, as mentioned above, this does not preserve the top-down ordering given by os.walk

edited Mar 04 '22 at 20:05

Flimm


answered Nov 19 '20 at 16:32

user3645016


5

I don't think I'd ever seen that `yield from` syntax before, or at least I'd forgotten about it. Thanks for illustrating it here! Relevant docs for posterity: https://docs.python.org/3/whatsnew/3.3.html#pep-380 – David Marx Sep 11 '21 at 17:24
Note that the way this code is implemented means that only files will be listed, not directories. – Flimm Mar 04 '22 at 20:06
I don't think the `continue` statement is needed; I get an identical result without it. – AllanLRH Oct 07 '22 at 09:14
If you exclude the `continue` statement then it will also yield the directories. Otherwise you only get files. So it depends on what you want – user3645016 Oct 28 '22 at 07:56
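Following the comments about files versus directories, here is one possible variant that makes the choice explicit. The include_dirs parameter is an addition for illustration and is not part of the answer's code; unlike the answer, this sketch also does not call resolve() on the results.

```python
from pathlib import Path

def walk(path, include_dirs=False):
    """Recursively yield paths below `path`.

    With include_dirs=False only non-directories are yielded;
    with include_dirs=True each directory is yielded before its contents.
    """
    for p in Path(path).iterdir():
        # Note: is_dir() follows symlinks, so a symlink to a directory is entered.
        if p.is_dir():
            if include_dirs:
                yield p
            yield from walk(p, include_dirs)
        else:
            yield p
```

For example, `list(walk('.', include_dirs=True))` would collect both files and directories under the current directory.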

score 9 · Answer 4 · answered Feb 22 '22 at 15:35

9

Since Python 3.4 there is the generator method Path.rglob. So, to process all paths under some/starting/path, do something like

from pathlib import Path

path = Path('some/starting/path')
for subpath in path.rglob('*'):
    # do something with subpath, for example print it
    print(subpath)

To get all subpaths in a list do list(path.rglob('*')). To get just the files with sql extension, do list(path.rglob('*.sql')).

answered Feb 22 '22 at 15:35

Mateo


I'll be using this everywhere from now on. Pity the Python devs didn't default the first argument to '*' then it could be even shorter :) Also with an empty string passed to rglob you seem to get the directories only if that's what you need. – Keeely Jun 21 '22 at 09:11

Gijs · Answer 5 · 2017-05-29T21:18:44.673

4

If you want to recursively iterate through all the files, including all files in the subfolders, I believe this is the best way.

import os

def get_files(root):
    # os.walk yields (dirpath, dirnames, filenames) for every directory under root
    for fd, subfds, fns in os.walk(root):
        for fn in fns:
            yield os.path.join(fd, fn)

## now this will print all full paths

for fn in get_files('.'):
    print(fn)

edited May 29 '17 at 21:18

answered Jun 13 '16 at 09:14

Gijs


3

I really like this approach because it separates the file system iteration code from the code to process each file! However, the "yield from" line needs to be omitted — `os.walk` already walks into subdirectories, so if you do it too, you see subdirectory files 2^n times. – Alex Martini May 29 '17 at 17:06

score 4 · Answer 6 · edited Mar 04 '22 at 20:02

4

Since Python 3.4 there is a new module, pathlib. To get all dirs and files in a single directory (not recursively), one can do:

from pathlib import Path

path = '.'  # directory to list
dirs = [str(item) for item in Path(path).iterdir() if item.is_dir()]
files = [str(item) for item in Path(path).iterdir() if item.is_file()]

edited Mar 04 '22 at 20:02

Flimm


answered Dec 07 '17 at 10:10

Radek Lonka


14

iterdir() does not walk a tree recursively. – Brian Jul 15 '19 at 02:59
4

But... pathlib does support recursive globbing. – kojiro Apr 07 '20 at 21:14
1

The method `iterdir()` [does not guarantee](https://docs.python.org/3.7/library/pathlib.html#pathlib.Path.iterdir) the `os.walk()` [top-down ordering](https://docs.python.org/3/library/os.html#os.walk). I would be extremely reticent to attempt to reimplement that tried-and-tested method. (**NOTE:** Some methods, like `os.rmdir()` can only delete an empty directory, so order can be very important.) – ingyhere Apr 29 '20 at 02:49
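To make the point about ordering concrete, here is a small sketch of the os.rmdir case using os.walk with topdown=False, so that children are visited before their parents. The function name remove_empty_dirs and the decision to keep the top directory itself are assumptions for this example.

```python
import os

def remove_empty_dirs(top):
    """Remove empty directories below `top`, deepest first.

    Returns the list of removed paths; `top` itself is kept.
    """
    top = os.fspath(top)
    removed = []
    # topdown=False yields each directory only after all of its subdirectories.
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        if dirpath == top:
            continue
        # Re-check on disk: dirnames was collected before the children were removed.
        if not os.listdir(dirpath):
            os.rmdir(dirpath)  # os.rmdir only succeeds on an empty directory
            removed.append(dirpath)
    return removed
```

With top-down order the parent would be visited while it still contains its (empty) children, and os.rmdir on it would fail.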

score 4 · Answer 7 · answered Jun 27 '22 at 09:17

Another way to walk a directory tree using the pathlib module:

from pathlib import Path

for directory in Path('.').glob('**'):
    for item in directory.iterdir():
        print(item)

The pattern ** matches current directory and all subdirectories, recursively, and the method iterdir then iterates over each directory's contents. Useful when you need more control when traversing the directory tree.

score 3 · Answer 8 · answered Jul 10 '11 at 05:42

3

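from os import listdir
from os.path import isfile, join

def dir_contents(path):
    files, folders = [], []
    for p in listdir(path):
        # listdir returns bare names, so join them to path before testing
        if isfile(join(path, p)):
            files.append(p)
        else:
            folders.append(p)
    return files, folders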

answered Jul 10 '11 at 05:42

pylover


score 3 · Answer 9 · answered Jul 10 '11 at 06:12

Indeed using

items += [item]

is bad for many reasons...

The append method was made exactly for that (appending one element to the end of a list).
You are creating a temporary list of one element just to throw it away. While raw speed should not be your first concern when using Python (otherwise you're using the wrong language), wasting speed for no reason still doesn't seem right.
You are relying on a small asymmetry in the Python language: for list objects, writing a += b is not the same as writing a = a + b, because the former modifies the object in place while the latter allocates a new list, and this can have different semantics if the object a is also reachable in other ways. In your specific code this doesn't seem to be the case, but it could become a problem later when someone else (or yourself in a few years, which is the same thing) has to modify the code. Python even has a method, extend, with a less subtle syntax that is specifically made for modifying a list in place by adding the elements of another list at the end.
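The difference described in the last point can be seen in a few lines. This is only an illustration with made-up variable names, not code from the question.

```python
a = [1, 2]
alias = a            # a second name for the same list object
a += [3]             # in-place: the object both names refer to is modified
print(alias)         # [1, 2, 3]

b = [1, 2]
other = b
b = b + [3]          # builds a new list and rebinds b; other keeps the old one
print(other)         # [1, 2]

c = []
c.append(1)          # add a single element
c.extend([2, 3])     # add every element of another list, in place
print(c)             # [1, 2, 3]
```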

Also, as others have noted, it seems that your code is trying to do what os.walk already does...

mikebabcock · Answer 10 · 2020-08-18T15:24:34.663

Instead of the built-in os.walk and os.path.walk, I use something derived from this piece of code I found suggested elsewhere which I had originally linked to but have replaced with inlined source:

import os
import stat

class DirectoryStatWalker:
    # a forward iterator that traverses a directory tree, and
    # returns the filename and additional file information

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                st = os.stat(fullname)
                mode = st[stat.ST_MODE]
                if stat.S_ISDIR(mode) and not stat.S_ISLNK(mode):
                    self.stack.append(fullname)
                return fullname, st

if __name__ == '__main__':
    for file, st in DirectoryStatWalker("/usr/include"):
        # formatted print works on both Python 2 and Python 3
        print("%s %s" % (file, st[stat.ST_SIZE]))

It walks the directories recursively and is quite efficient and easy to read.

+1 @mikebabcock thanks - this works for me out-of-the-box in Python 2.x (even though the OP is using 3.x) I needed a 2.x solution. — therobyouknow, Feb 13 '12 at 15:32
Unfortunately that project is no longer available, 404. Could someone repaste it here? — LarsH, Jan 29 '13 at 14:48
I haven't checked if its identical yet, but cf http://pymoex.googlecode.com/svn/trunk/os_path/directoryStatWalker.py @LarsH — mikebabcock, Feb 04 '13 at 20:24

score 0 · Answer 11 · answered Feb 24 '15 at 07:52

While googling for the same info, I found this question.

I am posting here the smallest, clearest code which I found at http://www.pythoncentral.io/how-to-traverse-a-directory-tree-in-python-guide-to-os-walk/ (rather than just posting the URL, in case of link rot).

The page has some useful info and also points to a few other relevant pages.

# Import the os module, for the os.walk function
import os

# Set the directory you want to start from
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)

score 0 · Answer 12 · answered Jul 18 '15 at 06:55

I've not tested this extensively yet, but I believe this will expand the os.walk generator, join dirnames to all the file paths, and flatten the result, to give a flat iterator over the concrete files in your search path.

import itertools
import os

def find(input_path):
    return itertools.chain(
        *list(
            list(os.path.join(dirname, fname) for fname in files)
            for dirname, _, files in os.walk(input_path)
        )
    )

score 0 · Answer 13 · answered Jul 10 '11 at 05:37

0

Try using the append method.

answered Jul 10 '11 at 05:37

icktoofay


+1: this is also far better than `list += [item]`. The *batteries are included* and familiarity with the core language features stops you from reinventing the battery: http://docs.python.org/tutorial/stdlib.html#batteries-included – msw Jul 10 '11 at 06:03

score 0 · Answer 14 · answered Mar 04 '23 at 11:22

0

import pathlib
import time

def prune_empty_dirs(path: pathlib.Path):
    for current_path in list(path.rglob("*"))[::-1]:
        if current_path.is_dir() and not any(current_path.iterdir()):
            current_path.rmdir()
            while current_path.exists():
                time.sleep(0.1)

answered Mar 04 '23 at 11:22

lucasfcnunes



score 0 · Answer 15 · answered Jul 21 '23 at 15:21

I like the structure of the result of os.walk() but prefer pathlib overall. My lazy solution therefore is simply creating a Path from each item returned by os.walk().

import os
import pathlib


def walk(path='bin'):
    for root, dirs, files in os.walk(path):
        root = pathlib.Path(root)
        dirs = [root / d for d in dirs]
        files = [root / f for f in files]
        yield root, dirs, files

score 0 · Answer 16 · answered Jul 21 '23 at 15:43

Here is a version that uses os.scandir and returns a tree structure. Using os.scandir will return os.DirEntry objects, which hold information about the path objects in memory, allowing querying of the information about the items without filesystem calls.

import os

def treedir(path):
    files = []
    folders = {}
    for entry in os.scandir(path):
        if entry.is_file():
            files.append(entry)
        elif entry.is_dir():
            folders[entry.name] = treedir(entry)
    result = {}
    if files:
        result['files'] = files
    if folders:
        result['folders'] = folders
    return result
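As a flat counterpart to the tree structure above, here is a sketch of a recursive generator built on os.scandir that yields each file with its size. The name scan_files and the choice not to follow symlinks are assumptions for this example; whether stat() needs an extra system call depends on the platform.

```python
import os

def scan_files(path):
    """Yield (path, size_in_bytes) for every regular file below `path`."""
    # The with-statement closes the scandir iterator promptly (Python 3.6+).
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False avoids entering symlinked directories,
            # and with them possible symlink loops.
            if entry.is_dir(follow_symlinks=False):
                yield from scan_files(entry.path)
            elif entry.is_file(follow_symlinks=False):
                # The stat result is cached on the DirEntry after the first call.
                yield entry.path, entry.stat(follow_symlinks=False).st_size
```

Something like `sum(size for _, size in scan_files("."))` would then give the total size of the regular files under the current directory.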

NOZUONOHIGH · Answer 17 · 2023-08-23T10:20:29.337

Copy-and-paste code for those who want to deep walk all subdirectories. os.walk already descends into every subdirectory, so no explicit recursive call is needed:

import os

def deep_walk(mypath):
    file_list = []
    # os.walk visits mypath and every directory below it
    for root, dirs, files in os.walk(mypath):
        for file in files:
            file_list.append(os.path.join(root, file))
    for f in file_list:
        print(f)

def main():
    mypath="/tmp"
    deep_walk(mypath)

if __name__ == '__main__':
    main()
