8

On a Mac with Python 2.7, my script walks directories using os.walk and descends into 'apps' (e.g. appname.app), since those are really just directories themselves. Later on in processing I hit errors when going through them. I don't want to go through them anyway, so for my purposes it would be best to just ignore those types of 'directories'.

So this is my current solution:

for root, subdirs, files in os.walk(directory, True):
    for subdir in subdirs:
        if '.' in subdir:
            subdirs.remove(subdir)
    #do more stuff

As you can see, the second for loop will run for every iteration of subdirs, which is unnecessary since the first pass removes everything I want to remove anyways.

There must be a more efficient way to do this. Any ideas?

Patrick Bateman
  • 5
    For those unaware of this feature: removing a directory from the `subdirs` list returned by `os.walk` causes `os.walk` not to recurse into that directory. – interjay May 16 '12 at 14:45
  • 1
    The way os.walk works you won't iterate into the subdirectories that you remove from the list, so I don't understand why you're concerned. – Mark Ransom May 16 '12 at 14:50
  • @interjay Exactly that, which is why I don't think bottom-up would work for me. What I am doing in my example is exactly what I want to do, as Mark Ransom is stating; I'm just asking if there is a more efficient way to do it, since the for loop will be repeated for each of the valid subdirectories I iterate through. To me this seems inefficient, albeit not much of a performance hit. My question is really about what a best practice would look like. Is this it? – Patrick Bateman May 17 '12 at 17:55
  • @Mark Ransom, I am only concerned with the fact that the second for loop would run on each iteration over the valid subdirs. – Patrick Bateman May 17 '12 at 17:56

3 Answers

20

You can do something like this (assuming you want to ignore directories containing '.'):

subdirs[:] = [d for d in subdirs if '.' not in d]

The slice assignment (rather than just subdirs = ...) is necessary because you need to modify the same list that os.walk is using, not create a new one.
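As a minimal sketch of how this fits into a full walk, the snippet below wraps the slice assignment in a generator. The helper names `walk_skipping_dotted` and `demo`, and the throwaway directory layout, are assumptions made up for illustration and are not part of the question's code.

```python
import os
import shutil
import tempfile


def walk_skipping_dotted(top):
    """Walk top, never descending into directories whose name contains '.'."""
    for root, subdirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the very list os.walk is holding,
        # so the pruned directories are not visited later.
        subdirs[:] = [d for d in subdirs if '.' not in d]
        yield root, subdirs, files


def demo():
    """Build a small temporary tree and return the directories visited."""
    top = tempfile.mkdtemp()
    try:
        # Hypothetical layout: top/keep/a.txt and top/Some.app/inner/b.txt
        os.makedirs(os.path.join(top, 'keep'))
        os.makedirs(os.path.join(top, 'Some.app', 'inner'))
        open(os.path.join(top, 'keep', 'a.txt'), 'w').close()
        open(os.path.join(top, 'Some.app', 'inner', 'b.txt'), 'w').close()
        return sorted(os.path.relpath(root, top)
                      for root, _, _ in walk_skipping_dotted(top))
    finally:
        shutil.rmtree(top)
```

Calling `demo()` should list only the top directory and `keep`; `Some.app` and everything below it is skipped. If only application bundles should be ignored, a narrower test such as `d.endswith('.app')` could replace `'.' in d`.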

Note that your original code is incorrect because it modifies the list while iterating over it: each removal shifts the remaining items, so the loop skips the element right after every one it removes.
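To see the problem concretely, here is a small sketch that runs the question's loop on a plain list next to the list-comprehension version. The function names and the sample directory names are hypothetical, chosen only for the demonstration.

```python
def remove_while_iterating(names):
    """Apply the question's loop to a copy of names."""
    result = list(names)
    for name in result:
        if '.' in name:
            # Removing shifts later items left, so the loop's internal
            # index steps over the item that followed this one.
            result.remove(name)
    return result


def filter_with_slice_assignment(names):
    """Apply the slice-assignment filter to a copy of names."""
    result = list(names)
    result[:] = [n for n in result if '.' not in n]
    return result


sample = ['A.app', 'B.app', 'docs']
buggy = remove_while_iterating(sample)        # 'B.app' survives
fixed = filter_with_slice_assignment(sample)  # only 'docs' remains
```

With two dotted names in a row, the loop version leaves the second one in place, which means os.walk would still descend into it.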

interjay
1

Perhaps this example from the Python docs for os.walk will be helpful. It works from the bottom up (deleting).

# Delete everything reachable from the directory named in "top",
# assuming there are no symbolic links.
# CAUTION:  This is dangerous!  For example, if top == '/', it
# could delete all your disk files.
import os
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))

I am a bit confused about your goal: are you trying to remove a directory subtree and encountering errors, or are you trying to walk a tree and just list simple file names (excluding directory names)?

Levon
  • The question is about not traversing into certain directories, not about deleting them. – interjay May 16 '12 at 14:33
  • @interjay Oops .. I got taken by the "removing subdirectories" in the question, I'll re-read and update the question as needed. Thanks for pointing this out. – Levon May 16 '12 at 14:35
  • @Levon I'm not encountering an error removing anything. What I am actually doing is building a syncing app (for my own personal learning; I know programs exist that do that already) to take data from a target and copy it to a source. I am getting errors during the copy saying the file doesn't exist, even though I already went through the process of finding the files beforehand. This only happens when the process goes through folders with a .app extension (Mac). So, since I don't want to copy those anyway, I thought just removing those types of folders would be best. – Patrick Bateman May 17 '12 at 18:28
0

I think all that is required is to remove the directory before iterating over it:

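import os

for root, subdirs, files in os.walk(directory, True):
    # Iterate over a copy so that removing from subdirs itself is safe.
    for subdir in list(subdirs):
        if '.' in subdir:
            subdirs.remove(subdir)
    for subdir in subdirs:
        pass  # do more stuff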
gerardw