7

Okay, I'm having trouble not only with the problem itself but even with explaining my question. I have a directory tree that is about seven levels deep, for example: rootdir/a/b/c/d/e/f/destinationdir

The thing is, some paths may have as few as five subdirectory levels and some as many as ten, such as:

rootdir/a/b/c/d/destinationdir

or:

rootdir/a/b/c/d/e/f/g/h/destinationdir

The only thing they have in common is that the destination directory is always named the same thing. The way I'm using the glob function is as follows:

for path in glob.glob('/rootdir/*/*/*/*/*/*/destinationdir'):
    os.system('cd {0}; do whatever'.format(path))

However, this only works for directories with exactly that number of intermediate subdirectories. Is there a way to avoid specifying the number of subdirectories (asterisks)? In other words, can the function reach destinationdir no matter how many intermediate subdirectories there are, so that I can iterate over all of them? Thanks a lot!

Ashwini Chaudhary

5 Answers

5

I think this could be done more easily with os.walk:

import os

def find_files(root, filename):
    for directory, subdirs, files in os.walk(root):
        if filename in files:
            # directory already starts with root, so join only the file name
            yield os.path.join(directory, filename)

Of course, this doesn't allow a glob expression in the filename portion, but you could check for that using a regex or fnmatch.
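As a minimal sketch of the fnmatch idea, the walk can filter file names against a glob-style pattern. The function name find_matching_files and its parameters are made up for illustration and are not part of the answer above:

```python
import fnmatch
import os


def find_matching_files(root, pattern):
    """Yield paths under root whose file name matches a glob-style pattern."""
    for directory, subdirs, files in os.walk(root):
        # fnmatch.filter keeps only the names that match the pattern
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(directory, name)
```

Calling list(find_matching_files('/rootdir', '*.txt')) would then collect every .txt file at any depth.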

EDIT

Or to find a directory:

def find_dirs(root, d):
    for directory, subdirs, files in os.walk(root):
        if d in subdirs:
            yield os.path.join(directory, d)
mgilson
4

You can create a pattern for each nesting depth (increase the 10 if needed):

for i in range(10):  # range works on Python 2 and 3; xrange is Python 2 only
    pattern = '/rootdir/' + ('*/' * i) + 'destinationdir'
    for path in glob.glob(pattern):
        os.system('cd {0}; do whatever'.format(path))

This will iterate over:

'/rootdir/destinationdir'
'/rootdir/*/destinationdir'
'/rootdir/*/*/destinationdir'
'/rootdir/*/*/*/destinationdir'
'/rootdir/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/*/*/destinationdir'
'/rootdir/*/*/*/*/*/*/*/*/*/destinationdir'

If you have to iterate over directories of arbitrary depth, I suggest splitting the algorithm into two steps: a first phase that finds where all the 'destinationdir' directories are located, and a second phase that performs your operations on them.
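The two phases could be sketched roughly as follows. This assumes os.walk for the search and an argument-list command for the work step; the names find_dirs and run_in_each are hypothetical:

```python
import os
import subprocess


def find_dirs(root, name):
    """Phase 1: collect every directory called `name` below root."""
    found = []
    for directory, subdirs, files in os.walk(root):
        if name in subdirs:
            found.append(os.path.join(directory, name))
    return found


def run_in_each(paths, command):
    """Phase 2: run `command` (a list of arguments) with each path as cwd."""
    # Returns the exit code of the command for each path, in order
    return [subprocess.call(command, cwd=path) for path in paths]
```

Keeping the search separate means the list of directories can be inspected or logged before anything is executed in them.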

Simeon Visser
3

Since Python 3.5, glob.glob accepts the double wildcard '**' to designate any number of intermediate directories, as long as you also pass recursive=True:

>>> import glob
>>> glob.glob('**/*.txt', recursive=True)
['1.txt', 'foo/2.txt', 'foo/bar/3.txt', 'foo/bar/baz/4.txt']
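Applied to the question's layout, a minimal sketch might look like the following. The helper name find_destination_dirs and the isdir filter are additions for illustration, not part of the answer:

```python
import glob
import os


def find_destination_dirs(root, name="destinationdir"):
    """Return directories called `name` at any depth below root."""
    pattern = os.path.join(root, "**", name)
    # recursive=True makes '**' match zero or more directory levels
    matches = glob.glob(pattern, recursive=True)
    # Drop any plain files that happen to share the name
    return [path for path in matches if os.path.isdir(path)]
```

Each returned path can then be processed in a loop, as in the question's original code.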
Mark Amery
Tosha
  • but be aware that the '**' glob pattern does not follow symlinks as it could end up in endless loops. – MrE Oct 06 '18 at 20:45
2

If you are looking for files, you can use the Formic package (disclosure: I wrote it), which implements Apache Ant's FileSet globs, including the '**' wildcard:

import formic
fileset = formic.FileSet(include="rootdir/**/destinationdir/*")

for file_name in fileset:
    # Do something with file_name; printing is only a placeholder
    print(file_name)
Andrew Alcock
0

This looks much easier to accomplish with a more versatile tool, like the find command (your os.system call indicates you're on a unix-like system, so this will work).

os.system('find /rootdir -mindepth 5 -maxdepth 10 -type d -name destinationdir | while IFS= read -r d; do ( cd "$d" && do whatever ); done')

Note that if you are going to put any user-supplied string into that command, this becomes drastically unsafe, and you should use subprocess.Popen instead, passing the arguments as a list you split yourself rather than as a single shell string. It's safe as shown, though.
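A rough sketch of the subprocess.Popen route, assuming the command is passed as a list of arguments and the target directory is given through cwd so that no shell parses the path. The function name run_in_dir is hypothetical:

```python
import subprocess


def run_in_dir(path, command):
    """Run command (a list of arguments) inside path without a shell."""
    # No shell is involved, so characters in `path` are never interpreted
    process = subprocess.Popen(command, cwd=path)
    # wait() blocks until the command finishes and returns its exit code
    return process.wait()
```

For example, run_in_dir(d, ['ls', '-l']) would list the contents of d on a Unix-like system.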

the paul