10

I need to find the first occurrence of repository.config in each branch of a directory tree and stop descending into the subdirectories below it.

Here is my directory tree:

./WAS80/base/disk1/ad/repository.config
./WAS80/base/disk1/md/repository.config
./WAS80/base/disk2/ad/repository.config
./WAS80/base/disk3/ad/repository.config
./WAS80/base/disk4/ad/repository.config
./WAS80/base/repository.config
./WAS80/fixpack/fp5/repository.config
./WAS80/fixpack_suplements/fp5/repository.config
./WAS80/supplements/disk1/ad/repository.config
./WAS80/supplements/disk1/md/repository.config
./WAS80/supplements/disk2/ad/repository.config
./WAS80/supplements/disk3/ad/repository.config
./WAS80/supplements/disk4/ad/repository.config
./WAS80/supplements/repository.config

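I need only the topmost file in each branch (shown in bold in the original post), i.e. ./WAS80/base/repository.config, ./WAS80/fixpack/fp5/repository.config, ./WAS80/fixpack_suplements/fp5/repository.config and ./WAS80/supplements/repository.config, and the search should not descend into the subdirectories below them.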

I started tinkering with this code, but I couldn't figure it out.

import os

pattern = 'repository.config'
path = '/opt/was_binaries'

def find_all(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            result.append(os.path.join(root, name))
            continue

    return result
Mad Physicist
radicaled
  • For one thing indent your code properly. Python is very space-sensitive. For another, don't start your paragraphs with "so basically". – Mad Physicist Apr 25 '17 at 18:57
  • What does "I started tinkering with this code, but I couldn't figure it out." mean? What does it do that you don't like or understand. Please be specific. Do not reply in the comments! Edit your question to be a stand-alone problem statement please. – Mad Physicist Apr 25 '17 at 18:59

2 Answers

20

This should do what you want:

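import os

startdir = '/opt/was_binaries'  # directory to search (path taken from the question)

res = []

for here, dirs, files in os.walk(startdir, topdown=True):
    if 'repository.config' in files:
        res.append(os.path.join(here, 'repository.config'))
        # empty the list in place so os.walk does not descend any further
        dirs[:] = []
        # dirs.clear()  # equivalent on Python 3

print(res)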

Whenever you encounter a 'repository.config' file, empty dirs in order to prevent os.walk from descending further into that directory tree.

Note: for this to work it is vital to change dirs in place (i.e. dirs[:] = []) rather than rebind the name (dirs = []), because os.walk keeps its own reference to the original list.
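To make the in-place versus rebind difference concrete, here is a minimal sketch. The helper names find_first and make_tree, the in_place flag and the temporary directory layout are made up for illustration; only os.walk and its topdown behaviour come from the answer above.

```python
import os
import tempfile


def find_first(startdir, name="repository.config", in_place=True):
    """Collect paths of `name`; prune below a match only when in_place is True."""
    found = []
    for here, dirs, files in os.walk(startdir, topdown=True):
        if name in files:
            found.append(os.path.join(here, name))
            if in_place:
                dirs[:] = []  # mutates the list os.walk holds: branch is pruned
            else:
                dirs = []     # rebinds the local name only: os.walk still descends
    return sorted(found)


def make_tree(root):
    """Build a small hypothetical tree loosely modeled on the question's layout."""
    for rel in ("base", "base/disk1/ad", "fixpack/fp5",
                "supplements", "supplements/disk1/md"):
        directory = os.path.join(root, *rel.split("/"))
        os.makedirs(directory, exist_ok=True)
        open(os.path.join(directory, "repository.config"), "w").close()


with tempfile.TemporaryDirectory() as tmp:
    make_tree(tmp)
    pruned = [os.path.relpath(p, tmp) for p in find_first(tmp, in_place=True)]
    unpruned = [os.path.relpath(p, tmp) for p in find_first(tmp, in_place=False)]

print(pruned)    # only the topmost file of each branch
print(unpruned)  # every repository.config, because nothing was pruned
```

With in_place=True only the three topmost files are reported; with in_place=False all five files in the sample tree come back.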

hiro protagonist
  • mindblowing... Can you explain how this results in some kind of feedback to the os.walk (generator?) operation so that it will skip further scanning? – Siete Jan 25 '23 at 09:47
  • Is this also possible if you are looking for a certain directory (ignoring the sub directories) instead of a file? How would that change the answer? – Siete Jan 25 '23 at 09:58
  • @Siete `os.walk` returns a reference to the `dirs` it will scan internally. all i am doing is change that (reference to a) list in-place - which in turn changes the list that is referenced inside `os.walk`. and sure: you can filter that depending on what you like. the important part is to reassign the `dirs` with `dirs[:] = ...`. – hiro protagonist Jan 25 '23 at 15:41
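As a rough sketch of the directory variant asked about in the comments: filter dirs in place so the matched directory is recorded but not entered, while its siblings are still scanned. The function name find_first_dirs, the directory name "cache" and the sample tree are hypothetical.

```python
import os
import tempfile


def find_first_dirs(startdir, target):
    """Return the topmost directories named `target` without descending into them."""
    found = []
    for here, dirs, files in os.walk(startdir, topdown=True):
        if target in dirs:
            found.append(os.path.join(here, target))
            # Drop only the matched directory, in place; siblings are still walked.
            dirs[:] = [d for d in dirs if d != target]
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: a nested "cache" inside "cache" must not be reported.
    os.makedirs(os.path.join(tmp, "a", "cache", "cache"))
    os.makedirs(os.path.join(tmp, "b", "c", "cache"))
    os.makedirs(os.path.join(tmp, "b", "other"))
    found = [os.path.relpath(p, tmp) for p in find_first_dirs(tmp, "cache")]

print(found)
```

The nested a/cache/cache is never visited, because its parent was removed from dirs before os.walk could descend into it.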
2

First, you have to make sure that topdown is set to True (this is the default) so parent directories are scanned before child directories.

Create a set() named existing to remember the directories in which a config file has already been found.

Then, when you find your filename in the list:

  • check if the directory of the file isn't a child of a directory you registered
  • if it's not, record the file's directory in existing, with os.sep appended so that a sibling directory whose name merely starts with the same characters is not matched: path\to\dir2 is still scanned even if path\to\dir is already in the set, while path\to\dir\subdir is filtered out.

code:

import os

path = '/opt/was_binaries'  # directory to search (path taken from the question)

existing = set()
for root, dirs, files in os.walk(path, topdown=True):
    if any(root.startswith(r) for r in existing):
        # root lies below a directory that already yielded a config file: skip it
        continue
    if "repository.config" in files:
        # record the directory (+ os.sep to avoid filtering siblings) and print the result
        existing.add(root + os.sep)
        print(os.path.join(root, "repository.config"))
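A small sketch of why the os.sep suffix matters for the startswith test. The paths are the illustrative ones from the bullet list above, built with os.path.join so the check behaves the same on any platform; the variable names are made up.

```python
import os

registered = os.path.join("path", "to", "dir")
sibling = os.path.join("path", "to", "dir2")
child = os.path.join("path", "to", "dir", "subdir")

# Without the separator, the sibling looks like a descendant of `registered`.
naive_sibling = sibling.startswith(registered)

# With the separator appended, only real descendants match.
safe_sibling = sibling.startswith(registered + os.sep)
safe_child = child.startswith(registered + os.sep)

print(naive_sibling, safe_sibling, safe_child)  # True False True
```

Note that this approach still lets os.walk visit every subdirectory and only skips reporting them, so on large trees it does more work than pruning dirs in place.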
Jean-François Fabre