
I need to compare two directory structures to validate that all required files are present.

I am trying to compare two directories to check that they contain the files required for downstream processing, even though the folder names inside those directories will differ.

An analogous example would be if these directories were parts of cars. Every car (represented by a directory) must have 4 wheels and an engine.

What matters is not the names of the directories, but that all files are present (i.e., each car has exactly 4 wheels and an engine).

I am planning on using one correctly structured directory as a template to validate against.

C:.
├───Body_of_car
│   └───Wheels_of_car
│           Wheel1_michelin.txt
│           Wheel2_michelin.txt
│           Wheel3_michelin.txt
│           Wheel4_michelin.txt
│
└───Engine_of_car
        car_engine.txt

For example, the following two directories should represent a valid car based on their structure:

C:.
├───Body_of_honda
│   └───Wheels_of_honda
│           Wheel1_michelin.txt
│           Wheel2_michelin.txt
│           Wheel3_michelin.txt
│           Wheel4_michelin.txt
│
└───Engine_of_honda
        Honda_v6.txt

vs.

C:.
├───Body_of_toyota
│   └───Wheels_of_toyota
│           Wheel1_dunlap.txt
│           Wheel2_dunlap.txt
│           Wheel3_dunlap.txt
│           Wheel4_dunlap.txt
│
└───Engine_of_toyota
        Toyota_v4.txt

However, the following directory structure should NOT result in a valid car (because it is missing a wheel):

C:.
├───Body_of_toyota
│   └───Wheels_of_toyota
│           Wheel1_dunlap.txt
│           Wheel2_dunlap.txt
│           Wheel3_dunlap.txt
│
└───Engine_of_toyota
        Toyota_v4.txt

I am able to parse the structure of a given directory into a dictionary using the following:

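import os

def path_to_dict(path):
    # Record the entry's own name (last component of the path).
    d = {'name': os.path.basename(path)}
    if os.path.isdir(path):
        d['type'] = "directory"
        # Recurse into every entry of the directory.
        d['children'] = [path_to_dict(os.path.join(path, x)) for x in os.listdir(path)]
    else:
        d['type'] = "file"
    return d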

Here, path is the path to the directory in question.

Then I can convert that dict to JSON using json.dumps() (json.loads() goes the other way and parses a JSON string into Python objects). I figured I should be able to compare JSON against JSON. The issue with the methods I am aware of is that the names of the elements will not be the same, resulting in a mismatch even when all required files are actually present. For example, the JSON will not match because "Body_of_car" != "Body_of_toyota", even though the structure represents a valid car and should pass.

The objective is to compare directory structures independently of the names of folders and files, to validate that the required contents are present.
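The same idea can be sketched directly against the filesystem, without an intermediate dictionary or JSON. This is only an illustration under stated assumptions: `dir_signature`, `matches_template` and `make_tree` are hypothetical names, the check counts files per directory and ignores every name, and the example trees are created in a temporary directory rather than read from real data.

```python
import os
import tempfile


def dir_signature(path):
    """Describe a directory as (number of files, sorted signatures of subdirectories)."""
    file_count = 0
    subdirs = []
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir() follows symlinks by default; a symlink loop would recurse forever.
            if entry.is_dir():
                subdirs.append(dir_signature(entry.path))
            else:
                file_count += 1
    # Sorting removes any dependence on the (name-based) listing order.
    return (file_count, tuple(sorted(subdirs)))


def matches_template(template_path, candidate_path):
    """True if both trees have the same name-free structure."""
    return dir_signature(template_path) == dir_signature(candidate_path)


def make_tree(root, layout):
    """Create empty files; layout maps a relative directory to a list of file names."""
    for rel_dir, files in layout.items():
        os.makedirs(os.path.join(root, rel_dir), exist_ok=True)
        for name in files:
            open(os.path.join(root, rel_dir, name), "w").close()


def car_layout(brand, tyre, engine, wheels):
    wheel_dir = os.path.join("Body_of_" + brand, "Wheels_of_" + brand)
    return {
        wheel_dir: ["Wheel%d_%s.txt" % (i, tyre) for i in range(1, wheels + 1)],
        "Engine_of_" + brand: [engine],
    }


with tempfile.TemporaryDirectory() as tmp:
    template = os.path.join(tmp, "template")
    honda = os.path.join(tmp, "honda")
    broken = os.path.join(tmp, "broken")
    make_tree(template, car_layout("car", "michelin", "car_engine.txt", 4))
    make_tree(honda, car_layout("honda", "michelin", "Honda_v6.txt", 4))
    make_tree(broken, car_layout("toyota", "dunlap", "Toyota_v4.txt", 3))

    honda_ok = matches_template(template, honda)
    broken_ok = matches_template(template, broken)

print(honda_ok, broken_ok)
```

A stricter check (for example, matching file extensions or name patterns such as "Wheel*") would need extra information in the signature; that is left out here because the question only requires that the right number of files is present.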

Any help is greatly appreciated!

SquatLicense
  • You have to indeed be careful with the terminology here. [JSON Schema](https://json-schema.org/) and [JSON](https://www.json.org/json-en.html) are, although connected, two different things. What you are talking about here is JSON. But that being said you should not use a JSON file to compare the structure of your directories. This would work, but it's very slow as you already have all the information in main memory just to write it to a file and then read it again. That's not efficient at all. You should instead build a simple tree structure or use a dictionary to compare the files. – Mushroomator Mar 28 '22 at 23:06
  • In Python you could also use `os.walk()` as shown in [this thread](https://stackoverflow.com/questions/58174708/how-to-create-a-os-walk-function-which-compares-the-folders-and-subfolders-of). This is effectively a duplicate to this question. – Mushroomator Mar 28 '22 at 23:08
  • Does this answer your question? [How to create a os.walk() function which compares the folders and subfolders of two directories?](https://stackoverflow.com/questions/58174708/how-to-create-a-os-walk-function-which-compares-the-folders-and-subfolders-of) – Mushroomator Mar 28 '22 at 23:10
  • It is close, but the real root of the issue is that I can't compare folder names. Two folders with the same information will fail incorrectly if I use the methods described in that post. I would be happy to abandon JSON altogether and just compare dicts; however, I believe I would run into the same issue there with key names not matching. – SquatLicense Mar 28 '22 at 23:17
  • For which of the directories does it fail? `res1 = [r[1:] for r in os.walk("myotherdir")] res2 = [r[1:] for r in os.walk("mydir")] print(res1 == res2)` this certainly works for me. – Mushroomator Mar 28 '22 at 23:21

0 Answers