I need to compare two directory structures to validate that all the files required for downstream processing are present, even though the folder names inside those directories will differ.
As an analogy, imagine that each directory is a car. Every car must have 4 wheels and an engine.
What matters is not the names of the directories but that all the files are present (i.e., each car has exactly 4 wheels and an engine).
I plan to use one correctly structured directory as the template to validate against:
C:.
├───Body_of_car
│   └───Wheels_of_car
│           Wheel1_michelin.txt
│           Wheel2_michelin.txt
│           Wheel3_michelin.txt
│           Wheel4_michelin.txt
│
└───Engine_of_car
        car_engine.txt
For example, both of the following directories should count as valid cars based on their structure:
C:.
├───Body_of_honda
│   └───Wheels_of_honda
│           Wheel1_michelin.txt
│           Wheel2_michelin.txt
│           Wheel3_michelin.txt
│           Wheel4_michelin.txt
│
└───Engine_of_honda
        Honda_v6.txt
and
C:.
├───Body_of_toyota
│   └───Wheels_of_toyota
│           Wheel1_dunlap.txt
│           Wheel2_dunlap.txt
│           Wheel3_dunlap.txt
│           Wheel4_dunlap.txt
│
└───Engine_of_toyota
        Toyota_v4.txt
However, the following directory structure should NOT count as a valid car, because it is missing a wheel:
C:.
├───Body_of_toyota
│   └───Wheels_of_toyota
│           Wheel1_dunlap.txt
│           Wheel2_dunlap.txt
│           Wheel3_dunlap.txt
│
└───Engine_of_toyota
        Toyota_v4.txt
I am able to parse the structure of a given directory into a dictionary using the following:
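import os

def path_to_dict(path):
    # Record the entry's own name (the last component of the path).
    d = {'name': os.path.basename(path)}
    if os.path.isdir(path):
        d['type'] = "directory"
        # Recurse into every entry of the directory.
        d['children'] = [path_to_dict(os.path.join(path, x)) for x in os.listdir(path)]
    else:
        d['type'] = "file"
    return d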
Here, path is the path to the directory in question.
I can then convert that dict to JSON using json.dumps(), so I figured I could compare one JSON document against the other. The problem with the methods I know of is that the element names will not be the same, which causes a mismatch even when all the required files are present. For example, the JSON will not match because "Body_of_car" != "Body_of_toyota", even though the Toyota's structure represents a valid car and should pass.
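One way to keep the existing path_to_dict() approach is to strip the names before comparing. This is a minimal sketch, not a definitive implementation: strip_names() and same_shape() are hypothetical helpers, and the sketch assumes the dict format produced by path_to_dict() above. Each node keeps only its type and its children, and siblings are sorted into a canonical order so the arbitrary order returned by os.listdir() does not matter. Plain dict equality then does the comparison, so the JSON round trip is not needed.

```python
import json


def strip_names(node):
    """Return a name-free, order-independent copy of a path_to_dict() node.

    Files become {"type": "file"}; directories keep their type and a
    canonically sorted list of stripped children. All names are dropped.
    """
    if node["type"] == "file":
        return {"type": "file"}
    children = [strip_names(child) for child in node["children"]]
    # Sort siblings by their canonical JSON text so that the (arbitrary)
    # order returned by os.listdir() does not affect the comparison.
    children.sort(key=lambda child: json.dumps(child, sort_keys=True))
    return {"type": "directory", "children": children}


def same_shape(template, candidate):
    """True if both trees have identical structure, ignoring all names."""
    return strip_names(template) == strip_names(candidate)
```

Usage would be same_shape(path_to_dict(template_path), path_to_dict(candidate_path)), where both paths are placeholders for your own directories. The check is exact, so a car with an extra wheel fails just like one with a missing wheel.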
The goal is to compare directory structures independently of folder and file names, to validate that the required contents are present.
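If the intermediate dict is not needed, the comparison can also be done directly on the filesystem. A minimal sketch, assuming Python 3.6+ and that both arguments are existing directories: the hypothetical helper shape() renders a name-free description of a tree (each directory's file count followed by the sorted shapes of its subdirectories), and matches_template() compares the two descriptions. Anything that is not a directory is counted as a file.

```python
from pathlib import Path


def shape(path):
    """Describe a directory tree by structure only, ignoring all names.

    A directory is rendered as dir[...] listing its file count (if any)
    followed by the shapes of its subdirectories, sorted so that names
    and listing order never affect the result.
    """
    path = Path(path)
    children = list(path.iterdir())
    n_files = sum(1 for child in children if not child.is_dir())
    parts = [f"{n_files} file(s)"] if n_files else []
    parts += sorted(shape(child) for child in children if child.is_dir())
    return "dir[" + ", ".join(parts) + "]"


def matches_template(template_dir, candidate_dir):
    """True if candidate_dir has exactly the same structure as template_dir."""
    return shape(template_dir) == shape(candidate_dir)
```

For the template car, shape() returns dir[dir[1 file(s)], dir[dir[4 file(s)]]], while the Toyota that is missing a wheel gives dir[dir[1 file(s)], dir[dir[3 file(s)]]]. Because the description is human-readable, printing both strings also shows where a failing candidate differs from the template.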
Any help is greatly appreciated!