I need to get the last modified dates of all folders and files under a DBFS mount point (ADLS Gen1) in Azure Databricks. The folder structure looks like this:
Empty folders (containing no files):
/dbfs/mnt/ADLS1/LANDING/parent/child/subfolder1
/dbfs/mnt/ADLS1/LANDING/parent/child/subfolder2/subfolder3
Folders containing files:
/dbfs/mnt/ADLS1/LANDING/parent/XYZ/subfolder4/File1.txt
/dbfs/mnt/ADLS1/LANDING/parent/XYZ/subfolder5/subfolder6/File2.txt
I used the following Python code to get the last modified dates:
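    from datetime import datetime
    from os.path import getmtime
    from pathlib import Path

    root_dir = "/dbfs/mnt/ADLS1/LANDING/parent"

    def get_directories(root_dir):
        # Print each entry with its last modified date, recursing into folders.
        for child in Path(root_dir).iterdir():
            if child.is_file():
                print(child, datetime.fromtimestamp(getmtime(child)).date())
            else:
                print(child, datetime.fromtimestamp(getmtime(child)).date())
                get_directories(child)

    get_directories(root_dir)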
With the code above, I get the correct modified date for all folders that contain files.
For empty folders, however, it reports the current date rather than the last modified date.
Yet when I hardcode the path of an empty folder, I get the correct modified date:
print(datetime.fromtimestamp(getmtime("/dbfs/mnt/ADLS1/LANDING/parent/child/subfolder1")).date())
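To narrow down where the two results diverge, a comparison along these lines might help. This is only a sketch: `collect_mtimes` and `compare_mtimes` are hypothetical helper names I made up for illustration, and it assumes the mount behaves like an ordinary local path for `os.path.getmtime`. It records each entry's date while walking the tree, then re-reads every path directly through its string form and reports any that differ.

```python
import os
from datetime import datetime
from pathlib import Path


def collect_mtimes(root):
    """Walk `root` recursively; return {path string: modified date} as seen in the loop."""
    seen = {}
    for child in Path(root).iterdir():
        # Read the timestamp first, before descending into the entry.
        seen[str(child)] = datetime.fromtimestamp(os.path.getmtime(child)).date()
        if child.is_dir():
            seen.update(collect_mtimes(child))
    return seen


def compare_mtimes(root):
    """Return {path: (date seen in loop, date from direct read)} for paths that differ."""
    mismatches = {}
    for path, loop_date in collect_mtimes(root).items():
        # Second, direct read using the plain string path (like the hardcoded call).
        direct_date = datetime.fromtimestamp(os.path.getmtime(path)).date()
        if direct_date != loop_date:
            mismatches[path] = (loop_date, direct_date)
    return mismatches
```

Calling `compare_mtimes(root_dir)` should return an empty dict when both reads agree; any entry it returns is a path where the date seen during traversal differs from the direct read.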
Can someone please help me figure out what I am missing in the loop?