I need to get the last-modified dates of all folders and files under a DBFS mount point (for ADLS Gen1) in Azure Databricks. The folder structure looks like this:

Empty folders (not containing any files):
/dbfs/mnt/ADLS1/LANDING/parent/child/subfolder1
/dbfs/mnt/ADLS1/LANDING/parent/child/subfolder2/subfolder3

Folders containing some files:
/dbfs/mnt/ADLS1/LANDING/parent/XYZ/subfolder4/File1.txt
/dbfs/mnt/ADLS1/LANDING/parent/XYZ/subfolder5/subfolder6/File2.txt

I used the following Python code to get the last-modified date:

from datetime import datetime
from os.path import getmtime
from pathlib import Path

root_dir = "/dbfs/mnt/ADLS1/LANDING/parent"

def get_directories(root_dir):
    # Print every entry with its last-modified date, recursing into subfolders.
    for child in Path(root_dir).iterdir():
        print(child, datetime.fromtimestamp(getmtime(child)).date())
        if not child.is_file():
            get_directories(child)

get_directories(root_dir)

With the above code, I get the correct modified date for all folders that contain files.

For empty folders, however, it returns the current date instead of the last-modified date.

Yet when I hardcode the path of an empty folder, it returns the correct modified date:

print(datetime.fromtimestamp(getmtime("/dbfs/mnt/ADLS1/LANDING/parent/child/subfolder1")).date())

Can someone please help me figure out what I am missing in the loop?

Gopesh

1 Answer

It seems the issue was related to processing time. I added a short wait, time.sleep(.000005), and it then worked as expected.
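For reference, here is a minimal sketch of how the traversal might look with that wait added. The function name list_mtimes and the delay parameter are hypothetical, and the placement of the sleep (just before each timestamp is read) is an assumption, since the exact position was not recorded. It returns the results instead of printing them so they are easier to inspect.

```python
import time
from datetime import datetime
from os.path import getmtime
from pathlib import Path


def list_mtimes(root_dir, delay=0.000005):
    """Return (path, last-modified date) pairs for every entry under root_dir."""
    results = []
    for child in sorted(Path(root_dir).iterdir()):
        # Assumed placement: pause briefly before reading the timestamp.
        time.sleep(delay)
        modified = datetime.fromtimestamp(getmtime(child)).date()
        results.append((str(child), modified))
        # Recurse into subfolders, including empty ones.
        if child.is_dir():
            results.extend(list_mtimes(child, delay))
    return results
```

Calling list_mtimes("/dbfs/mnt/ADLS1/LANDING/parent") and printing each pair gives the same kind of output as the original loop. Why the pause helps on the mount is not established here, so treat it as a workaround rather than a confirmed fix.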

Gopesh