
I am trying to run a report on my Azure Data Lake Gen2 storage. I have written the recursive function below, which descends into every folder and lists files down to the last level.

def recursive_ls(path: str):
    """List all files from path recursively."""
    for file in dbutils.fs.ls(path):
        if not file.path.endswith('/'):          # a file: yield its path parts and size
            yield (file.path.split('/')[3:11], file.size)
        else:                                    # a directory: recurse into it
            yield from recursive_ls(file.path)

I have a very large number of files, and as a result this function does not finish even after 2 hours.

This might be happening because it is currently handled by a single process. I need some way in which I can execute this listing in a parallel or multiprocessing environment.

Harshit
  • Does this answer your question? https://stackoverflow.com/questions/11920490/how-do-i-run-os-walk-in-parallel-in-python – D Hudson Mar 08 '21 at 12:09
  • @DHudson I am afraid it does not. I am getting generators and no solution is using them. – Harshit Mar 08 '21 at 14:36

0 Answers