0

I have a path mounted in DBFS and I need to collect the paths of all Excel files under a given folder. The folder may contain Excel files directly, or subfolders that contain Excel files. My current code only returns the Excel files in the top-level folder, not those in subfolders.

files = dbutils.fs.ls('/raw/internal/srange/2018_11_30_00_22_11/')
for file in files:
  if file.path.endswith('xlsx'):
    path = '/dbfs' + file.path[5:]
    print(path)
halfer
user3222101

2 Answers

1

You should check for directories as well and recurse into them:

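def walk_dir(dir_path):
    """Recursively collect local (/dbfs/...) paths of all .xlsx files under dir_path."""
    excel_files = []
    for file in dbutils.fs.ls(dir_path):
        if file.isDir():
            # Recurse into subdirectories.
            excel_files.extend(walk_dir(file.path))
        elif file.path.endswith('.xlsx'):
            # file.path looks like 'dbfs:/raw/...': drop the 'dbfs:' scheme and
            # prepend '/dbfs'. Plain concatenation is used because os.path.join
            # would discard '/dbfs' when the remainder starts with '/'.
            excel_files.append('/dbfs' + file.path[5:])
    return excel_files

all_excel = walk_dir('/raw/internal/srange/2018_11_30_00_22_11/')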

I haven't tried the code so it might be buggy.
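On the path conversion: `os.path.join` drops every earlier component when a later one is absolute, so joining `'/dbfs'` with a remainder that starts with `/` does not give a `/dbfs/...` path. A minimal sketch of a safer conversion follows. It assumes the listing returns paths of the form `dbfs:/...`, and `to_local_path` is a hypothetical helper name, not part of `dbutils`.

```python
import os


def to_local_path(dbfs_uri):
    """Turn 'dbfs:/a/b.xlsx' into '/dbfs/a/b.xlsx' (hypothetical helper)."""
    prefix = 'dbfs:'
    # Drop the scheme if present; leave other inputs unchanged.
    rest = dbfs_uri[len(prefix):] if dbfs_uri.startswith(prefix) else dbfs_uri
    # Strip leading slashes so exactly one separator follows '/dbfs'.
    return '/dbfs/' + rest.lstrip('/')


# The absolute second argument makes join discard '/dbfs'.
joined = os.path.join('/dbfs', '/raw/a.xlsx')
converted = to_local_path('dbfs:/raw/a.xlsx')
print(joined)
print(converted)
```

On a POSIX system the first line printed is `/raw/a.xlsx` and the second is `/dbfs/raw/a.xlsx`.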

Lioness100
absolutelydevastated
0

This is what I recommend:

base = "dbfs:/raw/internal/srange/2018_11_30_00_22_11/"
# Walks three levels only; deeper subfolders are not visited.
for file1 in dbutils.fs.ls(base):
  if file1.name.endswith('.xlsx'):
    print(file1.path)
  elif file1.isDir():
    # Only descend into directories, and reuse the listed path directly.
    for file2 in dbutils.fs.ls(file1.path):
      if file2.name.endswith('.xlsx'):
        print(file2.path)
      elif file2.isDir():
        for file3 in dbutils.fs.ls(file2.path):
          if file3.name.endswith('.xlsx'):
            print(file3.path)
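Nested loops like the ones above only reach a fixed number of levels. Below is a sketch that handles any depth with an explicit stack instead. `find_xlsx`, `Entry`, and `fake_ls` are hypothetical names introduced for illustration. The `ls` parameter is assumed to behave like `dbutils.fs.ls`, returning entries with `path`, `name`, and `isDir()`, and the stub listing exists only so the example runs outside Databricks.

```python
from collections import namedtuple


class Entry(namedtuple('Entry', ['path', 'name'])):
    """Stand-in for the entries a directory listing returns (assumption)."""

    def isDir(self):
        # In this stub, directories are marked by a trailing slash.
        return self.path.endswith('/')


def find_xlsx(root, ls):
    """Collect paths of all .xlsx files under root, at any depth."""
    found, stack = [], [root]
    while stack:
        for entry in ls(stack.pop()):
            if entry.isDir():
                stack.append(entry.path)   # visit this folder later
            elif entry.name.endswith('.xlsx'):
                found.append(entry.path)
    return found


# Made-up directory tree used only for the demonstration.
TREE = {
    'dbfs:/data/': [Entry('dbfs:/data/a.xlsx', 'a.xlsx'),
                    Entry('dbfs:/data/sub/', 'sub/')],
    'dbfs:/data/sub/': [Entry('dbfs:/data/sub/b.xlsx', 'b.xlsx'),
                        Entry('dbfs:/data/sub/notes.txt', 'notes.txt'),
                        Entry('dbfs:/data/sub/deep/', 'deep/')],
    'dbfs:/data/sub/deep/': [Entry('dbfs:/data/sub/deep/c.xlsx', 'c.xlsx')],
}


def fake_ls(path):
    return TREE[path]


paths = sorted(find_xlsx('dbfs:/data/', fake_ls))
print(paths)
```

In a notebook, the same function could be called with `dbutils.fs.ls` passed as `ls` and the real folder as `root`.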
VoldyArrow