
I am currently listing files in Azure Data Lake Store Gen1 successfully with the following command:

dbutils.fs.ls('mnt/dbfolder1/projects/clients')

The structure of this folder is

- client_comp_automotive_1.json [File]
- client_comp_automotive_2.json [File]
- client_comp_automotive_3.json [File]
- client_comp_automotive_4.json [File]
- PROCESSED [Folder]

I want to loop through the .json files in this folder and process them one by one, so that I can handle errors (or anything else) and move each successfully processed file to a subfolder.

How can I do this in Python? I have tried:

folder = dbutils.fs.ls('mnt/dbfolder1/projects/clients')
files = [f for f in os.listdir(folder) if os.path.isfile(f)]

But this does not work; the name os is not defined. How can I do this within Databricks?


3 Answers


The answer was simple, even though I searched for two days:

files = dbutils.fs.ls('mnt/dbfolder1/projects/clients')

for fi in files: 
  print(fi.path)
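
To address the original goal (process each .json file and move it on success), the same listing can be filtered and combined with dbutils.fs.mv, which moves a file within DBFS. A minimal sketch, where process_file is a hypothetical placeholder for your own processing logic:

files = dbutils.fs.ls('mnt/dbfolder1/projects/clients')

# keep only the .json files, which also skips the PROCESSED folder
json_files = [fi for fi in files if fi.path.endswith('.json')]

for fi in json_files:
    try:
        process_file(fi.path)  # hypothetical placeholder for your processing
        # move the successfully processed file into the PROCESSED subfolder
        dbutils.fs.mv(fi.path, 'mnt/dbfolder1/projects/clients/PROCESSED/' + fi.name)
    except Exception as e:
        print('Failed to process {}: {}'.format(fi.path, e))

Files that raise an error are left in place, so a later run can retry them.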

Scala version of the same (with an ADLS Gen2 abfss:// path):

val dirList = dbutils.fs.ls("abfss://<container>@<storage_account>.dfs.core.windows.net/<DIR_PATH>/")

// option1
dirList.foreach(println)

// option2
for (dir <- dirList) println(dir.name)

Another way, which translates seamlessly to a local installation of Python, is:

import os
os.listdir("/dbfs/mnt/projects/clients/")
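
Because the /dbfs prefix exposes the mount as an ordinary local path on the driver, the standard glob module can also filter for the .json files directly. A minimal sketch, using the same path as above:

import glob

# /dbfs/... makes the DBFS mount look like a local filesystem path
json_files = glob.glob("/dbfs/mnt/projects/clients/*.json")
for f in json_files:
    print(f)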