I am working on some code on my local machine in PyCharm. The execution is done on a Databricks cluster, while the data is stored in Azure Data Lake.
Basically, I need to list the files in an Azure Data Lake directory and then apply some reading logic to them. For this I am using the code below:
sc = spark.sparkContext
hadoop = sc._jvm.org.apache.hadoop
fs = hadoop.fs.FileSystem
conf = hadoop.conf.Configuration()
path = hadoop.fs.Path('adl://<Account>.azuredatalakestore.net/<path>')
for f in fs.get(conf).listStatus(path):
    print(f.getPath(), f.getLen())
The above code runs fine in Databricks notebooks, but when I try to run the same code from PyCharm using databricks-connect, I get the following error:
"Wrong FS expected: file:///....."
On some digging, it turns out that the code is looking on my local drive to resolve the path. I had a similar issue with the Python libraries os and pathlib (a quick illustration below).
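For illustration, this is roughly what I mean by the similar issue (the mount path here is a made-up example, not my real one):

import os
# Run from PyCharm via databricks-connect, this lists my local filesystem,
# not the cluster's DBFS / data lake mount (hypothetical example path):
print(os.listdir('/dbfs/mnt/example'))

On the cluster itself, the equivalent call goes through the /dbfs FUSE mount instead of my local drive.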
I have no issue running other code on the cluster.
I need help figuring out how to run this so that it searches the Data Lake and not my local machine.
Also, the azure-datalake-store client is not an option due to certain restrictions.