I'm able to establish a connection to my Databricks FileStore (DBFS) and access the FileStore. Reading, writing, and transforming data with PySpark works, but when I try to use a local Python API such as pathlib or the os module, I'm unable to get past the first level of the DBFS file system.
I can use a magic command:

%fs ls dbfs:/mnt/my_fs/...

which works perfectly and lists all the child directories, but if I run

os.listdir('/dbfs/mnt/my_fs/')

it returns ['mount.err'].
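For illustration, this is a minimal sketch of the comparison I'm making in a notebook cell (using /mnt/my_fs as a stand-in for the real mount path):

import os
from pathlib import Path

# Spark-side listing via dbutils: works and shows all child directories
print(dbutils.fs.ls("dbfs:/mnt/my_fs/"))

# Local Python APIs go through the /dbfs FUSE mount: both of these
# return only ['mount.err'] instead of the directory contents
print(os.listdir("/dbfs/mnt/my_fs/"))
print([p.name for p in Path("/dbfs/mnt/my_fs/").iterdir()])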
I've tested this on a new cluster and the result is the same.

I'm using Python on Databricks Runtime Version 6.1 with Apache Spark 2.4.4.

Is anyone able to advise?
Edit:

Connection script:

I've used the Databricks CLI to store my credentials, which are formatted according to the Databricks documentation:
def initialise_connection(secrets_func):
    configs = secrets_func()

    # Check whether the mount already exists
    bMountExists = False
    for item in dbutils.fs.ls("/mnt/"):
        if str(item.name) == r"WFM/":
            bMountExists = True

    # Drop the mount if it exists, to refresh credentials
    if bMountExists:
        dbutils.fs.unmount("/mnt/WFM")
        bMountExists = False

    # Mount the drive
    if not bMountExists:
        dbutils.fs.mount(
            source="adl://test.azuredatalakestore.net/WFM",
            mount_point="/mnt/WFM",
            extra_configs=configs
        )
        print("Drive mounted")
    else:
        print("Drive already mounted")