The goal is to read a file from an ADLS mount point as a byte string within Databricks.
Confirming the ADLS mount point
First, running dbutils.fs.mounts()
confirms that the mount exists:
... MountInfo(mountPoint='/mnt/ftd', source='abfss://ftd@omitted.dfs.core.windows.net/', encryptionType=''), ...
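For reference, a minimal sketch (assuming this runs in a Databricks notebook, where dbutils is predefined) that filters the mount list down to the entry of interest:
# Keep only the ADLS mount in question from the full mount list.
adls_mounts = [m for m in dbutils.fs.mounts() if m.mountPoint == '/mnt/ftd']
print(adls_mounts)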
Confirming the existence of the file
The file in question is titled TruthTable.csv,
and its existence has been confirmed using the following command:
dbutils.fs.ls('/mnt/ftd/TruthTable.csv')
which returns:
[FileInfo(path='dbfs:/mnt/ftd/TruthTable.csv', name='TruthTable.csv', size=156)]
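Incidentally, dbutils.fs.head can peek at the file's contents directly from the mount, though it returns a decoded str (up to 64 KB by default) rather than the raw bytes required here:
# Returns the start of the file as a decoded string, not as bytes,
# so this does not satisfy the byte-string requirement by itself.
print(dbutils.fs.head('/mnt/ftd/TruthTable.csv'))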
Confirming the readability of the file
To confirm that the file can be read, we can run the following snippet:
filePath = '/mnt/ftd/TruthTable.csv'
spark.read.format('csv').option('header','true').load(filePath)
which successfully returns:
DataFrame[p: string, q: string, r: string, s: string]
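As a quick sanity check, the parsed rows can also be displayed (a sketch; df is just an illustrative variable name):
# Materialise and display a few parsed rows from the mounted CSV.
df = spark.read.format('csv').option('header', 'true').load(filePath)
df.show()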
The problem
Since the goal is to read the file as a byte string, the following snippet should succeed; however, it does not.
filePath = '/mnt/ftd/TruthTable.csv'
with open(filePath, 'rb') as fin:
    contents = fin.read()
print(contents)
Executing this snippet raises:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/ftd/TruthTable.csv'
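For context, Python's built-in open() goes through the driver's local filesystem rather than DBFS, and the documentation linked below exposes DBFS paths to local file APIs under the /dbfs prefix. A sketch of that variant follows; given the limitation described next, it fails for the mount path as well:
# Documented local-file-API pattern: DBFS paths appear under /dbfs.
# On this workspace the pattern evidently only works for files under /tmp/,
# so this variant also fails for the mount path.
filePath = '/dbfs/mnt/ftd/TruthTable.csv'
with open(filePath, 'rb') as fin:
    contents = fin.read()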
The documentation provided by the Databricks team at https://docs.databricks.com/data/databricks-file-system.html#local-file-apis works only for files under the /tmp/
folder; however, the requirement is to read a file directly from the mount point.
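For completeness, one workaround consistent with that limitation is sketched below: copy the file from the mount onto the driver's local disk, then read it with the ordinary file API. The file:/tmp/ destination path is illustrative, and this involves an extra copy rather than the direct read from the mount point that is actually required:
# Copy from the mount to the driver's local /tmp, then read the bytes.
# This is an extra copy step, not the direct read from the mount we want.
dbutils.fs.cp('/mnt/ftd/TruthTable.csv', 'file:/tmp/TruthTable.csv')
with open('/tmp/TruthTable.csv', 'rb') as fin:
    contents = fin.read()
print(contents)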