
I am using databricks-connect to connect my local PyCharm IDE to an Azure Databricks cluster.

This works completely fine until I try to access files without a Spark context.

For example:

dbutils.fs.put('dbfs:/tmp/test_file.txt', 'line_1')
with open('/dbfs/tmp/test_file.txt') as f:
  print(f.read())

works fine when run directly in a Databricks notebook. When I try to run the same snippet via databricks-connect in PyCharm, I get a FileNotFoundError. The same happens for other file system operations (shutil, gzip, ...).

I assume that open is accessing the file system of my local client running PyCharm, but I want open and similar functions to access DBFS.

Is there a way to achieve this?

Rodan
1 Answer


The open function belongs to the Python file API, so it works only with local files; with databricks-connect, that is the file system of the machine running your code. The /dbfs/ mount point is available only on the cluster nodes.
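A minimal sketch of how the same file can still be reached from databricks-connect by going through dbutils.fs instead of open, assuming databricks-connect is already configured; the local target path is just an example and may need adjusting for your OS:

from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

# Obtain the remote SparkSession created by databricks-connect and build dbutils from it.
spark = SparkSession.builder.getOrCreate()
dbutils = DBUtils(spark)

# Write and read the file via dbutils.fs, which talks to DBFS rather than the local disk.
dbutils.fs.put('dbfs:/tmp/test_file.txt', 'line_1', True)
print(dbutils.fs.head('dbfs:/tmp/test_file.txt'))

# Alternatively, copy the file down first and use open() on the local copy
# (copy-to-local behaviour as described in the comments below).
dbutils.fs.cp('dbfs:/tmp/test_file.txt', 'file:/tmp/test_file.txt')
with open('/tmp/test_file.txt') as f:
    print(f.read())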

Alex Ott
  • Thank you for your reply. So no way to run this via databricks-connect? What would in this case be the recommended workflow for local development? I read about dbx ( https://learn.microsoft.com/de-de/azure/databricks/dev-tools/dbx ), but this seems a little overkill as the deployment/CI part is already covered in our setup. – Rodan Sep 09 '22 at 11:53
  • You can use the Hadoop API instead: https://docs.databricks.com/dev-tools/databricks-connect.html#access-the-hadoop-filesystem. Or you can use dbutils.fs.cp to copy the file to the local machine: https://docs.databricks.com/dev-tools/databricks-connect.html#access-dbutils – Alex Ott Sep 09 '22 at 13:34
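For the Hadoop filesystem route mentioned in the comment above, a rough sketch following the pattern in the linked databricks-connect documentation (the URI and listing path are illustrative and may need adjusting for your setup):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
jvm = spark.sparkContext._jvm

# Hadoop classes reached through the Py4J gateway that databricks-connect exposes.
URI = jvm.java.net.URI
Path = jvm.org.apache.hadoop.fs.Path
FileSystem = jvm.org.apache.hadoop.fs.FileSystem
Configuration = jvm.org.apache.hadoop.conf.Configuration

fs = FileSystem.get(URI('dbfs:/'), Configuration())

# List files under dbfs:/tmp/ as a quick check that DBFS is reachable.
for file_status in fs.listStatus(Path('dbfs:/tmp/')):
    print(file_status.getPath())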