
I am packaging the following code in a .whl file:

from pkg_resources import resource_filename
from pyspark.sql import DataFrame

def path_to_model(anomaly_dir_name: str, data_path: str) -> str:
    # Resolve a data path inside the installed package to an absolute filesystem path
    filepath = resource_filename(anomaly_dir_name, data_path)
    return filepath

def read_data(spark) -> DataFrame:
    return spark.read.parquet(str(path_to_model("sampleFolder", "data")))

I confirmed that the whl file contains the parquet files under the sampleFolder/data/ directory correctly. When I run this locally it works, but when I upload the whl file to DBFS and run it there, I get this error:

AnalysisException: Path does not exist: dbfs:/databricks/python/lib/python3.7/site-packages/sampleFolder/data;

I confirmed that this directory actually does not exist: dbfs:/databricks/python. Any idea what this error could be?

Thanks.

user3868051

2 Answers


By default, Spark on Databricks works with files on DBFS unless you explicitly specify another scheme. In your case, the path_to_model function returns the string /databricks/python/lib/python3.7/site-packages/sampleFolder/data, and because it has no explicit scheme, Spark uses the dbfs scheme. But the file is on the local node, not on DBFS - that's why Spark can't find it.

To fix that, you need to copy the data onto DBFS and read it from there. This can be done with the dbutils.fs.cp command. Change the code to the following:

def read_data(spark) -> DataFrame:
    # Local path on the driver node where the wheel's data was installed
    data_path = str(path_to_model("sampleFolder", "data"))
    tmp_path = "/tmp/my_sample_data"
    # Recursively copy from the local filesystem (file:) onto DBFS
    dbutils.fs.cp("file:" + data_path, tmp_path, True)
    return spark.read.parquet(tmp_path)
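
Note that dbutils is defined automatically only in Databricks notebooks; if read_data lives inside the wheel itself, a handle has to be obtained explicitly. A minimal sketch, assuming the Databricks runtime exposes the pyspark.dbutils module:

from pyspark.dbutils import DBUtils  # assumption: running on a Databricks cluster

def get_dbutils(spark):
    # Build a dbutils handle from the active SparkSession (outside a notebook)
    return DBUtils(spark)
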
Alex Ott

By default, Spark on Databricks works with files on DBFS.

But if you want to read a local file using the spark.read.parquet function in Databricks, you can use the prefix file: followed by the complete path to the file, e.g.:

spark.read.parquet('file:/home/user1/file_name')
                    ^^^^
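
Applied to the code from the question, a minimal sketch (assuming the wheel's data is installed on the driver's local filesystem) would be:

from pyspark.sql import DataFrame

def read_data(spark) -> DataFrame:
    # path_to_model returns a local (driver-node) path; the file: prefix
    # keeps Spark from resolving it against the default dbfs: scheme
    local_path = str(path_to_model("sampleFolder", "data"))
    return spark.read.parquet("file:" + local_path)

Keep in mind that a file: path must also be reachable from the executors, which is why the first answer copies the data onto DBFS instead.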
Kashyap
kartik