
I'm trying to save a dataframe in Databricks using this code:

df.write.format("delta").save("abfss://my_container@my_storage.dfs.core.windows.net/my_path/filename.snappy.parquet")

But this creates an additional subfolder named filename.snappy.parquet, so the file ends up at my_path/filename.snappy.parquet/filename.snappy.parquet instead of the desired my_path/filename.snappy.parquet.

How can I save the file without this additional, unneeded subfolder?

archjkeee

2 Answers


Spark writes data to a folder: the path you pass to .save() becomes the name of the folder that Spark creates.

  • Inside that folder are the part files that Spark produces.

If you want a file with a specific name, you need to rename the part file as a post-processing step (using dbutils.fs.mv) after the Spark job writes the data, for example:

dbutils.fs.mv("my_path/<spark_output_folder>/<part_file>", "my_path/filename.snappy.parquet")
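
A fuller sketch of that post-process, assuming Python in a Databricks notebook where df and dbutils already exist; the temporary folder name is illustrative. Note that a Delta write also emits a _delta_log folder, so collapsing the output to one named file is only practical with plain parquet:

base = "abfss://my_container@my_storage.dfs.core.windows.net/my_path"
tmp_dir = base + "/_tmp_single_file"          # temporary Spark output folder (illustrative name)
target = base + "/filename.snappy.parquet"    # desired final file path

# Write a single part file (coalesce(1)) into the temporary folder.
df.coalesce(1).write.format("parquet").mode("overwrite").save(tmp_dir)

# Locate the part file Spark produced and move it to the final path.
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, target)

# Clean up the temporary folder (second argument = recurse).
dbutils.fs.rm(tmp_dir, True)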
notNull

Instead of

df.write.format("delta").save("abfss://my_container@my_storage.dfs.core.windows.net/my_path/filename.snappy.parquet")

specify

df.write.format("delta").save("abfss://my_container@my_storage.dfs.core.windows.net/my_path/")

It will automatically create the part files under my_path.
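
With Delta, the folder is the table itself; to get the data back you point the reader at the folder, not at an individual file. A minimal read-back sketch, assuming the same path:

# Read the Delta table back by loading the folder Spark wrote to.
df2 = spark.read.format("delta").load(
    "abfss://my_container@my_storage.dfs.core.windows.net/my_path/"
)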