
I'm trying to save a dataframe in Databricks using this code:

df.write.format("delta").save("abfss://my_container@my_storage.dfs.core.windows.net/my_path/filename.snappy.parquet")

But this creates an additional subfolder named filename.snappy.parquet, so the file ends up at my_path/filename.snappy.parquet/filename.snappy.parquet instead of the desired my_path/filename.snappy.parquet.

How can I save the file without this additional, unneeded subfolder?

archjkeee

2 Answers


Spark writes data to a folder: the path you pass to .save() becomes the name of the folder that Spark creates.

  • Inside that folder are the part files that Spark produces.

If you want a file with a specific name, you need to rename the part file as a post-processing step (using dbutils.fs.mv) after the Spark job writes the data, for example:

dbutils.fs.mv("my_path/<spark_output_folder>/<part_file>", "my_path/filename.snappy.parquet")
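
A fuller sketch of that post-process, assuming Python in a Databricks notebook where df and dbutils already exist; the temporary folder name is illustrative. Note that a Delta write also emits a _delta_log folder, so collapsing the output to one named file is only practical with plain parquet:

base = "abfss://my_container@my_storage.dfs.core.windows.net/my_path"
tmp_dir = base + "/_tmp_single_file"          # temporary Spark output folder (illustrative name)
target = base + "/filename.snappy.parquet"    # desired final file path

# Write a single part file (coalesce(1)) into the temporary folder.
df.coalesce(1).write.format("parquet").mode("overwrite").save(tmp_dir)

# Locate the part file Spark produced and move it to the final path.
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.mv(part_file, target)

# Clean up the temporary folder (second argument = recurse).
dbutils.fs.rm(tmp_dir, True)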
notNull

Instead of

df.write.format("delta").save("abfss://my_container@my_storage.dfs.core.windows.net/my_path/filename.snappy.parquet")

specify

df.write.format("delta").save("abfss://my_container@my_storage.dfs.core.windows.net/my_path/")

It will automatically create the part files under my_path.
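
With Delta, the folder is the table itself; to get the data back you point the reader at the folder, not at an individual file. A minimal read-back sketch, assuming the same path:

# Read the Delta table back by loading the folder Spark wrote to.
df2 = spark.read.format("delta").load(
    "abfss://my_container@my_storage.dfs.core.windows.net/my_path/"
)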