That is how data from different partitions is persisted in Spark: each partition is written out as its own part file inside the target folder. You can use the Databricks dbutils.fs utility to rename the resulting file.
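For example, once you know the name of the generated part file, a single dbutils.fs.mv call both moves and renames it (the paths below are placeholders, not real files):

# Placeholder paths for illustration; dbutils.fs.mv moves/renames a file on DBFS
dbutils.fs.mv("dbfs:/my_folder/part-00000-example.snappy.parquet",
              "dbfs:/my_folder/my_df.parquet")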
I have written a small utility function that gathers all data on one partition, persists it as Parquet, and renames the only data file in the folder; you can adapt it for JSON or CSV (a CSV variant is sketched after the usage example below). The utility accepts the destination folder path and file name, writes into a temporary "tmp" subfolder, and then moves the single part file to the desired name, cleaning up the temporary folder afterwards:
def export_spark_df_to_parquet(df, dir_dbfs_path, parquet_file_name):
    tmp_parquet_dir_name = "tmp"
    tmp_parquet_dir_dbfs_path = dir_dbfs_path + "/" + tmp_parquet_dir_name
    parquet_file_dbfs_path = dir_dbfs_path + "/" + parquet_file_name

    # Gather all data on one partition so Spark writes a single part file
    df.repartition(1).write.mode("overwrite").parquet(tmp_parquet_dir_dbfs_path)

    # Copy the single .parquet part file to the desired file name
    for _file in dbutils.fs.ls(tmp_parquet_dir_dbfs_path):
        if _file.name.endswith(".parquet"):
            dbutils.fs.cp(_file.path, parquet_file_dbfs_path)
            break

    # Remove the temporary folder so only the renamed file is left behind
    dbutils.fs.rm(tmp_parquet_dir_dbfs_path, True)
Usage:
export_spark_df_to_parquet(df, "dbfs:/my_folder", "my_df.parquet")
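If you need CSV instead of Parquet, the same pattern applies; below is a minimal sketch under the same dbutils assumptions (the function name and the header option are just illustrative choices):

def export_spark_df_to_csv(df, dir_dbfs_path, csv_file_name):
    tmp_csv_dir_dbfs_path = dir_dbfs_path + "/tmp"
    csv_file_dbfs_path = dir_dbfs_path + "/" + csv_file_name

    # Write a single CSV part file (with a header row) into the tmp folder
    df.repartition(1).write.mode("overwrite").option("header", "true").csv(tmp_csv_dir_dbfs_path)

    # Copy the single .csv part file to the desired name, then drop the tmp folder
    for _file in dbutils.fs.ls(tmp_csv_dir_dbfs_path):
        if _file.name.endswith(".csv"):
            dbutils.fs.cp(_file.path, csv_file_dbfs_path)
            break
    dbutils.fs.rm(tmp_csv_dir_dbfs_path, True)

Usage:
export_spark_df_to_csv(df, "dbfs:/my_folder", "my_df.csv")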