I saved a PySpark DataFrame to S3 with the following command:
df.coalesce(1).write \
    .partitionBy('DATE') \
    .format("com.databricks.spark.csv") \
    .mode('overwrite') \
    .option("header", "true") \
    .save(output_path)
This gives me:
file_path/FLORIDA/DATE=2019-04-29/part-00000-1691d1c6-2c49-4cbe-b454-d0165a0d7bde.c000.csv
file_path/FLORIDA/DATE=2019-04-30/part-00000-1691d1c6-2c49-4cbe-b454-d0165a0d7bde.c000.csv
file_path/FLORIDA/DATE=2019-05-01/part-00000-1691d1c6-2c49-4cbe-b454-d0165a0d7bde.c000.csv
file_path/FLORIDA/DATE=2019-05-02/part-00000-1691d1c6-2c49-4cbe-b454-d0165a0d7bde.c000.csv
Is there an easy way to rename these files in S3 so they follow this structure instead?
file_path/FLORIDA/allocation_FLORIDA_20190429.csv
file_path/FLORIDA/allocation_FLORIDA_20190430.csv
file_path/FLORIDA/allocation_FLORIDA_20190501.csv
file_path/FLORIDA/allocation_FLORIDA_20190502.csv
I have a few thousand of these, so if there is a programmatic way to do this, it would be much appreciated!
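If nothing built into Spark handles this, I assume I could copy-and-delete with boto3 along the lines below (the bucket name is a placeholder, and since S3 has no true rename, each object would be copied to its new key and the original deleted), but I'm hoping there's something cleaner:

import re
import boto3

s3 = boto3.client('s3')

bucket = 'my-bucket'           # placeholder bucket name
prefix = 'file_path/FLORIDA/'  # prefix containing the DATE= partition folders
state = 'FLORIDA'

# Page through every object under the prefix
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get('Contents', []):
        key = obj['Key']
        # Match keys like .../DATE=2019-04-29/part-00000-....csv
        m = re.search(r'DATE=(\d{4})-(\d{2})-(\d{2})/part-.*\.csv$', key)
        if not m:
            continue
        date = ''.join(m.groups())  # e.g. '20190429'
        new_key = f'{prefix}allocation_{state}_{date}.csv'
        # "Rename" = copy to the new key, then delete the old one
        s3.copy_object(Bucket=bucket,
                       CopySource={'Bucket': bucket, 'Key': key},
                       Key=new_key)
        s3.delete_object(Bucket=bucket, Key=key)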