
I saved out a pyspark dataframe to s3 with the following command:

(df.coalesce(1)
   .write
   .partitionBy('DATE')
   .format("com.databricks.spark.csv")
   .mode('overwrite')
   .option("header", "true")
   .save(output_path))

Which gives me:

file_path/FLORIDA/DATE=2019-04-29/part-00000-1691d1c6-2c49-4cbe-b454-d0165a0d7bde.c000.csv
file_path/FLORIDA/DATE=2019-04-30/part-00000-1691d1c6-2c49-4cbe-b454-d0165a0d7bde.c000.csv
file_path/FLORIDA/DATE=2019-05-01/part-00000-1691d1c6-2c49-4cbe-b454-d0165a0d7bde.c000.csv
file_path/FLORIDA/DATE=2019-05-02/part-00000-1691d1c6-2c49-4cbe-b454-d0165a0d7bde.c000.csv

Is there an easy way to reformat these paths in S3 so they follow this structure instead?

file_path/FLORIDA/allocation_FLORIDA_20190429.csv
file_path/FLORIDA/allocation_FLORIDA_20190430.csv
file_path/FLORIDA/allocation_FLORIDA_20190501.csv
file_path/FLORIDA/allocation_FLORIDA_20190502.csv

I have a few thousand of these, so if there is a programmatic way to do this, that would be much appreciated!


1 Answer


Figured out a decent way to go about this:

import datetime
import boto3

s3 = boto3.resource('s3')

for i in range(5):
    # Build the date for this iteration, starting from 2019-04-29
    date = datetime.datetime(2019, 4, 29) + datetime.timedelta(days=i)
    date = date.strftime("%Y-%m-%d")
    print(date)

    # Key of the partition file Spark wrote for this date
    old_key = 'file_path/FLORIDA/DATE={}/part-00000-1691d1c6-2c49-4cbe-b454-d0165a0d7bde.c000.csv'.format(date)
    print(old_key)

    # New key: drop the dashes from the date and use the desired naming scheme
    new_key = 'file_path/FLORIDA/allocation_FLORIDA_{}.csv'.format(date.replace('-', ''))
    print(new_key)

    # S3 has no rename, so copy the object to the new key and delete the original
    s3.Object('my_bucket', new_key).copy_from(CopySource='my_bucket/' + old_key)
    s3.Object('my_bucket', old_key).delete()
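
If you don't want to hard-code the date range and the part-file UUID, a variation is to list the objects under the `DATE=` prefix and derive the new key from each key found. This is only a sketch: `my_bucket` and the `file_path/FLORIDA/` prefix are placeholders from the question, and it assumes `coalesce(1)` left exactly one part file per `DATE=` folder.

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my_bucket')  # placeholder bucket name

# Walk every partition file under the state prefix instead of constructing keys by hand
for obj in bucket.objects.filter(Prefix='file_path/FLORIDA/DATE='):
    old_key = obj.key                                   # .../DATE=2019-04-29/part-00000-....csv
    date = old_key.split('DATE=')[1].split('/')[0]      # '2019-04-29'
    new_key = 'file_path/FLORIDA/allocation_FLORIDA_{}.csv'.format(date.replace('-', ''))

    # Copy to the new key, then remove the original partition file
    s3.Object('my_bucket', new_key).copy_from(CopySource={'Bucket': 'my_bucket', 'Key': old_key})
    s3.Object('my_bucket', old_key).delete()

The same loop should cover the other states too if you widen the prefix and pull the state name out of the key as well.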