I am running a PySpark script that saves some data to an S3 bucket each time it is run, and I have this code:
data.repartition(1).write.mode("overwrite").format("parquet").partitionBy("time_key").save("s3://path/to/directory")
The output is partitioned by time_key, but at each run the latest data dump overwrites the previous data instead of adding a new partition. The time_key is unique to each run.
Is this the correct code if I want to write the data to S3 and add a new time_key partition at each run?
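From what I've read, there seem to be two options, but I'm not sure which (if either) is the right fix. Here is a sketch of what I'm considering, using the same path and column as above and assuming spark is my SparkSession:

# Option A: use append instead of overwrite, since each run has a unique
# time_key, so a new partition directory should just be added.
data.repartition(1).write.mode("append").format("parquet").partitionBy("time_key").save("s3://path/to/directory")

# Option B: keep overwrite, but only replace the partitions present in this
# run (dynamic partition overwrite, available in Spark 2.3+).
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
data.repartition(1).write.mode("overwrite").format("parquet").partitionBy("time_key").save("s3://path/to/directory")

Would either of these give me the behaviour I'm after, or is there a better approach?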