I am able to write to parquet format and partitioned by a column like so:
jobname = args['JOB_NAME']
#header is a spark DataFrame
header.repartition(1).write.parquet('s3://bucket/aws-glue/{}/header/'.format(jobname), 'append', partitionBy='date')
But I am not able to do this with Glue's DynamicFrame.
header_tmp = DynamicFrame.fromDF(header, glueContext, "header")
glueContext.write_dynamic_frame.from_options(frame = header_tmp, connection_type = "s3", connection_options = {"path": 's3://bucket/output/header/'}, format = "parquet")
I have tried passing partitionBy as part of the connection_options dict, since the AWS docs say that for parquet Glue does not support any format options, but that didn't work.
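For clarity, this is roughly what I attempted (the key name "partitionBy" here is my guess at the option, not something documented for Glue's connection_options; the Glue call is shown commented out since it only runs inside a Glue job):

```python
# Sketch of the attempt described above: adding a partition option to
# connection_options. This did NOT produce partitioned output for me.
connection_options = {
    "path": "s3://bucket/output/header/",
    "partitionBy": "date",  # hypothetical key; also tried as a list: ["date"]
}

# glueContext.write_dynamic_frame.from_options(
#     frame=header_tmp,
#     connection_type="s3",
#     connection_options=connection_options,
#     format="parquet",
# )
```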
Is this possible, and if so, how? As for my reason for using a DynamicFrame rather than writing the DataFrame directly: I thought it was needed for job bookmarking to work, which is currently not working for me.