Please see the "Writing Partitions" section of "Managing Partitions for ETL Output in AWS Glue":
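# "$outpath" is a placeholder: substitute your own S3 output path.
# Each entry in partitionKeys becomes one key=value folder level, in this order.
glue_context.write_dynamic_frame.from_options(
    frame = projectedEvents,
    connection_type = "s3",
    connection_options = {"path": "$outpath", "partitionKeys": ["year", "month", "day", "somegroupid"]},
    format = "parquet")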
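This would give you Hive-style prefixes such as s3://my_bucket/logs/year=2018/month=01/day=23/, with one further somegroupid=<value>/ level below that for the fourth partition key.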
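To make the layout concrete, here is a minimal pure-Python sketch of how such a prefix is assembled, with every key in partitionKeys contributing one folder level. It does not call Glue at all; `hive_partition_prefix` and the sample record (including the somegroupid value) are hypothetical and only there for illustration.

```python
def hive_partition_prefix(base_path, partition_keys, record):
    """Build a Hive-style prefix: one key=value folder per partition key, in order."""
    segments = [f"{key}={record[key]}" for key in partition_keys]
    return base_path.rstrip("/") + "/" + "/".join(segments) + "/"


# Hypothetical record; the values are made up for this example.
record = {"year": "2018", "month": "01", "day": "23", "somegroupid": "42"}
prefix = hive_partition_prefix(
    "s3://my_bucket/logs", ["year", "month", "day", "somegroupid"], record
)
print(prefix)  # s3://my_bucket/logs/year=2018/month=01/day=23/somegroupid=42/
```

The order of the keys in the list determines the nesting order of the folders, so put the columns you filter on most often first.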
Unfortunately, there doesn't seem to be a way to get rid of the field=value part of the path.
That naming is kept because it can be valuable in some cases, as the documentation explains:
Crawlers not only infer file types and schemas, they also automatically identify the partition structure of your dataset when they populate the AWS Glue Data Catalog. The resulting partition columns are available for querying in AWS Glue ETL jobs or query engines like Amazon Athena.
Systems like Amazon Athena, Amazon Redshift Spectrum, and now AWS Glue can use these partitions to filter data by partition value without having to read all the underlying data from Amazon S3.
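A rough sketch of why the field=value naming helps: the partition values can be read straight from the object keys, so whole folders can be skipped before any file is opened. This is plain Python, not how Athena or Glue are actually implemented; `parse_partitions`, `prune`, and the listed paths are hypothetical.

```python
def parse_partitions(path):
    """Extract key=value segments from a Hive-style path into a dict."""
    parts = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = value
    return parts


def prune(paths, **wanted):
    """Keep only paths whose partition values match every requested key."""
    return [
        p for p in paths
        if all(parse_partitions(p).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical object listing; no file contents are read at any point.
paths = [
    "s3://my_bucket/logs/year=2018/month=01/day=23/part-0000.parquet",
    "s3://my_bucket/logs/year=2018/month=01/day=24/part-0000.parquet",
    "s3://my_bucket/logs/year=2018/month=02/day=01/part-0000.parquet",
]
selected = prune(paths, year="2018", month="01")
print(selected)  # the two month=01 paths
```

In a Glue job, the comparable mechanism is the push_down_predicate argument of glue_context.create_dynamic_frame.from_catalog, which takes a filter expression over the partition columns.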