
I am new to AWS Glue. I need to write each record in a dynamic frame to a custom folder path in S3.


The following is the target S3 path:

<bucket>/parentfolder/<year>/<month>/<day>/<somegroupid>/<random_file_name>.json

Here, 'year', 'month', 'day', 'somegroupid' are available as columns in each record.

Is it possible to use column values in the record to decide on the path where the JSON file will be written?

Karthik
  • In PySpark, you can use partitionBy when writing your DataFrame to S3: `df.write.partitionBy('year', 'month', 'day', 'somegroupid').json("/parentfolder/")` – blackbishop Jan 21 '21 at 13:46
  • Thanks @blackbishop, I shall look into this. – Karthik Jan 21 '21 at 13:50
  • I could find the equivalent for Glue, and it worked: `glueContext.write_dynamic_frame.from_options(frame = dynamicframe2, connection_type = "s3", connection_options = {"path": "s3://path/","partitionKeys": ["year", "month", "day", "somegroupid"]}, format = "json", transformation_ctx = "datasink3")`. Thanks for your guidance @blackbishop – Karthik Jan 21 '21 at 14:27 (a combined sketch of both approaches follows these comments)
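
Putting the two comments together, here is a minimal sketch of both approaches. It assumes the job already has a GlueContext, and the catalog database, table name, and bucket path are placeholders to replace with your own:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Hypothetical source: any DynamicFrame that already carries the
# year, month, day and somegroupid columns will do.
dynamic_frame = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",        # placeholder catalog database
    table_name="my_source_table")  # placeholder source table

# Option 1 (plain Spark): convert to a DataFrame and let partitionBy build the folders.
(dynamic_frame.toDF()
    .write
    .partitionBy("year", "month", "day", "somegroupid")
    .json("s3://<bucket>/parentfolder/"))

# Option 2 (Glue-native): pass partitionKeys in connection_options.
glue_context.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3",
    connection_options={
        "path": "s3://<bucket>/parentfolder/",
        "partitionKeys": ["year", "month", "day", "somegroupid"]},
    format="json")

Note that both approaches produce Hive-style folder names (year=2021/month=01/...), which is what the answer below discusses.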

1 Answer


Please see Managing Partitions for ETL Output in AWS Glue - Writing Partitions

glue_context.write_dynamic_frame.from_options(
    frame = projectedEvents,
    connection_type = "s3",
    # "$outpath" is a placeholder for your S3 output path, e.g. "s3://my_bucket/logs/"
    connection_options = {"path": "$outpath", "partitionKeys": ["year", "month", "day", "somegroupid"]},
    format = "parquet")

This would give you a layout like: s3://my_bucket/logs/year=2018/month=01/day=23/somegroupid=<value>/

Unfortunately there doesn't seem to be a way to get rid of the field=value part of the folder names; Glue keeps that Hive-style layout because it can be valuable in some cases:

Crawlers not only infer file types and schemas, they also automatically identify the partition structure of your dataset when they populate the AWS Glue Data Catalog. The resulting partition columns are available for querying in AWS Glue ETL jobs or query engines like Amazon Athena.

Systems like Amazon Athena, Amazon Redshift Spectrum, and now AWS Glue can use these partitions to filter data by partition value without having to read all the underlying data from Amazon S3.
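
For example, here is a minimal PySpark sketch of that kind of partition pruning, assuming the partitioned output written above and Spark's default partition column type inference (numeric partition values come back as integers):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Spark discovers year/month/day/somegroupid as partition columns
# from the key=value folder names.
logs = spark.read.parquet("s3://my_bucket/logs/")

# A filter on partition columns only lists and reads objects under the
# matching year=2018/month=01/day=23/ prefixes instead of the whole dataset.
logs.filter((logs.year == 2018) & (logs.month == 1) & (logs.day == 23)).show()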

Stefan