While writing data to S3 using a DynamicFrame, I want to partition by columns that are not present in the DynamicFrame.

For example:

def write_date(outpath,year):
    glue_context.write_dynamic_frame.from_options(
        frame = projectedEvents,
        connection_type = "s3",    
        connection_options = {"path": outpath, "partitionKeys": [year]},
        format = "parquet")

Here year is a parameter whose value is not present as a column in the DynamicFrame.

This code fails with the error: 'partition column "2021" not found in schema'

How can I write data to S3 using my own partition values?

Basically, I want to write to an S3 path like "outpath/2021/<parquet_file>".

Beginner

1 Answer

This would work:

from pyspark.sql.functions import lit
from awsglue.dynamicframe import DynamicFrame

# withColumn and lit work on a Spark DataFrame, so convert the DynamicFrame,
# add the partition value as a literal column, then convert back.
df = projectedEvents.toDF().withColumn('year', lit(2021))
projectedEvents = DynamicFrame.fromDF(df, glue_context, "projectedEvents")

def write_date(frame, outpath, year):
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": outpath, "partitionKeys": [year]},
        format="parquet")

write_date(projectedEvents, outpath, 'year')

I would suggest taking another look at how partitioning works: the entries in partitionKeys are column names, so the partition key has to be a column of the frame being written, which is why passing the value "2021" fails.
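
Also note that partitionKeys writes Hive-style directories, so the approach above produces "outpath/year=2021/<parquet_file>" rather than "outpath/2021/<parquet_file>". If the bare value is required in the path, a minimal sketch (assuming year arrives as a plain string such as "2021") is to fold the value into the target path and skip partitionKeys entirely:

def write_date(frame, outpath, year):
    # Append the partition value to the S3 prefix directly; no partition
    # column is needed in the frame and no "year=" directory is created.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": f"{outpath}/{year}"},
        format="parquet")

write_date(projectedEvents, outpath, "2021")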

Robert Kossendey