S3_node1653573520077 = glueContext.create_dynamic_frame.from_catalog(
    database="database",
    push_down_predicate="(year == 2021)",
    table_name="table",
    transformation_ctx="S3_node1653573520077",
)

In my AWS Glue ETL job I want to load data from the Glue Data Catalog and write it into RDS via SQL, but I'm stuck at the very first step: reading the catalog table into this DynamicFrame. The table's data is stored in S3 and partitioned by year, month, day and hour.

When I run the job, it fails with this error:

Found duplicate column(s) in the data schema and the partition schema: day, hour, month, year

I don't quite understand why this error occurs.

Has anyone encountered a similar situation?

Wwww24115

1 Answer


Removing the partition keys from the job that writes the output data fixed it for me. Check whether the partition columns (year, month, day, hour) also exist as regular columns inside the data files themselves.
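The error means the same column names exist both inside the data files and in the S3 partition path, so Spark cannot merge the two schemas. A minimal illustration of the collision (the partition column names come from the question; "id" and "value" are hypothetical data columns):

```python
# Columns Spark finds inside the data files themselves
# ("id" and "value" are hypothetical placeholders).
data_schema = ["id", "value", "year", "month", "day", "hour"]

# Columns Spark derives from the S3 path,
# e.g. s3://bucket/table/year=2021/month=06/day=27/hour=07/
partition_schema = ["year", "month", "day", "hour"]

# Spark refuses to load when the two sets overlap:
duplicates = sorted(set(data_schema) & set(partition_schema))
print(duplicates)  # ['day', 'hour', 'month', 'year'] -- the names in the error
```

So either stop writing those four columns into the data files in the upstream job (what worked here), or drop them from the frame before writing (e.g. with `DynamicFrame.drop_fields`) so only the partition directories carry them.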

Ethan
  • Thanks for your answer, that's correct: once I removed the partition keys from the output data files, it works fine. – Wwww24115 Jun 27 '22 at 07:36