S3_node1653573520077 = glueContext.create_dynamic_frame.from_catalog(
    database="database",
    push_down_predicate="(year == 2021)",
    table_name="table",
    transformation_ctx="S3_node1653573520077",
)

In my AWS Glue ETL job I want to load data from the Glue Data Catalog and write it into RDS via SQL, but I'm stuck at the very first step: reading the catalog table into this DynamicFrame. The table's data is stored in S3 and partitioned by year, month, day and hour.

When I run the job, it fails with this error:

Found duplicate column(s) in the data schema and the partition schema: day, hour, month, year

I don't quite understand why this error occurs.

Has anyone encountered a similar situation?

Wwww24115

1 Answer


Removing the partition keys from the job that writes the output data fixed it for me. Check whether the partition columns (year, month, day, hour) also exist as regular columns inside the data files themselves.
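The error means the same column names exist both inside the data files and in the S3 partition path, so Spark cannot merge the two schemas. A minimal illustration of the collision (the partition column names come from the question; "id" and "value" are hypothetical data columns):

```python
# Columns Spark finds inside the data files themselves
# ("id" and "value" are hypothetical placeholders).
data_schema = ["id", "value", "year", "month", "day", "hour"]

# Columns Spark derives from the S3 path,
# e.g. s3://bucket/table/year=2021/month=06/day=27/hour=07/
partition_schema = ["year", "month", "day", "hour"]

# Spark refuses to load when the two sets overlap:
duplicates = sorted(set(data_schema) & set(partition_schema))
print(duplicates)  # ['day', 'hour', 'month', 'year'] -- the names in the error
```

So either stop writing those four columns into the data files in the upstream job (what worked here), or drop them from the frame before writing (e.g. with `DynamicFrame.drop_fields`) so only the partition directories carry them.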

Ethan
  • Thanks for your answer, that's correct: once I removed the partition keys from the output data files, it works fine. – Wwww24115 Jun 27 '22 at 07:36