1

When I run crawler from Glue on parquet/csv file in S3 bucket, It takes Date as a string. I changed in edit schema and set it to Date. Not only date but it's not changing any data type in edit schema. When I fire query from Athena "select order_date from parque_sales", the error is:

HIVE_BAD_DATA: Field order_date's type BINARY in parquet is incompatible with type date defined in table schema

I also tried changing schema in Glue studio, the result is same!

Thanks in advance! Please help!

Nishu Tayal
  • 20,106
  • 8
  • 49
  • 101
SK Sayyad
  • 87
  • 9
  • Are you using a GLUE client API to invoke this service? – smac2020 Feb 10 '21 at 19:42
  • I am using Glue crawler to save the meta data of the file on the S3 bucket in the Glue catalog and using Athena to run queries on the glue tables. But edit schema in Glue databse as well as glue studio which is new service on amazon not working. Seems like custom transformation will work but I am confused as the glue generates his own script. – SK Sayyad Feb 10 '21 at 22:09

1 Answers1

1

It seems that you have files where date is stored as string in parquet, not as date. The crawler still receives those as string values and fails while comparing the source schema with the defined schema.

To avoid this issue, you should use explicit casting in the athena query. Something like this :

select date_parse(datestring,'%Y-%m-%d %h:%i:%s')
Nishu Tayal
  • 20,106
  • 8
  • 49
  • 101
  • I tried - select date_parse(last_update,'%Y-%m-%d %h:%i:%s') from city; it gives error like INVALID_FUNCTION_ARGUMENT: Invalid format: ""2006-02-15 04:45:25"" – SK Sayyad Feb 18 '21 at 03:13