I have an S3 folder with Parquet data that I want to query through Athena tables. The folder structure is:

s3://my-bucket/my-table/dt=2023-04-19/my-files-XXX.parquet

The problem is that the Glue crawler infers dt as a string, and I want it to be detected as a date.

How can I tell the crawler which type to use for the partition columns?

tonicebrian
Seems that’s the default behaviour and cannot be changed. See https://stackoverflow.com/questions/54574987/aws-glue-crawler-partition-keys-types?rq=2 – Jimson James Apr 19 '23 at 10:46

1 Answer

You can create custom classifiers in Glue. Please refer to https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-grok for further details. You can easily create one in the Glue console. For your task, I assume the grok pattern would be:

dt=%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}
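
To check locally what this pattern would capture, here is a minimal Python sketch that expands the grok tokens into a regular expression and runs it against the example key from the question. It is only an illustration, not a Glue API: the `GROK` table holds approximate Python equivalents of the stock YEAR, MONTHNUM and MONTHDAY definitions (an assumption), and `grok_to_regex` and `partition_date` are hypothetical helper names. It does not show whether the crawler applies a classifier to partition keys.

```python
import re
from datetime import date

# Approximate Python equivalents of the stock grok patterns (assumed definitions).
GROK = {
    "YEAR": r"(?:\d\d){1,2}",
    "MONTHNUM": r"(?:0?[1-9]|1[0-2])",
    "MONTHDAY": r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])",
}


def grok_to_regex(pattern):
    """Expand %{NAME:field} tokens into named regex groups (hypothetical helper)."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{GROK[name]})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, pattern))


# The grok pattern proposed above.
PARTITION_RE = grok_to_regex("dt=%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}")


def partition_date(s3_key):
    """Return the dt partition of an S3 key as a date, or None if it is absent."""
    match = PARTITION_RE.search(s3_key)
    if match is None:
        return None
    return date(int(match["year"]), int(match["month"]), int(match["day"]))


print(partition_date("s3://my-bucket/my-table/dt=2023-04-19/my-files-XXX.parquet"))
```

The pattern yields three separate fields (year, month, day) rather than a single date value, so the last step of combining them into a date is something the sketch does itself.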
Utku Can