1

I want to use Athena partition projection to write queries that filter on date partition columns. The issue is that I need the physical data format that will be projected (the S3 file prefixes) to differ from the date formats that users will query by.

For example, my physical data format in S3 stores prefixes as URL encoded dates: s3://bucket/table_root/landing_time=yyyy-MM-dd%20HH%3Amm%3Ass.S'

But I want users to be able to query Athena using readable dates (No URL encoded formatting):

select * from table_root where landing_time='2020-01-01 12:00:00.0'

I tried using the following table settings for partition projection:

projection.landing_time.range: NOW-2YEARS,NOW
projection.landing_time.type: date
projection.landing_time.interval: 1
projection.landing_time.interval.unit: HOURS
projection.landing_time.format: yyyy-MM-dd'%20'HH'%3A00%3A00.0'
projection.enabled: true

I see partition projection allows custom path derivations using table property: storage.location.template: s3://bucket/table_root/a=${a}/${b}/some_static_subdirectory/${c}/

But I still do not see any way to customize the projected format of date columns and allow it to differ from the format used to run queries. Is there any way to do this?

Golammott
  • 486
  • 2
  • 15

2 Answers2

2

This can be solved by adding an additional partition column to handle hours and changed landing_time to only handle the date part. After that, changing storage.location.template to include the url encoded %20 and %3A values required by URL encoding gave me the desired result.

projection.landing_time.range: 2020-01-01,NOW
projection.landing_time.type: date
projection.landing_time.interval: 1
projection.landing_time.interval.unit: DAYS
projection.landing_time.format: yyyy-MM-dd
projection.hours.type: integer
projection.hours.range: 0,23
projection.hours.digits: 2
projection.enabled: true
storage.location.template: s3://bucket/table_root/landing_time=${landing_time}%20${hours}%3A00%3A00.0

Athena queries using partition projection can now be written in the format:

select * from table_root where landing_time='2020-01-01' and hours=1; 
select * from table_root where landing_time='2020-01-01' and hours>2 and hours<10;

And the correct format matching my S3 data prefixes will be projected to S3.

Golammott
  • 486
  • 2
  • 15
  • pls look into my question as well. it's urgent. will be thankful to you. https://stackoverflow.com/questions/71382264/invalid-table-property-for-input-string-0-23-property-projection-hour-digi – Danish Mar 07 '22 at 22:27
0

The java DateTimeFormatter allows for custom strings to be inserted in the format, for example, the text The year is %20 YYYY will be parsed to something like The year is %20 2021. You might need to specify landing_time=yyyy-MM-dd'%20'HH'%3A00%3A00.0' as the format instead of just yyyy-MM-dd'%20'HH'%3A00%3A00.0'

Nicolas Busca
  • 1,100
  • 7
  • 14