4

Currently, AWS Firehose writes data to S3 under a default time-based folder layout, YYYY/MM/DD/HH, for example 2017/10/26/18.

However, I would like the layout to look like this:

Year=2017/Month=10/Day=26/Hour=18

Is there a way to make Firehose use this layout by default?

I tried having an SNS topic invoke a Lambda function that renames the folders to year=yyyy, month=mm, and so on, but Firehose takes some time to create the default partitioned folders. I am not sure how to do this without a possible conflict, where the Lambda runs before the folder has been created.
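For the rename approach, it may help that S3 has no real folders: a "folder" is only a shared key prefix, and it comes into existence together with the first object written under it. A function that runs once per delivered object (for example from an S3 object-created notification) therefore has nothing to wait for. Below is a minimal sketch of just the key rewrite, assuming the default YYYY/MM/DD/HH/ layout; the name to_hive_key is hypothetical, and the actual copy and delete of the object are left out.

```python
import re

# Assumed default Firehose layout: <optional prefix>YYYY/MM/DD/HH/<object name>
_DEFAULT_LAYOUT = re.compile(
    r"^(?P<prefix>.*?)"
    r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/(?P<hour>\d{2})/"
    r"(?P<name>[^/]+)$"
)


def to_hive_key(key: str) -> str:
    """Rewrite a default-layout object key into year=/month=/day=/hour= form.

    Keys that do not match the default layout (including keys that were
    already rewritten) are returned unchanged, so the function is safe to
    apply more than once.
    """
    match = _DEFAULT_LAYOUT.match(key)
    if match is None:
        return key
    return (
        "{prefix}year={year}/month={month}/day={day}/hour={hour}/{name}"
        .format(**match.groupdict())
    )
```

Returning non-matching keys unchanged matters if the rewritten objects land in the same bucket, because the copy would trigger the notification again.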

Ideally there would be a built-in AWS way to handle this, but I have not found one yet.

Any suggestions would be appreciated. Thanks!

John Rotenstein
Daniel

2 Answers

1

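Firehose supports custom S3 prefixes (they can also be combined with Dynamic Partitioning), so you can use an expression like the following as the S3 bucket prefix in the Kinesis Firehose configuration. It stops at the day level; appending hour=!{timestamp:HH}/ adds the hourly folder from the question.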

input/kinesis-realtime/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/
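To make the prefix easier to reason about, here is a small sketch that builds it with the hourly level included and previews how it would expand for a given time. The helper names (hive_prefix, preview, s3_prefix_settings) and the errors/ location are hypothetical; Prefix and ErrorOutputPrefix are the field names used in the Firehose S3 destination configuration, and as far as I know an error output prefix has to be set whenever the prefix contains expressions. The preview only imitates the four patterns used here and assumes Firehose evaluates them in UTC.

```python
import re
from datetime import datetime, timezone

# Partition name -> Java DateTimeFormatter pattern used inside !{timestamp:...}
PARTITIONS = (("year", "yyyy"), ("month", "MM"), ("day", "dd"), ("hour", "HH"))

# Python strftime equivalents, used only for the local preview below
_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}


def hive_prefix(base: str = "") -> str:
    """Build a Firehose custom prefix such as year=!{timestamp:yyyy}/.../"""
    parts = "/".join(
        f"{name}=!{{timestamp:{pattern}}}" for name, pattern in PARTITIONS
    )
    return f"{base}{parts}/"


def preview(prefix: str, when: datetime) -> str:
    """Expand the timestamp expressions locally to see the resulting path."""
    return re.sub(
        r"!\{timestamp:(\w+)\}",
        lambda m: when.strftime(_STRFTIME[m.group(1)]),
        prefix,
    )


def s3_prefix_settings(base: str = "") -> dict:
    """Prefix fields to merge into the stream's S3 destination configuration."""
    return {
        "Prefix": hive_prefix(base),
        # Hypothetical location for records that fail delivery
        "ErrorOutputPrefix": f"{base}errors/!{{firehose:error-output-type}}/",
    }
```

For example, preview(hive_prefix("input/kinesis-realtime/"), datetime(2017, 10, 26, 18, tzinfo=timezone.utc)) gives input/kinesis-realtime/year=2017/month=10/day=26/hour=18/.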

18TillIDie
-1

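Set the S3 prefix option to 'year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/' to get a folder structure like year=2017/month=10/day=26/hour=18. Use lowercase yyyy: the patterns follow Java's DateTimeFormatter, where uppercase YYYY means the week-based year and can produce the wrong year around New Year.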