
We primarily do bulk transfer of incoming clickstream data through the Kinesis Firehose service. Our system is a multi-tenant SaaS platform. The incoming clickstream data are stored in S3 through Firehose. By default, all the files are stored under directories named per a given date format. I would like to specify the directory path for the data files in the Firehose panel or through the API, in order to segregate the customer data.

For example, the directory structure that I would like to have in S3 for customers A, B and C :

/A/2017/10/12/

/B/2017/10/12/

/C/2017/10/12/

How can I do it?

Sriram V

3 Answers


AWS Firehose supports dynamic partitioning.

It can be done in two ways: either with the inline JQ parser or with a Lambda function.

Example:

"ExtendedS3DestinationConfiguration": {  
"BucketARN": "arn:aws:s3:::my-logs-prod",  
"Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/ 
    device=!{partitionKeyFromQuery:device}/ 
    year=!{partitionKeyFromQuery:year}/  
    month=!{partitionKeyFromQuery:month}/  
    day=!{partitionKeyFromQuery:day}/  
    hour=!{partitionKeyFromQuery:hour}/"  
} 
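
For the inline JQ route, dynamic partitioning also has to be enabled on the destination, and a MetadataExtraction processor tells Firehose how to pull the partition keys out of each record. A minimal sketch covering just the customer_id key, reusing the placeholder bucket from the example above:

"ExtendedS3DestinationConfiguration": {
    "BucketARN": "arn:aws:s3:::my-logs-prod",
    "Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/",
    "ErrorOutputPrefix": "errors/",
    "DynamicPartitioningConfiguration": { "Enabled": true },
    "ProcessingConfiguration": {
        "Enabled": true,
        "Processors": [{
            "Type": "MetadataExtraction",
            "Parameters": [
                { "ParameterName": "MetadataExtractionQuery", "ParameterValue": "{customer_id: .customer_id}" },
                { "ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6" }
            ]
        }]
    }
}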
Nishu Tayal

You can separate your directories by configuring the S3 Prefix. In the console, this is done during setup when you set the S3 bucket name.

Using the CLI, you set the prefix in the --s3-destination-configuration option, as documented here:

http://docs.aws.amazon.com/cli/latest/reference/firehose/create-delivery-stream.html
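
For the layout in the question, the JSON passed to --s3-destination-configuration might look like this sketch (the role and bucket ARNs are placeholders, and "A/" is the per-customer prefix):

{
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
    "BucketARN": "arn:aws:s3:::my-logs-prod",
    "Prefix": "A/"
}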

Note, however, that you can only set one prefix per Firehose Delivery Stream, so if you're passing all of your clickstream data through one Firehose Delivery Stream, you will not be able to send the records to different prefixes.

devonlazarus
    I would like to use the "prefix" as a variable and configure it based on the incoming click data [e.g. click.customer_id]. However, thanks for your answer. – Sriram V Oct 20 '17 at 04:52
    Understood, but if you're using a single Firehose Delivery Stream, you will not be able to write to different prefixes, even with a variable. As it stands now, there's no way to pass your variable through to the S3 prefix configuration on the Delivery Stream. If you want separate prefixes in one bucket, you'll have to use multiple Delivery Streams reading from the same Kinesis stream, and a record transformation Lambda to filter for a given prefix configured with the Delivery Stream. – devonlazarus Oct 20 '17 at 16:43
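
To illustrate the multi-stream approach described in the comment above, the transformation Lambda on a per-customer Delivery Stream could look like the following minimal sketch. It assumes each record is a JSON object carrying a customer_id field; the field name and the "A" filter value are placeholders, and you would run one stream (with its own prefix) per customer:

import base64
import json

# Hypothetical: the customer whose records this Delivery Stream keeps.
TARGET_CUSTOMER = "A"

def lambda_handler(event, context):
    """Firehose transformation: keep TARGET_CUSTOMER records, drop the rest."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        # "Ok" keeps the record unchanged; "Dropped" silently discards it.
        result = "Ok" if payload.get("customer_id") == TARGET_CUSTOMER else "Dropped"
        output.append({
            "recordId": record["recordId"],
            "result": result,
            "data": record["data"],
        })
    return {"records": output}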

Custom prefixes are now supported.

nando
    This doesn't seem to be a solution in that it doesn't provide a way to create folder(s) based on the event values (e.g. customer ID) – higee Feb 15 '20 at 19:19
  • It may not be an answer to the question, but it's good to be aware of. "Custom prefixes" is a little misleading on AWS's part though, as in reality it's just a limited set of timestamp-based or random string-based values you can choose from. – trademark Feb 09 '21 at 15:32
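
For reference, the expressions available in such custom prefixes are limited to the timestamp and firehose namespaces, e.g. (the paths are illustrative):

"Prefix": "logs/!{timestamp:yyyy/MM/dd}/",
"ErrorOutputPrefix": "errors/!{firehose:error-output-type}/!{firehose:random-string}/"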