
This question is about streaming data from Kafka to S3.

Requirement: One of the Kafka topics we are interested in contains particular information in each record, e.g. a timestamp, a table name, etc. We want to use this data to decide which S3 path each record goes to, e.g. s3bucketName/timestamp/table/...

Problem: We are thinking of using Kafka Connect, since there is no reason to reinvent the wheel. However, I couldn't find a way to plug in a function that does this mapping (from topic data to S3 path) in Kafka Connect (I followed this link: https://docs.confluent.io/current/connect/kafka-connect-s3/configuration_options.html). Does Kafka Connect provide this feature, and if not, has someone done this before?
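For context, the S3 sink connector's path layout is controlled by the `partitioner.class` setting from the linked configuration page. A sketch of a sink config using the built-in time-based partitioner (bucket name, topic, and flush size here are assumed placeholders):

```properties
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
topics=my-topic
s3.bucket.name=s3bucketName
s3.region=us-east-1
flush.size=1000
format.class=io.confluent.connect.s3.format.avro.AvroFormat
storage.class=io.confluent.connect.s3.storage.S3Storage

# The partitioner decides the "directory" part of each object key.
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
path.format='year'=YYYY/'month'=MM/'day'=dd
partition.duration.ms=3600000
locale=en-US
timezone=UTC
```

A custom class can be supplied for `partitioner.class` if the built-in ones don't fit, which is what the accepted answer below suggests.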

phuclv
Xiaohe Dong

1 Answer


The TimeBasedPartitioner's default behavior will write to

s3bucketName/s3Prefix/topicName/timestamp/files.avro

If that isn't satisfactory, Kafka Connect is entirely "plugin"-driven, and you can write your own partitioner. For example, there is no built-in partitioner that combines a timestamp with a particular field from the record data - you'd need to write that yourself.
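A standalone sketch of what such a custom partitioner's core logic might look like. In a real connector you would implement `io.confluent.connect.storage.partitioner.Partitioner` and put this logic in its `encodePartition(SinkRecord)` method; the plain `Map` used here is a hypothetical stand-in for the record value, and the `timestamp`/`table` field names are assumptions, so the example runs without the Confluent jars:

```java
import java.util.Map;

public class FieldAndTimestampPartitioner {

    // Builds the "directory" portion of the S3 object key from two record
    // fields, e.g. "1700000000000/orders". In a real Partitioner this string
    // would be returned from encodePartition(SinkRecord).
    static String encodePartition(Map<String, Object> recordValue) {
        Object timestamp = recordValue.get("timestamp"); // assumed field name
        Object table = recordValue.get("table");         // assumed field name
        if (timestamp == null || table == null) {
            throw new IllegalArgumentException("record is missing timestamp/table field");
        }
        return timestamp + "/" + table;
    }

    public static void main(String[] args) {
        Map<String, Object> record =
                Map.of("timestamp", 1700000000000L, "table", "orders");
        // Prints the encoded partition path fragment.
        System.out.println(encodePartition(record)); // 1700000000000/orders
    }
}
```

The compiled class (plus its jar) would then go on the Connect worker's plugin path and be referenced via `partitioner.class` in the sink config.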

OneCricketeer