
I know that Kafka cannot guarantee ordering of data across partitions when a topic has more than one. My problem is this: I need multiple partitions for an event topic (user activities generating events) because I want multiple consumer groups to consume the data from the topic. But sometimes I need to bootstrap the entire data set, i.e. read the complete history from beginning to end and rebuild my graph of events from the historical messages in Kafka. When I do that, I lose the ordering, which causes problems. One approach might be to process it in a Map-Reduce paradigm, where I map the data by time, order it, and then consume it. Has anybody faced a similar situation who could help me with the right approach or solution?
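
A minimal sketch of the "order it by time" idea for a one-off bootstrap, assuming the whole history has already been read from every partition into memory (the actual consumer loop is omitted) and that the stored timestamps are good enough to order by. `Record` and `replay_in_time_order` are hypothetical names, not part of any Kafka client. For data too large for memory, the same sort key would drive an external or Map-Reduce style sort instead of `sorted`.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """One consumed message (hypothetical shape): origin plus timestamp."""
    partition: int
    offset: int
    timestamp: int  # epoch millis as stored in the message
    value: str


def replay_in_time_order(records: List[Record]) -> List[Record]:
    """Order a full dump of all partitions by timestamp for a rebuild.

    Ties on timestamp are broken by (partition, offset), so the result is
    deterministic and equal-timestamp messages from one partition keep
    their offset order.
    """
    return sorted(records, key=lambda r: (r.timestamp, r.partition, r.offset))


if __name__ == "__main__":
    dump = [
        Record(0, 0, 1000, "login"),
        Record(0, 1, 1300, "logout"),
        Record(1, 0, 1100, "view"),
        Record(1, 1, 1200, "click"),
    ]
    for r in replay_in_time_order(dump):
        print(r.timestamp, r.value)
```

This only restores timestamp order; whether timestamp order is the order you actually want depends on how the timestamps were assigned.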

Thanks in advance.

Vikram
  • `I need to have multiple partitions to an event topic(user activities generating events) since I want multiple consumer groups to consume the data from the topic` Just a small remark: there is no need to have multiple partitions to support multiple consumer groups. Partitions are only necessary if you need more than one consumer per consumer group. You can have as many consumer groups per partition as you'd like. – Sönke Liebau Mar 09 '17 at 09:15
  • Oh, yes. What was I even thinking? I definitely need to go through the Kafka documentation again. Thanks a lot for your help! – Vikram Mar 09 '17 at 09:27
  • One more comment: Kafka's ordering guarantee is per offset, not per timestamp. So you can read data "ordered by timestamp" only if the timestamps are ascending, and there is no guarantee of that. By default, the producer sets the timestamp of a message, so even with a single partition, multiple producers may write data that is not in timestamp order. If you need timestamps to be ordered within each partition as well, you can change the broker/topic setting from `CREATE_TIME` to `LOG_APPEND_TIME`, but this will of course also change the semantics of your timestamps. – Matthias J. Sax Mar 09 '17 at 18:57
  • https://stackoverflow.com/questions/39574328/kafka-multiple-partition-ordering – Dmitry Minkovsky Jan 25 '18 at 16:27
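
To illustrate the point about offsets versus timestamps, here is a toy simulation (not Kafka code) of a single partition receiving messages from two producers. It assumes offsets are assigned in arrival order and that the broker clock only moves forward; `append`, `build`, and `timestamps_ascending` are hypothetical helpers.

```python
from typing import List, Tuple

# A toy single-partition log: each entry is (offset, stored_timestamp).
Log = List[Tuple[int, int]]


def append(log: Log, producer_ts: int, broker_now: int, log_append_time: bool) -> None:
    """Append one message; its offset is simply its position in the log.

    CreateTime keeps the producer's timestamp; LogAppendTime replaces it
    with the broker's clock at the moment of the append.
    """
    offset = len(log)
    stored_ts = broker_now if log_append_time else producer_ts
    log.append((offset, stored_ts))


def timestamps_ascending(log: Log) -> bool:
    """True if stored timestamps never decrease in offset order."""
    ts = [t for _, t in log]
    return all(a <= b for a, b in zip(ts, ts[1:]))


def build(log_append_time: bool) -> Log:
    # Producer B stamps its message at 110 and it reaches the broker at 120;
    # producer A stamped earlier (100) but its message is delayed until 130.
    # Entries: (producer timestamp, broker clock on arrival), in arrival order.
    arrivals = [(110, 120), (100, 130)]
    log: Log = []
    for producer_ts, broker_now in arrivals:
        append(log, producer_ts, broker_now, log_append_time)
    return log


if __name__ == "__main__":
    print("CreateTime:   ", build(False), timestamps_ascending(build(False)))
    print("LogAppendTime:", build(True), timestamps_ascending(build(True)))
```

Offsets are in arrival order in both cases; only the stored timestamps differ. In a real cluster this choice is the topic-level `message.timestamp.type` config (`CreateTime` or `LogAppendTime`), with `log.message.timestamp.type` as the broker-wide default.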


As per the Kafka documentation, global ordering across partitions is not guaranteed, so you can create N partitions with N consumers. Assign messages to partitions based on the type of data, i.e. all data of category A should go to one partition. Because message order is maintained within a partition, you can consume those messages in a separate consumer and process the data in order.
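
A minimal sketch of routing by category, assuming the category is used as the message key. `partition_for`, `produce`, and `NUM_PARTITIONS` are hypothetical; `zlib.crc32` is a stand-in hash (Kafka's default partitioner hashes the key bytes with murmur2), but any stable hash shows the property that matters: the same key always lands in the same partition, so per-category order is preserved.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # hypothetical partition count

Event = Tuple[str, str]  # (category, payload)


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (stand-in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events: List[Event]) -> Dict[int, List[Event]]:
    """Append each event to the partition chosen by its category."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for category, payload in events:
        partitions[partition_for(category)].append((category, payload))
    return partitions


if __name__ == "__main__":
    events = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("C", "c1"), ("A", "a3")]
    for pid, evs in sorted(produce(events).items()):
        print(pid, evs)
```

Note that this gives ordering per category only; events of different categories that share a partition are ordered relative to each other, but events in different partitions are not.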

I have gone through some blogs that suggest buffering those messages and applying sorting logic to them, but this does not seem to be good practice: one of the partitions may be slow, or a message may arrive late, and you have to re-sort your messages every time a new one arrives.
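
A small sketch of the buffer-and-sort approach that also makes its weakness concrete. The buffer holds events until the newest timestamp seen is more than `max_delay` ahead of them (a simple watermark), then releases them in order; an event that arrives after newer events were already released cannot be placed and is set aside. `reorder` and `max_delay` are hypothetical names, and this is one common way to bound the buffer, not the method from any particular blog.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Event = Tuple[int, str]  # (timestamp, value)


def reorder(events: Iterable[Event], max_delay: int, late: List[Event]) -> Iterator[Event]:
    """Yield events in timestamp order, tolerating up to max_delay of disorder.

    Events older than the last emitted timestamp go to `late` instead.
    """
    heap: List[Tuple[int, int, str]] = []
    seq = 0  # tie-breaker so equal timestamps keep arrival order
    max_seen: Optional[int] = None
    last_emitted: Optional[int] = None
    for ts, value in events:
        if last_emitted is not None and ts < last_emitted:
            late.append((ts, value))  # too late to fit into the emitted order
            continue
        heapq.heappush(heap, (ts, seq, value))
        seq += 1
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - max_delay
        # Release everything the watermark says can no longer be overtaken.
        while heap and heap[0][0] <= watermark:
            t, _, v = heapq.heappop(heap)
            last_emitted = t
            yield t, v
    # End of input: flush whatever is still buffered.
    while heap:
        t, _, v = heapq.heappop(heap)
        yield t, v


if __name__ == "__main__":
    late: List[Event] = []
    stream = [(1, "a"), (3, "c"), (2, "b"), (10, "x"), (2, "z"), (11, "y")]
    print(list(reorder(stream, max_delay=2, late=late)))
    print("late:", late)
```

A larger `max_delay` loses fewer late events but holds every event longer, which is exactly the trade-off with slow partitions described above.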

Amol Suryawanshi