0

I read on some guides online that if you are using key ordering the new partition will eventually break that ordering, I really can't see how. Is this really what happens ?

GionJh
  • 2,742
  • 2
  • 29
  • 68

1 Answers1

1

Yes, this is what is usually happening. To be more precise, there is no guarantee that the old ordering stays the same.

The partitioning of messages is basically happening through

hash(key) % number_of_partitions

Let us assume you have a topic with two partitions. Your data (key:value) looks like this

a:1
b:1
c:1
a:2
b:2
c:2

Now, those messages would go into two partitions:

partition0: a:1, b:1, a:2, b:2
partition1: c:1, c:2

If you now add one partition and you produce new messages a:3, b:3, c:3 into the topic you could end up like this:

partition0: a:1, b:1, a:2, b:2, a:3
partition1: c:1, c:2, c:3
partition2: b:3

Now, consuming the messages from this topic, you could end up processing b:3 before processing b:2 because the one consumer reading partition0 might take longer then another consumer of the same ConsumerGroup reading partition2.

Michael Heil
  • 16,250
  • 3
  • 42
  • 77
  • Ok, I thought Kafka used a different and and maybe more advanced approach than hashing to assign same keys messages to the same partition, thanks a lot. – GionJh May 05 '20 at 09:23
  • You could always write your own custom partitioner class to define your own logic for a Producer. – Michael Heil May 05 '20 at 09:29