
I am evaluating different streaming/messaging services for use as an Event Bus. One of the dimensions I am considering is the ordering guarantee provided by each service. Two of the options I am exploring are AWS Kinesis and Kafka, and from a high level it looks like they both provide similar ordering guarantees: records are guaranteed to be consumable in the same order they were published only within a given shard/partition.

It seems that the AWS Kinesis APIs expose the ID(s) of a shard's parent shard(s), so that consumers using the KCL can ensure records with the same partition key are consumed in the order they were published (assuming a single-threaded publisher), even while shards are being split and merged.

My question is: does Kafka provide any similar functionality, such that records published with a specific key can be consumed in order even if partitions are added while messages are being published? From my reading, my understanding is that partition selection (when you specify keys with your records) behaves along the lines of HASH(key) % PARTITION_COUNT. So, if additional partitions are added, the partition to which all messages with a specific key are published may change (and I've verified locally that it does). At the same time, the Group Coordinator/Leader will reassign partition ownership among the Consumers in the Consumer Groups receiving records from that topic. After reassignment, there can be records (potentially unconsumed records) with the same key in two different partitions. So, at the Consumer Group level, is there no way to ensure that the unconsumed records with the same key, now spread across different partitions, will be consumed in the order they were published?
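
As a rough sketch of the mapping I'm describing (the class name, key, and partition counts are purely illustrative; the `murmur2`/`toPositive` helpers come from kafka-clients and match what the default partitioner uses for keyed records):

```java
import org.apache.kafka.common.utils.Utils;

import java.nio.charset.StandardCharsets;

public class PartitionDrift {
    // Mirrors the keyed-record mapping described above:
    // partition = toPositive(murmur2(keyBytes)) % numPartitions
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        String key = "order-12345"; // illustrative key
        // The same key can map to a different partition once the partition
        // count changes, because the modulus changes.
        System.out.println("6 partitions: key maps to " + partitionFor(key, 6));
        System.out.println("8 partitions: key maps to " + partitionFor(key, 8));
    }
}
```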

I have very little experience with both these services, so my understanding may be flawed. Any advice is appreciated!

Jake Riley
  • Your understanding is correct (although consumers can indeed read the record metadata to see what partitions keys/records come from, and might be able to detect a change)... What is the use-case for expanding the partition sizes? You cannot shrink them later, if this is to account for dynamic sizing – OneCricketeer Dec 22 '21 at 20:43
  • Thank you for confirming! I finally found in [the documentation](https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster/) where this is addressed. They explicitly call out that increasing the partition count may break message ordering guarantees desired by the consumer. I do not have a use case in mind for expanding the partition count except if load increases over time, but it seems as though it is recommended that you over-partition to account for that. – Jake Riley Jan 31 '22 at 22:15
  • For any others that land here, [here is a related SO post](https://stackoverflow.com/questions/39608714/kafka-how-to-preserve-order-of-events-when-partitions-increase) that addresses options for preserving ordering of messages (from producer to consumer) when adding additional partitions. – Jake Riley Jan 31 '22 at 22:18

1 Answer


My understanding was correct (as confirmed by @OneCricketeer and the documentation). Here is the relevant section of the documentation:

Although it’s possible to increase the number of partitions over time, one has to be careful if messages are produced with keys. When publishing a keyed message, Kafka deterministically maps the message to a partition based on the hash of the key. This provides a guarantee that messages with the same key are always routed to the same partition. This guarantee can be important for certain applications since messages within a partition are always delivered in order to the consumer. If the number of partitions changes, such a guarantee may no longer hold. To avoid this situation, a common practice is to over-partition a bit. Basically, you determine the number of partitions based on a future target throughput, say for one or two years later. Initially, you can just have a small Kafka cluster based on your current throughput. Over time, you can add more brokers to the cluster and proportionally move a subset of the existing partitions to the new brokers (which can be done online). This way, you can keep up with the throughput growth without breaking the semantics in the application when keys are used.
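
As a concrete sketch of that "over-partition" advice (the topic name, partition count, replication factor, and bootstrap address below are just placeholders), a topic can be created up front with more partitions than current throughput needs, e.g. via the AdminClient:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateOverPartitionedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Size the partition count for a future target throughput so it never
            // has to be increased later (which could break keyed ordering).
            NewTopic topic = new NewTopic("events", 24, (short) 3); // illustrative values
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```

Brokers can then be added over time and existing partitions moved onto them without changing the key-to-partition mapping.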

Jake Riley