
We want to put the messages/records of different customers on different partitions of a Kafka topic.

But the number of customers is not known in advance. So how can we set the partition count for the Kafka topic in this case? Do we need some other approach where the partition count changes at runtime based on keys (customer_id in this case)? Thanks in advance.

1 Answer


> need to know number of partitions

Assuming Java, call `AdminClient.describeTopics()` and read `partitions()` from each `TopicDescription` in the result.


Regarding the rest of the question, consumer instances automatically distribute partition assignment when subscribing to topics.

Producers should not know about consumers, so you don't "put records on partitions" based on any factor of (possible) consumers.


> partition count changes at runtime based on keys (customer_id)

It's unclear what this means. The partition count can only increase, and if you do increase it, existing keys may map to different partitions than before, so per-key ordering is no longer guaranteed. You should therefore consider how large your keyspace is before creating the topic. For example, if you have a numeric ID and use its first two digits as the partition number, you could create a topic with up to 100 partitions.
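As a rough illustration of the "first two digits" idea, here is a minimal sketch in plain Java. The class name `PrefixPartitioning`, the method `partitionFor`, and the choice to treat IDs as digit strings (so leading zeros survive) are assumptions made for this example, not anything prescribed by Kafka.

```java
public class PrefixPartitioning {
    // Assumed topic size: two decimal digits give partition numbers 0..99.
    static final int PARTITION_COUNT = 100;

    // Hypothetical helper: derive the partition from the first two digits of the ID.
    // IDs shorter than two characters simply use the digits they have.
    static int partitionFor(String customerId) {
        if (customerId == null || customerId.isEmpty()) {
            throw new IllegalArgumentException("customerId must not be empty");
        }
        String prefix = customerId.substring(0, Math.min(2, customerId.length()));
        for (char c : prefix.toCharArray()) {
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("customerId must start with digits: " + customerId);
            }
        }
        return Integer.parseInt(prefix); // always below PARTITION_COUNT
    }

    public static void main(String[] args) {
        for (String id : new String[] {"4213", "4299", "0071", "7"}) {
            System.out.println(id + " -> partition " + partitionFor(id));
        }
    }
}
```

In a real producer, a value computed like this could be passed as the explicit partition of a `ProducerRecord`, or the logic could live in a class implementing Kafka's `Partitioner` interface. Note that customers sharing a prefix (such as 4213 and 4299 here) still land on the same partition.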

OneCricketeer
  • Hi @OneCricketeer, thanks for your answer. But I have read that Kafka uses the key to specify the target partition. The default strategy is to choose a partition based on a hash of the key. Suppose my use case is to store events of different customers in different partitions, so I have used customer_id as the key for the Kafka message. But I guess we have to specify the partition count during the creation of the topic itself. So if I want to use a custom partitioning strategy that writes records of different customers into different partitions, I cannot specify the number of partitions of the topic? – Sai Satwik Kuppili Oct 18 '22 at 06:18
  • _guess we have to specify the partition count during the creation of topic itself_ - That's correct; `kafka-topics.sh --create --topic NAME --partitions X --replication-factor Y`. You do not have to use the default partitioning strategy of hashing; that is overridable, as I answered. – OneCricketeer Oct 18 '22 at 06:20
  • Different customers don't necessarily need to be in different partitions, either, but that will depend on your consumer operations. In other words, you have no way of knowing how many customers you'll have, and it is not scalable to keep adding more and more partitions. Therefore, you'll eventually run into a hash collision using the default partitioner. Or you can say "all ids that start with 0 are in partition 0; ids that start with 1 go to partition 1, etc." – OneCricketeer Oct 18 '22 at 06:24
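
The hash-collision point in the last comment can be seen in a small sketch. This is plain Java with no Kafka dependency: `HashModuloDemo` and its methods are hypothetical names, and `String.hashCode()` is only a stand-in for the hash that Kafka's default partitioner applies to the serialized key, so the actual partition numbers will differ from what a real producer picks. The shape of the behaviour is the point: more customers than partitions means shared partitions, and changing the partition count can move a key.

```java
import java.util.Map;
import java.util.TreeMap;

public class HashModuloDemo {
    // Stand-in for a hash-based partitioner: hash the key, then take it modulo
    // the partition count. (Assumption: String.hashCode replaces Kafka's own hash.)
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Count how many customer IDs (1..customers) land on each partition.
    static Map<Integer, Integer> keysPerPartition(int customers, int numPartitions) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int id = 1; id <= customers; id++) {
            counts.merge(partitionFor(String.valueOf(id), numPartitions), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 10 customers cannot each get their own partition when there are only 4.
        System.out.println("keys per partition: " + keysPerPartition(10, 4));

        // The same key can map to a different partition once the count changes.
        System.out.println("customer 7 with 4 partitions: " + partitionFor("7", 4));
        System.out.println("customer 7 with 6 partitions: " + partitionFor("7", 6));
    }
}
```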