
I am investigating Kafka to assess its suitability for our use case. Can you please help me understand how flexible Kafka is about changing the number of partitions for an existing topic?

Specifically,

  1. Is it possible to change the number of partitions without tearing down the cluster?
  2. And is it possible to do that without bringing down the topic?
  3. Will adding/removing partitions automatically take care of redistributing messages across the new partitions?

Ideally, I would want the change to be transparent to the producers and consumers. Does Kafka ensure this?

Update: From my understanding so far, it looks like Kafka's design cannot allow this, because the mapping of consumer groups to partitions would have to be altered. Is that correct?

Aadith Ramia

3 Answers


1. Is it possible to change the number of partitions without tearing down the cluster?

Yes, Kafka supports increasing the number of partitions at runtime, but it doesn't support decreasing the number of partitions, due to its design.

2. And is it possible to do that without bringing down the topic?

Yes, provided you are increasing partitions.

3. Will adding/removing partitions automatically take care of redistributing messages across the new partitions?

As mentioned earlier, removing partitions is not supported.

When you increase the number of partitions, the existing messages remain in the partitions they were written to; only new messages are considered for the new partitions (also depending on your partitioner logic). Increasing the partitions of a topic triggers a rebalance, in which the producers and consumers are notified with the updated metadata of the topic. Producers start sending messages to the new partitions after receiving the updated metadata, and the consumer rebalancer redistributes the partitions among the consumer groups, which resume consumption from the last committed offset. All of this happens under the hood, so you won't have to make any changes on the client side.
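The "only new messages use the new mapping" point can be illustrated with a toy partitioner. This is a sketch, not Kafka's actual default partitioner (which hashes keys with murmur2); `zlib.crc32` stands in for the real hash, and the key names are made up:

```python
import zlib

def toy_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in for the hash(key) % number_of_partitions scheme
    # used by Kafka's default partitioner for keyed messages.
    return zlib.crc32(key) % num_partitions

keys = [b"order-1", b"order-2", b"order-3", b"order-4"]

before = {k: toy_partition(k, 2) for k in keys}  # topic with 2 partitions
after = {k: toy_partition(k, 4) for k in keys}   # after altering to 4

# Messages already written stay in their original partitions; only NEW
# messages use the new mapping. Any key whose mapping changed now has its
# old and new messages in different partitions, so per-key ordering
# across the change is not guaranteed.
moved = [k for k in keys if before[k] != after[k]]
print("keys whose partition assignment changed:", moved)
```

Note that a key's new partition modulo the old count always equals its old partition here, which is why the old data is never "wrong", just potentially split across two partitions.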

Liju John
  • But can we maintain ordering of messages while adding partitions in a running cluster? Say there were 2 partitions and 4 types of messages (by partition key), and the key hashing happened to be ideal, so 2 message types went to each partition. Adding 2 new partitions will route new messages across the old and new partitions (per the new key hashing). But suppose there is a type#4 message in partition#2 at offset 2000, the consumer's current offset for partition#2 is only 10, and a new type#4 message lands in partition#4, where it will be consumed first. How can I maintain the order? – amandeep1991 Jul 13 '21 at 03:59
  • @amandeep1991 one option to preserve partitioning order without downtime: 1) avoid repartitioning and instead create a second topic with the increased partition count; 2) change the consumer to poll messages from the second topic only once the first is drained – Andrew Taran Mar 24 '23 at 13:54
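The drain-then-switch strategy from the comment above can be sketched as a toy simulation, with in-memory lists standing in for the two Kafka topics (a real consumer would poll brokers and commit offsets; the topic contents here are made up):

```python
# Toy simulation of the "second topic" migration strategy: consume the old
# (fewer-partition) topic to completion first, then switch to the new
# (more-partition) topic, so per-key ordering is preserved across the cutover.
def migrate_consume(old_topic, new_topic):
    consumed = []
    for msg in old_topic:   # 1) drain the old topic completely
        consumed.append(msg)
    for msg in new_topic:   # 2) only then start on the new topic
        consumed.append(msg)
    return consumed

old = ["key1:a", "key1:b"]          # written before the cutover
new = ["key1:c"]                    # written after producers switched
print(migrate_consume(old, new))    # key1's messages arrive in order a, b, c
```

The point of the ordering guarantee is step 2's precondition: no message for a key is read from the new topic until every message for that key in the old topic has been consumed.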
  1. Yes, it is perfectly possible. You just execute the following command against the topic of your choice: bin/kafka-topics.sh --zookeeper zk_host:port --alter --topic <your_topic_name> --partitions <new_partition_count>. Remember, Kafka only allows increasing the number of partitions, because decreasing it would cause data loss.

    • There's a catch here. Kafka doc says the following:

Be aware that one use case for partitions is to semantically partition data, and adding partitions doesn't change the partitioning of existing data so this may disturb consumers if they rely on that partition. That is if data is partitioned by hash(key) % number_of_partitions then this partitioning will potentially be shuffled by adding partitions but Kafka will not attempt to automatically redistribute data in any way.

  2. Yes, if by bringing down the topic you mean deleting the topic.
  3. Once you've increased the partition count, Kafka triggers a rebalance for the consumers subscribing to that topic, and on subsequent polls the partitions get distributed across the consumers. It's transparent to the client code; you don't have to worry about it.

NOTE: As mentioned before, you can only add partitions; removing them is not possible.
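For reference, the alter command above is the older ZooKeeper-based form; on Kafka 2.2+ the tooling talks to the brokers directly via --bootstrap-server. Host names, ports, and the topic name below are placeholders:

```shell
# Older brokers (ZooKeeper-based tooling):
bin/kafka-topics.sh --zookeeper zk_host:2181 --alter \
  --topic my-topic --partitions 4

# Kafka 2.2+ (broker-based tooling):
bin/kafka-topics.sh --bootstrap-server broker_host:9092 --alter \
  --topic my-topic --partitions 4

# Verify the new partition count:
bin/kafka-topics.sh --bootstrap-server broker_host:9092 --describe \
  --topic my-topic
```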

Bitswazsky

One more thing: if you are using stateful operations in clients, like aggregations (making use of a state store), a change in partitions will kill all the stream threads in the consumer. This is expected, as an increase in partitions may corrupt stateful applications. So beware of changing the partition count; it may break stateful consumers connected to the topic.

Good read: Why do Kafka Streams threads die when the source topic's partitions change? Can anyone point to reading material around this?

Valath