
I want to create N partitions of user events by modding user_id by N, so that each user's events can be processed by the consumers in the order they were sent.

If I ever decide N is not enough to handle the load and want to increase the number of partitions (and consumers) accordingly, what do I have to do to preserve event order when consuming user events?
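
For reference, here is roughly how I produce events today (the broker address, topic name, and serializers below are just placeholders), with the partition chosen as user_id mod N:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

// Sketch of the producer side: the partition is derived from user_id % N,
// so all events for a given user go to one partition and stay in send order.
public class UserEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        int n = 8; // N, the current partition count (example value)
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long userId = 42L; // example user_id
            String event = "{\"type\":\"click\"}";
            int partition = (int) (userId % n);
            // "user-events" is a hypothetical topic created with N partitions.
            producer.send(new ProducerRecord<>("user-events", partition,
                    Long.toString(userId), event));
        }
    }
}
```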

Glide
    I think this answers it: http://stackoverflow.com/questions/33677871/is-it-possible-to-add-partitions-to-an-existing-topic-in-kafka-0-8-2 – yaswanth Sep 21 '16 at 13:12

1 Answer


Well, you could create a new topic with an increased partition count and then copy all your events over into the new topic. That way you maintain ordering with respect to a given user_id (there were no guarantees about ordering across different user_ids in your original scheme anyway).
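
A rough sketch of what that copy job could look like (the topic names, partition count, bootstrap servers, and serializers here are all assumptions, and it uses the plain Java consumer/producer clients): it reads the old topic from the beginning and re-derives each record's partition from user_id modulo the new partition count, so per-user ordering carries over into the new topic.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

// Hypothetical copier: drains "user-events" (old topic, N partitions) into
// "user-events-v2" (new topic, M partitions). Because the old topic already
// kept each user's events on a single partition, replaying them and
// re-partitioning by user_id preserves per-user order in the new topic.
public class TopicCopier {
    public static void main(String[] args) {
        final int newPartitionCount = 16; // M, an example value

        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "localhost:9092");
        cProps.put("group.id", "topic-copier");
        cProps.put("auto.offset.reset", "earliest"); // start from the beginning
        cProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        cProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "localhost:9092");
        pProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        pProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(Collections.singletonList("user-events"));
            while (true) { // in practice, stop once the old topic is fully drained
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    long userId = Long.parseLong(record.key()); // assumes the key is the user_id
                    int partition = (int) (userId % newPartitionCount);
                    producer.send(new ProducerRecord<>("user-events-v2",
                            partition, record.key(), record.value()));
                }
            }
        }
    }
}
```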

Of course, that's likely to require downtime. The naive solution of just increasing the partition count obviously won't work since it will change your hashing calculation and result in the events for a given user_id being split across multiple partitions (and thus losing ordering). The difficulty of increasing partition count is one of the reasons you want to think hard about partition count when initially creating a topic.
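
To make the re-hashing point concrete, here is a tiny example (the partition counts are made up): the same user_id maps to a different partition as soon as the modulus changes, so that user's new events land on a different partition than their old ones.

```java
// Example only: the same user_id lands on a different partition
// once the partition count grows from 8 to 12.
public class PartitionShift {
    public static void main(String[] args) {
        long userId = 42L;
        System.out.println(userId % 8);  // 2 -> partition before the increase
        System.out.println(userId % 12); // 6 -> partition after the naive increase
    }
}
```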

Aurand
  • I'm new to Kafka. 1) Is there a tool for copying events to a new topic? 2) Is it possible to move over only the messages that *haven't* been read and disregard the messages that have been read? – Glide Sep 21 '16 at 06:49
  • 1) There's MirrorMaker, https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330 2) Not without writing your own copier that's aware of your consumer offsets; that's the only way of knowing what has been "read". Though if you have multiple consumer groups, the concept of "read" gets complex. – Aurand Sep 21 '16 at 14:21
  • As a follow-up, MirrorMaker 2 is the replacement for the tool in that link, and it can also handle some logic related to consumer group migrations across clusters. – OneCricketeer Jan 31 '22 at 22:23