
I am new to Apache Kafka. I want to assign our user ID as the ID of a topic partition. Is there a way to assign our own user ID to a partition? I researched this for a couple of hours but didn't find any article about assigning an ID to a partition.

When publishing a message to a topic, I want to use the user ID as the key so that all of a user's messages go to the same partition. I also want to make sure that each partition contains messages for only one user.

Can I use this user ID in consumers when consuming messages from a partition?

Is there a way to achieve this functionality?

Awesome

2 Answers

0

ID generation logic is up to your own application/services.

By default, Kafka's DefaultPartitioner hashes the record key, so all records with the same ID go to the same partition, ordered by arrival time (not necessarily by producer time without extra producer configuration). There is nothing you need to do for this.
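The key-to-partition mapping described above can be sketched roughly as follows. This is only an illustration of the idea: `Arrays.hashCode` is used here as a stand-in for the murmur2 hash that Kafka applies to the key bytes, so the partition numbers it prints will not match what a real producer computes. The class name, the partition count, and the user IDs are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyPartitionSketch {

    // Maps a record key to a partition index in [0, numPartitions).
    // Arrays.hashCode is a stand-in for Kafka's murmur2 hash, NOT the real function.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3; // hypothetical partition count
        String[] userIds = {"user-1", "user-2", "user-3", "user-4", "user-1"};
        for (String userId : userIds) {
            // The same user ID always prints the same partition.
            System.out.println(userId + " -> partition " + partitionFor(userId, numPartitions));
        }
    }
}
```

Because the result is taken modulo the partition count, a given key always maps to the same partition, but once there are more distinct keys than partitions, some keys necessarily share one.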

Can I use this user-id in consumers

It's just the record key, so yes...

If you mean you want to assign consumers to certain partitions based on the ID, then you'd need to reverse the DefaultPartitioner hash function, that is, work out which partition a given ID hashes to.

OneCricketeer
  • I want to make sure that one partition contains messages for only one user. With "DefaultPartitioner", messages from multiple users end up in the same partition. – Awesome Oct 11 '21 at 15:17
  • One user-id cannot exist in multiple partitions. One partition **can** hold multiple users. This is what allows you to scale the process to multiple users beyond your original partition count. There's no way to fix this without using a topic, or cluster, per "user account" – OneCricketeer Oct 11 '21 at 15:19
  • If we allow multiple users in the same partition, and user1 has 10k records and user2 then produces 10 records, user2 has to wait until user1's records are consumed. So I want to have one partition for each user. – Awesome Oct 11 '21 at 15:22
  • Okay, well, that's the tradeoff for the architecture you've made. How many users do you plan on having? Kafka shouldn't be used to handle tens of thousands of partitions for a single topic. You also shouldn't constantly increase the partition count whenever you reach capacity – OneCricketeer Oct 11 '21 at 15:27
  • We have 1k users, which means we would have to create 1k partitions. Is there a way to maintain a 1-to-1 relationship between partition and user? – Awesome Oct 11 '21 at 15:44
  • You're welcome to write your own Partitioner that maintains some state based on how many users get created. As soon as you reach some number of users (say 80% of all partitions) you can increase the partition count... However, I don't see this as sustainable – OneCricketeer Oct 11 '21 at 16:07
0

You can customize the partitioning logic by implementing the Partitioner interface:

public interface Partitioner

Details: https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/producer/Partitioner.html#partition-java.lang.String-java.lang.Object-byte:A-java.lang.Object-byte:A-org.apache.kafka.common.Cluster-
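As a rough sketch of what such custom logic might look like, the class below keeps a one-to-one mapping from user ID to partition by handing each new user the next unused partition. It is written without the Kafka dependency so it can run on its own; in a real implementation, something like it would presumably be called from the `partition(...)` method of a class implementing `Partitioner`. The name `UserPartitionRegistry`, its methods, and the 80% threshold check are illustrative assumptions, not part of Kafka.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: assigns every user ID its own partition, first come, first served.
class UserPartitionRegistry {
    private final int numPartitions;
    private final Map<String, Integer> assigned = new ConcurrentHashMap<>();
    private final AtomicInteger nextFree = new AtomicInteger(0);

    UserPartitionRegistry(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Returns the partition already assigned to this user, or claims the next free one.
    int partitionFor(String userId) {
        return assigned.computeIfAbsent(userId, id -> {
            int partition = nextFree.getAndIncrement();
            if (partition >= numPartitions) {
                throw new IllegalStateException("No free partition left for user " + id);
            }
            return partition;
        });
    }

    // True once at least 80% of the partitions are taken (assumed threshold).
    boolean nearCapacity() {
        return assigned.size() >= 0.8 * numPartitions;
    }

    public static void main(String[] args) {
        UserPartitionRegistry registry = new UserPartitionRegistry(5);
        System.out.println("alice -> " + registry.partitionFor("alice"));
        System.out.println("bob   -> " + registry.partitionFor("bob"));
        System.out.println("alice -> " + registry.partitionFor("alice"));
        System.out.println("near capacity: " + registry.nearCapacity());
    }
}
```

Note that this state lives only in memory within a single producer process. With several producers, or after a restart, the mapping would have to be stored somewhere shared, which is part of why this approach is hard to sustain.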

xuanzjie