
I am only starting to learn about Kafka topics/partitions, and I have a case with 1 topic and possibly 10,000 or more partitions. I'm assuming that 10,000 partitions is a very large number and that this is discouraged.

So what I am thinking is to split the 1 topic into logical topic buckets, spreading the 10,000 partitions among these topics.

So instead of : 1 topic + 10,000+ partitions

I will have:

10 topics + 1,000 partitions each

Is this a viable approach?
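To make the partitioning question concrete, here is a minimal sketch of how a keyed message maps to a partition. This uses a simple stable hash for illustration only; Kafka's default partitioner actually uses murmur2, and the function name `partition_for` is made up for this example:

```python
import hashlib

def partition_for(user_id: str, num_partitions: int) -> int:
    # A deterministic hash means the same user ID always lands on the
    # same partition, so that user's messages stay in order within it,
    # even when many users share one partition.
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Many users, far fewer partitions: each user still maps to exactly one.
p1 = partition_for("user-42", 10)
p2 = partition_for("user-42", 10)
# p1 == p2 on every call, for any partition count
```

The point of the sketch: per-key ordering comes from the deterministic mapping, not from having one partition per key.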

M_K
  • Can you explain why you think you need 10000 partitions? – Mickael Maison Sep 29 '18 at 14:00
  • Yes, I will be using partitions for user interactions; each partition would be keyed by user ID, so if we have ~10,000 users we need 10,000 partitions to keep the interaction order. Does that make sense? – M_K Sep 29 '18 at 14:22
  • If each user has a unique key then you already have the order, even if there are multiple user keys in a partition. No need to make a partition per key. Partitioning in Kafka is generally for parallelism. Unique keys will be hashed to the same partition every time. – dawsaw Sep 29 '18 at 14:43
  • @dawsaw so if I have multiple different users' interactions in a partition, how would I consume all interactions from a particular user without consuming every interaction in the topic? – M_K Sep 29 '18 at 15:01
  • Kafka is not a key-value store. You consume messages as they arrive on a specific topic and handle each incoming message in your application. But if you need to get data from a Kafka topic for a specific user, you definitely need a key-value store. As already mentioned, more partitions are for parallelism. – Vasyl Sarzhynskyi Sep 29 '18 at 17:29
  • If you had one partition, every user event would be in order. If you had two partitions, all even IDs and all odd IDs could each be in order. With ten, you can do `id mod 10`; the same goes for 100 and larger... As stated, the partition count doesn't need to match your expected user base exactly. – OneCricketeer Sep 29 '18 at 21:40
  • OK thanks, I understand now that partitions are indeed for scaling out consumers. I should still send to Kafka with this key over fewer partitions (not 10,000), and the partitions will still keep the messages in order; but I also need a key-value store to hold all of each user's interactions and query that store when I need them. Would something like Confluent KSQL be a good approach for this? If I have a stream where user interactions are written to a key-value store as they come in, is that possible? – M_K Sep 30 '18 at 09:11

0 Answers