
I have a large, frequently changing set of objects, and I need to maintain a kind of state table for each object. I'm considering using a KTable per object, but I'm worried about the overhead this structure would bring with it.

In that sense, what is the expected overhead of a KTable when each object gets its own topic and table, compared to a layout where objects don't each get their own topic? For example, how much memory does a topic with a KTable consume?

To give an example (these are not actual numbers, but the relative numbers are similar to what I'm looking for):

  • 1M objects, each object has a topic with one partition
  • 20 producers, 20 consumers
  • message size 1 KB
  • update rate 100k messages per second
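
Roughly, the per-object layout I have in mind, as a minimal Kafka Streams sketch (the topic naming scheme, serdes, and value type here are placeholders, not settled choices):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;

import java.util.Properties;

public class PerObjectTables {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "per-object-state");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // One source topic and one KTable per object, e.g. "object-state-<id>".
        // With 1M objects this declares 1M single-partition source topics;
        // the per-topic/per-table footprint is exactly the overhead in question.
        for (int id = 0; id < 1_000_000; id++) {
            builder.table("object-state-" + id,
                    Consumed.with(Serdes.String(), Serdes.String()));
        }

        new KafkaStreams(builder.build(), props).start();
    }
}
```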
  • There's no way to answer this without knowing how large each binary event is. Creating a table alone doesn't affect memory of the brokers or topics, and they can/should be written to disk rather than maintained completely in memory – OneCricketeer Feb 23 '22 at 15:33
  • I mean things like `replica.fetch.max.bytes`, which is 1MB by default. There must be some known overhead per topic/partition, I guess, no? https://docs.confluent.io/platform/current/installation/configuration/broker-configs.html – benjist Feb 23 '22 at 16:07
  • There's network overhead with any consumer, not memory overhead – OneCricketeer Feb 23 '22 at 16:08
  • According to this site, there is a memory overhead per partition and topic: https://docs.cloudera.com/documentation/kafka/latest/topics/kafka_performance.html#concept_exp_hzk_br – benjist Feb 23 '22 at 16:15
  • That should be obvious. The brokers maintain and track the partitions. That has nothing to do with tables or topic consumers. – OneCricketeer Feb 23 '22 at 16:17
  • To clarify, `replica.fetch.max.bytes` is _between brokers_. Not for external clients – OneCricketeer Feb 23 '22 at 16:19
  • I may have been imprecise. I also mean the broker side, not just the consumer. So that would translate to 1,000 topics (each with a single partition) = 1 GB on the broker side. – benjist Feb 23 '22 at 16:33
  • A topic alone doesn't take 1MB of heap. Otherwise, the clusters I've seen with several thousand topics and only a 6GB heap size wouldn't be running. The `fetch.max.bytes` settings are exclusively network buffer sizes, not statically allocated server heap space – OneCricketeer Feb 23 '22 at 16:41
  • I've added a small example – benjist Feb 23 '22 at 16:51
  • I don't really have a specific answer. Sticking with my earlier comment that tables don't cause significant strain on any topic; at least, none beyond a regular consumer. Overall, I don't think Kafka can support a million topics [without KRaft mode](https://stackoverflow.com/a/32963227/2308683), which is not production-ready yet. – OneCricketeer Feb 23 '22 at 19:44
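
For contrast, a sketch of the single-topic pattern the comments point toward: one topic keyed by object ID backing a single KTable, so the table holds the latest state per object without a topic per object (the topic name `object-state` and the String serdes are assumptions for illustration, not from the question):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class KeyedStateTable {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "keyed-object-state");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // One (ideally compacted) topic for all objects; the object ID is the
        // record key, so the KTable keeps the latest value per key. A single
        // table then covers all 1M objects without 1M topics.
        KTable<String, String> objectState = builder.table("object-state",
                Consumed.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```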

0 Answers