How to manage threads and memory of librdkafka consumer?

Question

I tried using the librdkafka C++ library. I noticed that 4 new threads are spawned for each new topic my consumer (RdKafka::KafkaConsumer) subscribes to. Approximately 30 MB of virtual memory is also used for every topic subscribed to.

My client application/consumer needs to consume from about 2000 topics. This would translate to my application using about 8000 threads and 60 GB of virtual memory. Assuming that I need around 20 partitions to achieve my desired throughput, I would need around 20 instances of my application. If all application instances are housed on a single server, then the server would need to be able to run at least 8000 x 20 = 160,000 threads simultaneously and use 60 x 20 = 1.2 TB of virtual memory.
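A minimal sketch of this arithmetic as compile-time checks. It assumes the per-topic figures observed above (4 threads and about 30 MB of virtual memory per topic), which are measurements rather than values documented by librdkafka, and it uses decimal units (1 GB = 1000 MB). The constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Observed per-topic cost from the question; assumptions, not librdkafka guarantees.
constexpr std::int64_t kThreadsPerTopic = 4;
constexpr std::int64_t kVirtualMbPerTopic = 30;

// Threads used by one application instance subscribed to `topics` topics.
constexpr std::int64_t threads_per_instance(std::int64_t topics) {
    return topics * kThreadsPerTopic;
}

// Virtual memory (in MB) used by one application instance.
constexpr std::int64_t virtual_mb_per_instance(std::int64_t topics) {
    return topics * kVirtualMbPerTopic;
}

// One instance with 2000 topics: 8000 threads and 60 GB.
static_assert(threads_per_instance(2000) == 8000, "threads per instance");
static_assert(virtual_mb_per_instance(2000) == 60000, "MB per instance");

// 20 instances on one server: 160,000 threads and 1.2 TB.
static_assert(threads_per_instance(2000) * 20 == 160000, "threads per server");
static_assert(virtual_mb_per_instance(2000) * 20 == 1200000, "MB per server");
```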

160,000 threads and 1.2 TB of virtual memory are far too much for a single server. The instances could be spread across multiple servers to distribute the load, but even the per-server numbers remain daunting.

Is there a way to control the number of threads and the amount of memory the client application uses with the librdkafka library?

score 1 · Accepted Answer · answered Mar 01 '22 at 21:51

A single consumer can consume from any (reasonable) number of topics and partitions; you shouldn't create a separate consumer for each topic.

Also see https://github.com/edenhill/librdkafka/wiki/FAQ#number-of-internal-threads

Edenhill

Thank you, @Edenhill. But if I use a single `KafkaConsumer` instance for my client application to consume from 2000 topics, then how do I selectively consume from a specific topic? `KafkaConsumer::consume()` does not provide an argument to specify a topic. It would also be cumbersome to call `KafkaConsumer::unassign()` and `KafkaConsumer::assign()` each time I would need to consume from a specific topic. – hermit.crab Mar 02 '22 at 01:52

Each partition has its own fetch queue which by default is forwarded to a single shared queue. You can remove this queue forwarding so that you can consume_queue() each partition individually. Have a look at https://docs.confluent.io/platform/current/clients/librdkafka/html/classRdKafka_1_1Queue.html#a49827afcb8804719ffc23d120915b371 – Edenhill Mar 03 '22 at 12:31
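A different option from the per-partition queues described in the comment above is to keep the default shared queue and route each message to a per-topic handler after it has been consumed. This does not fetch selectively from one topic; it only sorts messages once they arrive. The following is a minimal sketch under stated assumptions: `Msg` and `TopicRouter` are hypothetical names, and `Msg` is a stand-in for `RdKafka::Message`, where the routing key would presumably come from `Message::topic_name()`.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the parts of RdKafka::Message used here.
struct Msg {
    std::string topic;    // with librdkafka: taken from Message::topic_name()
    std::string payload;  // message body, simplified to a string
};

// Routes messages pulled from one shared queue to per-topic handlers.
class TopicRouter {
public:
    using Handler = std::function<void(const Msg&)>;

    // Register (or replace) the handler for a topic.
    void on(const std::string& topic, Handler handler) {
        handlers_[topic] = std::move(handler);
    }

    // Invoke the handler for the message's topic.
    // Returns false if no handler is registered for that topic.
    bool dispatch(const Msg& msg) const {
        auto it = handlers_.find(msg.topic);
        if (it == handlers_.end()) {
            return false;
        }
        it->second(msg);
        return true;
    }

private:
    std::map<std::string, Handler> handlers_;
};
```

In a consume loop, each message returned by the single consumer would be converted to a `Msg` and passed to `dispatch()`, so one consumer instance can serve all topics while the application logic stays separated per topic.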
