
I am writing a Kafka consumer using the @KafkaListener annotation, and I learned that the number of concurrent Kafka consumers reading from different partitions can be increased with a method on ConcurrentKafkaListenerContainerFactory,

e.g. factory.setConcurrency(3);

The Javadoc for setConcurrency says:

The maximum number of concurrent KafkaMessageListenerContainer running. Messages from within the same partition will be processed sequentially.

Now my question is

I have a Kafka topic with 144 partitions from which our application needs to consume messages, and 3 instances of the app are running in parallel.

I want to know how to decide the concurrency value that needs to be set in

ConcurrentKafkaListenerContainerFactory.setConcurrency(<value>)

so that we can achieve high throughput when consuming messages.

Should I use 144/3 = 48 as the concurrency value, or is there a formula to derive this number?

Ryuzaki L
Neer1009

1 Answer


Yes, the best you can do is set the concurrency to 48 in each instance, so that each partition is consumed by its own thread within the consumer group. To achieve higher throughput you can also use batch listeners with a larger batch size.

Another good option is to run more instances, for example 14 instances each with a concurrency of 10 (that gives 140 threads, so a few threads would handle two partitions). With either approach you also need to consider the CPU available to each instance: running more threads than there are CPU cores will not improve performance.
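To make the arithmetic behind these layouts concrete, here is a small sketch in plain Java. It assumes the partitions of the single topic are spread as evenly as possible across all consumer threads in the group, which is a simplification; the class and method names are made up for illustration and are not part of Spring Kafka.

```java
public final class ConcurrencySketch {

    /** Total consumer threads in the group: instances times concurrency per instance. */
    static int totalThreads(int instances, int concurrency) {
        return instances * concurrency;
    }

    /** Largest number of partitions any one thread has to handle (ceiling division). */
    static int maxPartitionsPerThread(int partitions, int instances, int concurrency) {
        int threads = totalThreads(instances, concurrency);
        return (partitions + threads - 1) / threads;
    }

    /** Threads left without a partition when threads outnumber partitions. */
    static int idleThreads(int partitions, int instances, int concurrency) {
        return Math.max(0, totalThreads(instances, concurrency) - partitions);
    }

    public static void main(String[] args) {
        int partitions = 144;
        // Each entry is {instances, concurrency}; the values are illustrative only.
        int[][] layouts = { {3, 48}, {3, 16}, {3, 1}, {14, 10}, {3, 60} };
        for (int[] layout : layouts) {
            int instances = layout[0];
            int concurrency = layout[1];
            System.out.printf(
                "instances=%d concurrency=%d threads=%d maxPartitionsPerThread=%d idleThreads=%d%n",
                instances, concurrency,
                totalThreads(instances, concurrency),
                maxPartitionsPerThread(partitions, instances, concurrency),
                idleThreads(partitions, instances, concurrency));
        }
    }
}
```

Going above one thread per partition (the last layout) only adds idle threads, so 144 divided by the number of instances is an upper bound on useful concurrency rather than a target; a lower value may perform just as well if the bottleneck is elsewhere.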

Starting with version 1.1, you can configure @KafkaListener methods to receive the entire batch of consumer records received from the consumer poll. To configure the listener container factory to create batch listeners, you can set the batchListener property.
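A minimal configuration sketch of the two settings discussed here, concurrency and batch listening, might look like the following. It assumes a Spring application with spring-kafka on the classpath and a ConsumerFactory<String, String> bean defined elsewhere; the topic name, group id, class names and the value 48 are placeholders, not recommendations.

```java
import java.util.List;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafka;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.stereotype.Component;

@Configuration
@EnableKafka
public class BatchListenerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // Placeholder: 144 partitions / 3 instances. Tune by measuring, not by formula.
        factory.setConcurrency(48);
        // Deliver the whole result of each poll to the listener as one list.
        factory.setBatchListener(true);
        return factory;
    }
}

@Component
class ExampleBatchListener {

    // "example-topic" and "example-group" are hypothetical names.
    @KafkaListener(topics = "example-topic", groupId = "example-group")
    public void onBatch(List<String> messages) {
        for (String message : messages) {
            // Process each record; records from the same partition arrive in order.
        }
    }
}
```

The size of each batch is bounded by the consumer property max.poll.records, so that setting, multiplied by the number of threads, gives a rough ceiling on how many records an instance holds in memory at once.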

Ryuzaki L
  • Bear in mind you don't *have* to have one consumer per partition; how much concurrency you need depends on many factors including, but not limited to, your code, any synchronized blocks, any downstream bottlenecks (DB, network, etc, etc). – Gary Russell Feb 01 '20 at 14:24
  • @GaryRussell Do you have any recommendation for my problem statement, as you have developed the Spring Kafka libraries? – Neer1009 Feb 01 '20 at 14:26
  • @GaryRussell Trying to understand the concurrency: suppose I keep concurrency as 1, and the 3 instances have to connect to all 144 partitions. That means each instance will be connected to 48 partitions, but only one message at a time will be consumed by the app thread, as concurrency is 1. Once processing of that message is done, the next message will be picked up from one of the 48 partitions and consumed. Is my understanding correct? – Neer1009 Feb 01 '20 at 14:50
  • It's impossible to provide generic guidance - each environment is different; you need to experiment to determine the best settings for your situation. If you can't get your desired throughput you will need to profile your application to figure out where the bottlenecks are. Yes, if you have concurrency of 1, you will only get one record at a time; you may get several records for one partition before you get any from another partition. – Gary Russell Feb 01 '20 at 14:51
  • @Deadpool Won't using a batch listener with a large batch size, something like 5000, increase the heap usage of the application, since I am consuming all the records and holding them in memory before I acknowledge them? Correct me if my understanding is not right. – Neer1009 Feb 02 '20 at 06:10