
I am writing my first Kafka consumer using Spring-Kafka. I had a look at the different options the framework provides and have a few doubts about them. Can someone please clarify the points below if you have already worked with it?

Question 1: As per the Spring-Kafka documentation, there are two ways to implement a Kafka consumer: "You can receive messages by configuring a MessageListenerContainer and providing a message listener or by using the @KafkaListener annotation". When should I choose one option over the other?

Question 2: I have chosen the @KafkaListener approach for my application. For this I need to initialize a container factory instance, and the container factory has an option to control concurrency. I just want to double-check that my understanding of concurrency is correct.

Suppose I have a topic named MyTopic with 4 partitions. To consume messages from MyTopic, I've started 2 instances of my application, each with concurrency set to 2. So, as per the Kafka assignment strategy, 2 partitions should go to consumer1 and the other 2 partitions to consumer2. Since concurrency is set to 2, will each instance start 2 threads and consume data from the topic in parallel? Also, is there anything to consider when consuming in parallel?

Question 3: I have chosen manual ack mode and am not managing offsets externally (not persisting them to any database/filesystem). So do I need to write custom code to handle a rebalance, or will the framework manage it automatically? I think not, as I am acknowledging only after processing all the records.

Question 4: With manual ack mode, which listener will give better performance: the batch message listener or the normal message listener? I guess if I use the normal message listener, the offsets will be committed after each message is processed.

I've pasted the code below for your reference.

Batch Acknowledgement Consumer:

    @Override
    public void onMessage(List<ConsumerRecord<String, String>> records, Acknowledgment acknowledgment,
            Consumer<?, ?> consumer) {
        for (ConsumerRecord<String, String> record : records) {
            System.out.println("Record : " + record.value());
            // Process the message here..
            listener.addOffset(record.topic(), record.partition(), record.offset());
        }
        acknowledgment.acknowledge();
    }

Initialising container factory:

@Bean
public ConsumerFactory<String, String> consumerFactory() {
    return new DefaultKafkaConsumerFactory<String, String>(consumerConfigs());
}

@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> configs = new HashMap<String, Object>();
    configs.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServer);
    configs.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    configs.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, enableAutoCommit);
    configs.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, maxPollInterval);
    configs.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
    configs.put(ConsumerConfig.CLIENT_ID_CONFIG, clientId);
    configs.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    configs.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return configs;
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<String, String>();
    // concurrency = number of consumer threads this instance will start
    factory.setConcurrency(2);
    factory.setBatchListener(true);
    factory.getContainerProperties().setAckMode(AckMode.MANUAL);
    factory.getContainerProperties().setConsumerRebalanceListener(RebalanceListener.getInstance());
    factory.setConsumerFactory(consumerFactory());
    factory.getContainerProperties().setMessageListener(new BatchAckConsumer());
    return factory;
}
Akhil

2 Answers

  1. @KafkaListener is a message-driven "POJO"; it adds features such as payload conversion, argument matching, and so on. If you implement MessageListener, you can only get the raw ConsumerRecord from Kafka. See @KafkaListener Annotation.

  2. Yes, the concurrency represents the number of threads; each thread creates a Consumer; they run in parallel; in your example, each would get 2 partitions.

Also should we consider anything if we are consuming in parallel.

Your listener must be thread-safe (no shared state, or any such state must be protected by locks).
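For instance, a hypothetical listener that counts processed records needs thread-safe state once concurrency is greater than 1, since each consumer thread invokes it independently (a plain-Java sketch; the class name is illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical listener state shared across the container's consumer threads.
// With concurrency > 1, onMessage() can run on several threads at once, so a
// shared counter must be an AtomicLong (or guarded by a lock), not a plain long.
class CountingListener {

    private final AtomicLong processed = new AtomicLong();

    public void onMessage(String value) {
        // ... process the record value ...
        processed.incrementAndGet(); // safe under concurrent invocation
    }

    public long processedCount() {
        return processed.get();
    }
}
```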

  3. It's not clear what you mean by "handle rebalance events". When a rebalance occurs, the framework will commit any pending offsets.

  4. It doesn't make a difference; message listener vs. batch listener is just a preference. Even with a message listener, with MANUAL ack mode, the offsets are committed when all the results from the poll have been processed. With MANUAL_IMMEDIATE mode, the offsets are committed one-by-one.
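To make the contrast concrete, here is a sketch of a record-level listener container configured with MANUAL_IMMEDIATE, so each acknowledge() commits that record's offset right away (assumes spring-kafka on the classpath; the bean name is illustrative):

```java
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> immediateAckFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    // MANUAL_IMMEDIATE commits each offset as soon as acknowledge() is called,
    // instead of deferring until the whole poll has been processed (MANUAL)
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    factory.getContainerProperties().setMessageListener(
            (AcknowledgingMessageListener<String, String>) (record, ack) -> {
                // ... process record.value() ...
                ack.acknowledge(); // commits this record's offset immediately
            });
    return factory;
}
```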

Gary Russell
  • Thanks again Gary! Regarding question #2 - my question was a bit different. What if I have 2 instances of the same application (with the same group id)? Will each instance start 2 threads and consume in parallel (from each of the partitions)? Regarding question #3 - yes, I mean rebalancing. But if I am using batch processing mode and acknowledging only after all the records are processed, how does the framework know which offset to commit during a rebalance? – Akhil Apr 11 '19 at 14:27
  • If you have 2 instances each with concurrency 2, each thread will get one partition. It depends on the reason for a rebalance. If it's because another group member joined, it won't happen until your listener exits (and commits the offsets). If it happens because your listener exceeded the `max.poll.interval.ms` then none of the offsets will be committed and that batch will be replayed to whatever consumer(s) get the partitions after the rebalance. – Gary Russell Apr 11 '19 at 14:34
  • Great! So to have exactly-once processing, I would need to manually manage the offsets and commit them inside a rebalance listener's partition-revoke method. Correct? – Akhil Apr 11 '19 at 16:31
  • As I said, it depends on what caused the rebalance. – Gary Russell Apr 11 '19 at 16:43
  • @GaryRussell sorry for asking a question on an old topic, but on the same subject: if I want to acknowledge each record while listening to a batch of records, using the @KafkaListener annotation on a method that takes a `List` of records as a parameter, am I supposed to use MANUAL_IMMEDIATE mode? And inside my foreach loop, I send an ack for each record with `acknowledgment.acknowledge()`, right? – Ahmed Abdelhak Apr 06 '21 at 23:04
  • There is currently no support for committing individual offsets with a batch listener. The `Acknowledgment` argument passed to a batch listener will commit the offsets for the entire batch. – Gary Russell Apr 06 '21 at 23:11
  • You can, however, add the `Consumer` as a parameter and commit the offset yourself. – Gary Russell Apr 06 '21 at 23:15
  • Thanks a lot for answering my question. Any recommendations on how I can handle failures with individual records? I am calling an external service inside my loop for each message, as I need some data from that service for each message. So what if a connection failure happens in between, or whatever - any ideas? I tried to create a coroutine for each message, but it doesn't work since the consumer is not thread-safe, and I would also end up with socket exceptions since the batch holds thousands of records (a stupid trial from me :D) – Ahmed Abdelhak Apr 06 '21 at 23:18
  • You can call `nack()` on the `Acknowledgment`. The container will commit the offsets up to the index, perform seeks, and redeliver the failed record. Or you can use the `RecoveringBatchErrorHandler`. See https://docs.spring.io/spring-kafka/docs/current/reference/html/#recovering-batch-eh – Gary Russell Apr 06 '21 at 23:28
  • Thank you very much @GaryRussell, it's kind of you to help me <3 – Ahmed Abdelhak Apr 06 '21 at 23:32
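Gary's suggestion above (taking the Consumer as a parameter and committing yourself) might look roughly like the following sketch. It assumes the listener runs in a batch-enabled container; calling commitSync here is safe because the listener executes on the consumer thread:

```java
public void onMessage(List<ConsumerRecord<String, String>> records, Consumer<?, ?> consumer) {
    for (ConsumerRecord<String, String> record : records) {
        // ... process record.value() ...
        consumer.commitSync(Collections.singletonMap(
                new TopicPartition(record.topic(), record.partition()),
                // the committed offset is the NEXT offset to read, hence +1
                new OffsetAndMetadata(record.offset() + 1)));
    }
}
```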

Q1:

From the documentation,

The @KafkaListener annotation is used to designate a bean method as a listener for a listener container. The bean is wrapped in a MessagingMessageListenerAdapter configured with various features, such as converters to convert the data, if necessary, to match the method parameters.

You can configure most attributes on the annotation with SpEL by using `#{…}` or property placeholders (`${…}`). See the Javadoc for more information.

This approach can be useful for simple POJO listeners, and you do not need to implement any interfaces. You can also listen on any topics and partitions declaratively using the annotations. You can also potentially return the value you received, whereas with MessageListener you are bound by the signature of the interface.
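As a sketch of that declarative style (topic and group names here are illustrative), a POJO method can receive the converted payload plus record metadata via headers:

```java
@Component
public class MyPojoListener {

    // The framework converts the record value to match the String parameter
    // and binds metadata such as the partition via @Header.
    @KafkaListener(topics = "MyTopic", groupId = "my-group")
    public void listen(String payload,
                       @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition) {
        System.out.println("Received '" + payload + "' from partition " + partition);
    }
}
```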

Q2:

Ideally yes. It gets more complicated if you have multiple topics to consume from, though. By default, Kafka uses the RangeAssignor, which has its own behaviour (you can change this via the `partition.assignment.strategy` consumer property).

Q3:

If your consumer dies, there will be a rebalance. If you acknowledge manually and your consumer dies before committing offsets, you do not need to do anything; Kafka handles that. But you could end up with some duplicate messages (at-least-once delivery).
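Because of that at-least-once behaviour, a common mitigation is to make processing idempotent, e.g. by tracking already-seen (topic, partition, offset) keys. A minimal plain-Java sketch (a real implementation would bound or persist this set):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical idempotency guard: after a rebalance an unacknowledged batch
// may be redelivered, so skip records whose (topic, partition, offset) key
// has already been processed.
class DedupeTracker {

    private final Set<String> seen = new HashSet<>();

    /** Returns true if this record is new and should be processed. */
    public boolean markIfNew(String topic, int partition, long offset) {
        return seen.add(topic + "-" + partition + "@" + offset);
    }
}
```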

Q4:

It depends on what you mean by "performance". If you mean latency, then consuming each record as fast as possible is the way to go. If you want to achieve high throughput, then batch consumption is more efficient.
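A batch variant of the annotation approach, for the throughput-oriented case, might look like this sketch (it assumes a batch-enabled container factory with manual ack mode, as in the question's configuration; names are illustrative):

```java
@KafkaListener(topics = "MyTopic", containerFactory = "kafkaListenerContainerFactory")
public void listenBatch(List<String> payloads, Acknowledgment ack) {
    // the whole poll arrives as one list, amortizing per-record overhead
    payloads.forEach(p -> {
        // ... process each payload ...
    });
    ack.acknowledge(); // one commit for the entire batch
}
```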

I have written some samples using Spring Kafka and various listeners - check out this repo.

senseiwu