
I am reading the Spring Kafka code and have some questions:

  1. If we use one @KafkaListener with 2 topics, Spring Kafka creates a single MessageListenerContainer. If I use a separate @KafkaListener for each topic, then 2 MessageListenerContainers are created. Is that correct?

  2. Does MessageListenerContainer mean consumer?

  3. If I set the concurrency to 4 in ConcurrentKafkaListenerContainerFactory, does that mean that every @KafkaListener opens 4 threads to the broker, and that the coordinator sees them as 4 different consumers?

  4. How does polling work with @KafkaListener? Does it get only 1 ConsumerRecord from the broker each time?

Please help.

Gary Russell
AlwaysLearning

1 Answer


There are two implementations of MessageListenerContainer - the KafkaMessageListenerContainer (KMLC) and ConcurrentMessageListenerContainer (CMLC).

The CMLC is simply a wrapper for one or more KMLCs, with the number of KMLCs specified by the concurrency.

@KafkaListener always uses a CMLC.

Each KMLC gets one Consumer (and one thread). The thread continually poll()s the consumer, with the specified pollTimeout.
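To make the loop concrete, here is a minimal stdlib-only sketch of what one container thread does: poll, hand each record to the listener on the same thread, then poll again. `PollLoopSketch`, its `poll` stand-in, and the in-memory queue are hypothetical names for illustration, not Spring Kafka or kafka-clients APIs. The demo stops on the first empty poll, whereas a real container keeps polling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/** Simplified model of one container thread; not Spring Kafka's actual code. */
final class PollLoopSketch {

    /** Hypothetical stand-in for a consumer poll: returns at most maxPollRecords records. */
    static List<String> poll(Queue<String> broker, int maxPollRecords) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxPollRecords && !broker.isEmpty()) {
            batch.add(broker.remove());
        }
        return batch;
    }

    /** Runs the loop until a poll comes back empty; returns how many polls delivered records. */
    static int run(Queue<String> broker, int maxPollRecords, Consumer<String> listener) {
        int polls = 0;
        while (true) {
            List<String> batch = poll(broker, maxPollRecords);
            if (batch.isEmpty()) {
                break; // demo only: a real container would simply poll again
            }
            polls++;
            for (String record : batch) {
                listener.accept(record); // listener runs on the polling thread, one record at a time
            }
        }
        return polls;
    }

    public static void main(String[] args) {
        Queue<String> broker = new ArrayDeque<>(List.of("r1", "r2", "r3", "r4", "r5"));
        List<String> seen = new ArrayList<>();
        int polls = run(broker, 2, seen::add);
        System.out.println(polls + " polls delivered " + seen);
        // prints: 3 polls delivered [r1, r2, r3, r4, r5]
    }
}
```

Because the listener is called inside the loop, the thread is always either in the poll or in the listener, never both.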

How the topics/partitions are distributed across the KMLCs depends on

  • how many partitions the topic(s) have
  • the consumer's partition.assignment.strategy property

If you have multiple topics with fewer partitions than the concurrency, you will likely need an alternate partition assignor, such as the round robin assignor; otherwise, you will have idle containers with no assignment.
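The difference can be illustrated with a small stdlib-only model. This is a sketch under assumptions: `AssignorSketch` and its methods are hypothetical, every topic has the same partition count, and every consumer subscribes to every topic. It imitates how a range-style strategy splits each topic independently, while a round-robin-style strategy deals all topic-partitions out in one pass. It does not call Kafka's assignor classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified models of range and round-robin assignment; not Kafka's assignor classes. */
final class AssignorSketch {

    private static List<List<String>> emptyAssignments(int consumers) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            out.add(new ArrayList<>());
        }
        return out;
    }

    /** Range-style: each topic is split on its own, always starting from consumer 0. */
    static List<List<String>> range(List<String> topics, int partitionsPerTopic, int consumers) {
        List<List<String>> out = emptyAssignments(consumers);
        int base = partitionsPerTopic / consumers;
        int extra = partitionsPerTopic % consumers;
        for (String topic : topics) {
            int next = 0;
            for (int c = 0; c < consumers; c++) {
                int count = base + (c < extra ? 1 : 0); // the first 'extra' consumers get one more
                for (int i = 0; i < count; i++) {
                    out.get(c).add(topic + "-" + next++);
                }
            }
        }
        return out;
    }

    /** Round-robin-style: all topic-partitions are dealt across consumers in a single pass. */
    static List<List<String>> roundRobin(List<String> topics, int partitionsPerTopic, int consumers) {
        List<List<String>> out = emptyAssignments(consumers);
        int slot = 0;
        for (String topic : topics) {
            for (int p = 0; p < partitionsPerTopic; p++) {
                out.get(slot++ % consumers).add(topic + "-" + p);
            }
        }
        return out;
    }

    /** Counts consumers that ended up with no partitions. */
    static long idle(List<List<String>> assignments) {
        return assignments.stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        List<String> topics = List.of("A", "B"); // two topics, 3 partitions each, concurrency 6
        System.out.println("range idle: " + idle(range(topics, 3, 6)));            // range idle: 3
        System.out.println("round robin idle: " + idle(roundRobin(topics, 3, 6))); // round robin idle: 0
    }
}
```

In this model, the range-style split leaves three of the six consumers idle because both topics hand their partitions to the same first three consumers.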

  1. That is correct; if you explicitly want a different container for each topic, you can provide multiple @KafkaListener annotations on the same method.
  2. See my explanation above.
  3. That is correct - it's the only way to get concurrency with Kafka (without adding very complicated logic to manage offsets).
  4. The number of records returned by each poll depends on several consumer properties: max.poll.records, fetch.min.bytes, and fetch.max.wait.ms.
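As a rough sketch of point 4, the three properties can be collected as plain strings. The `PollSizingSketch` class and the values shown are illustrative assumptions, not recommendations. In a Spring Kafka application these entries would typically be merged into the consumer configuration given to the consumer factory.

```java
import java.util.Properties;

/** Illustrative consumer settings that influence how many records one poll returns. */
final class PollSizingSketch {

    static Properties pollSizing() {
        Properties props = new Properties();
        // Upper bound on the number of records returned by a single poll() call.
        props.setProperty("max.poll.records", "100");
        // Minimum amount of data the broker should gather before answering a fetch request.
        props.setProperty("fetch.min.bytes", "1");
        // Longest time the broker waits for fetch.min.bytes to accumulate before answering anyway.
        props.setProperty("fetch.max.wait.ms", "500");
        return props;
    }

    public static void main(String[] args) {
        Properties props = pollSizing();
        for (String key : new String[] {"max.poll.records", "fetch.min.bytes", "fetch.max.wait.ms"}) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```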
Gary Russell
  • Thanks @Gary. In the Kafka client API, a consumer is a client that consumes records from the Kafka cluster. So if I am using Spring Kafka and have 2 KMLCs, does that mean there are 2 consumers connected to the cluster, with each KMLC using a different client? – AlwaysLearning Feb 17 '19 at 20:21
  • Also, suppose I have a topic A with 10 partitions and my concurrency is 5. That creates 5 consumers, each consuming from 2 different partitions. If I then start the same application with the same configuration on another machine, the Kafka coordinator sees 10 different consumers, so every consumer gets 1 partition and each machine gets 5 partitions. Now if I increase the concurrency to 10, is it possible that one of the machines gets no partitions assigned and sits idle? – AlwaysLearning Feb 17 '19 at 20:31
  • That is correct. If the consumers have the same `group.id` property, the partitions will be distributed between them. If they have different `group.id`s, they will both receive all records. The CMLC's child KMLCs all get the same `group.id`. Yes; you must have the number of partitions >= `number of instances * concurrency` to avoid idle consumers. – Gary Russell Feb 17 '19 at 20:33
  • Then how do you handle the concurrency if I subscribe to a topic that has fewer partitions than the concurrency, since at consumer creation time you do not know the partition count when I subscribe by topic? I have seen that with explicit TopicPartitions, the concurrency is reduced to the number of TopicPartitions. So if I have a topic A with 10 partitions and my concurrency is 11, what happens to the 11th consumer thread? Does it send heartbeats to the broker, or cause a rebalance after session.timeout.ms? – AlwaysLearning Feb 17 '19 at 20:42
  • The idle consumer continues to poll the consumer in its loop. It will just never get any records. That way, if a rebalance occurs, it can participate (partition assignment occurs within the `poll()` sequence). Say you have 10 partitions, 2 instances with 6 concurrency. When both instances are active, you will have 2 idle consumers. If one instance is stopped the 10 partitions will be distributed across the remaining 6 consumers (including the one that was idle). Also, if both apps are running and you increase the partitions on the broker, the idle consumers will participate in the rebalance. – Gary Russell Feb 17 '19 at 20:57
  • Thanks for clearing up the doubt. You mentioned that the idle 'consumer' continues to poll the 'consumer'. What does that mean? Also, in your answer you said "Each KMLC gets one Consumer (and one thread). The thread continually poll()s the consumer, with the specified pollTimeout." So is it that the KMLC thread polls the consumer, the consumer polls the broker and gets up to max.poll.records, and then the KMLC loop takes one record at a time, deserializes it, and passes it to the @KafkaListener method? – AlwaysLearning Feb 17 '19 at 21:19
  • And if the method takes a long time, so that processing all the fetched records exceeds the poll interval, does the KMLC thread call poll on the consumer after the poll timeout, or does it wait for all the records to be processed and then call poll? Also, if the method completes before the poll timeout, is the KMLC thread idle until the timeout is reached and only then calls poll? – AlwaysLearning Feb 17 '19 at 21:19
  • Also, can you please explain this "That way, if a rebalance occurs, it can participate (partition assignment occurs within the poll() sequence)." – AlwaysLearning Feb 17 '19 at 21:21
  • StackOverflow is not really suitable for "chatting". You should experiment and ask a new question if there is something you don't understand. Also look at the code. The thread calls `poll()` in a loop. Everything happens on the same thread. `poll, call listener with each record, poll again, ...`; the thread is either in poll or calling the listener. If the listener is too slow, a rebalance will be detected on the next `poll()`. All activity occurs there. Since the idle thread still calls `poll()` but gets no records, it is a candidate for partition distribution whenever a rebalance occurs. – Gary Russell Feb 17 '19 at 21:29
  • Hi @AlwaysLearning, the above discussion was very nice. I'm trying to understand the MessageListenerContainer pause/resume/stop/start methods, whereas in KafkaConsumer we only have pause/resume on TopicPartition. Could you explain the listener's stop method? – Vish Oct 16 '22 at 14:09
  • Don’t ask new questions in comments. stop() closes the consumer and stops the thread that polls it. – Gary Russell Oct 17 '22 at 12:35
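The idle-consumer arithmetic discussed in the comments can be sketched as follows. It assumes a single topic, a single `group.id`, and every consumer subscribed to that topic. `IdleConsumerSketch` is a hypothetical helper, not part of Spring Kafka.

```java
/** Counts consumers left without a partition for one topic and one consumer group. */
final class IdleConsumerSketch {

    static int idleConsumers(int partitions, int instances, int concurrency) {
        int consumers = instances * concurrency; // one consumer per child KMLC
        return Math.max(0, consumers - partitions); // a partition goes to at most one consumer in the group
    }

    public static void main(String[] args) {
        System.out.println(idleConsumers(10, 2, 5));  // 0: ten consumers, one partition each
        System.out.println(idleConsumers(10, 2, 6));  // 2: twelve consumers, ten partitions
        System.out.println(idleConsumers(10, 1, 6));  // 0: one instance stopped, six consumers share ten
        System.out.println(idleConsumers(10, 2, 10)); // 10: twenty consumers, ten partitions
    }
}
```

Which specific consumers end up idle depends on the assignor, so the sketch only reports how many.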