We have one consumer group and three topics, each with a different schema. I created one consumer and, in a for loop, pass it one topic at a time: I subscribe to that topic, poll it, process the records and commit manually. In other words, the consumer is shared, and inside the loop I subscribe to one topic at a time and process its data. I am seeing random consumer lag: although the topic has data, my consumer sometimes fetches no records from it and only fetches them some of the time. When I try a single topic instead of looping through all three, it works, and I am unable to reproduce the problem. I need help debugging the issue and reproducing it.
-
Could someone please help? – sra1 Aug 27 '18 at 23:22
-
You can subscribe to a regex pattern. Why do you want to loop over topics? And if you are, you need to close the consumer after each loop, or use separate threads – OneCricketeer Aug 28 '18 at 02:38
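To see which topics a pattern subscription would pick up, a small JDK-only check can help. This sketch assumes the client applies a full-string match (`matcher(topic).matches()`) rather than a substring search; the topic names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TopicPatternCheck {
    /** Returns the topics whose whole name matches the regex. */
    static List<String> selected(String regex, List<String> topics) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String t : topics) {
            // Full match, not find(): "orders-dlq" does not match "orders|..."
            if (p.matcher(t).matches()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

For example, `selected("orders|payments|refunds", ...)` keeps `orders`, `payments` and `refunds` but not `orders-dlq`.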
-
Each topic has a different schema and has to be processed differently. If all three topics are subscribed using a regex pattern, an issue in any one topic leads to an exception, and the messages in the other topics will not be committed. – sra1 Aug 29 '18 at 23:15
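A minimal sketch of how a single pattern subscription could still keep one topic's failure from blocking commits on the others: dispatch on each record's topic, catch errors per record, and stop advancing a partition at its first failure. `Rec` is a hypothetical stand-in for Kafka's `ConsumerRecord`, and the handler map is illustrative. In a real consumer the returned offsets would be passed to `commitSync(Map<TopicPartition, OffsetAndMetadata>)` (the committed offset is the next one to read, hence `offset + 1`), and each failed partition would also need a `seek()` back to its failed offset so it is retried on the next poll.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

class PerTopicCommit {
    // Hypothetical stand-in for org.apache.kafka.clients.consumer.ConsumerRecord
    static final class Rec {
        final String topic; final int partition; final long offset; final String value;
        Rec(String topic, int partition, long offset, String value) {
            this.topic = topic; this.partition = partition; this.offset = offset; this.value = value;
        }
    }

    /** Returns "topic-partition" -> next offset to commit (last successfully processed + 1). */
    static Map<String, Long> process(List<Rec> batch, Map<String, Consumer<String>> handlers) {
        Map<String, Long> commit = new LinkedHashMap<>();
        Set<String> failed = new HashSet<>();
        for (Rec r : batch) {
            String tp = r.topic + "-" + r.partition;
            if (failed.contains(tp)) {
                continue; // keep order within a partition: skip everything after its first failure
            }
            try {
                handlers.get(r.topic).accept(r.value); // topic-specific schema handling
                commit.put(tp, r.offset + 1);
            } catch (RuntimeException e) {
                failed.add(tp); // only this partition stops; other topics keep going
            }
        }
        return commit;
    }
}
```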
-
And if I have to close the consumer after each loop, how does that impact performance? I tried keeping the consumer alive, unsubscribing after each iteration and subscribing to the next topic in the loop, but I am still seeing lag. – sra1 Aug 29 '18 at 23:18
-
I would again suggest using a thread for each topic. Don't loop over anything other than the poll loop for a single topic. And you should be able to perform a check against each schema after it's deserialized to tell what type of class it is (depending on whether you're using Avro, JSON or another format). – OneCricketeer Aug 30 '18 at 00:20
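The "check after deserialization" idea can be sketched as a type dispatch on the deserialized value. `OrderEvent` and `PaymentEvent` are hypothetical classes standing in for whatever each topic's schema deserializes to; with Avro or JSON the dispatch key might instead be a record name or a field.

```java
import java.util.Objects;

class SchemaDispatch {
    // Hypothetical event types, one per topic schema
    static final class OrderEvent {
        final String orderId;
        OrderEvent(String orderId) { this.orderId = orderId; }
    }
    static final class PaymentEvent {
        final double amount;
        PaymentEvent(double amount) { this.amount = amount; }
    }

    /** Routes an already-deserialized value to type-specific processing. */
    static String handle(Object value) {
        if (value instanceof OrderEvent) {
            return "order " + ((OrderEvent) value).orderId;
        } else if (value instanceof PaymentEvent) {
            return "payment " + ((PaymentEvent) value).amount;
        }
        // Unknown schema: fail loudly instead of silently committing
        throw new IllegalArgumentException("unknown type: " + Objects.toString(value));
    }
}
```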
1 Answer
Rather than looping over three topics in a single method, you could create a skeleton thread, like the following, that consumes from any topic. See examples here
I can't say whether this will "fix" the problem, but consuming from topics with different schemas in one application is usually not a scalable pattern, and it's not really clear what you're trying to do.
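import java.time.Duration;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class ConsumerThread extends Thread {
    private final KafkaConsumer<String, String> consumer;
    private final AtomicBoolean stopped = new AtomicBoolean();
    ConsumerThread(Properties props, String subscribePattern) {
        // One consumer per thread: KafkaConsumer is not thread-safe
        this.consumer = new KafkaConsumer<>(props);
        this.consumer.subscribe(Pattern.compile(subscribePattern));
    }
    @Override
    public void run() {
        try {
            while (!stopped.get()) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // Process record
                }
                // Manual commit after the batch (requires enable.auto.commit=false)
                consumer.commitSync();
            }
        } finally {
            consumer.close();
        }
    }
    // Named shutdown() because Thread.stop() is final and cannot be overridden
    public void shutdown() {
        stopped.set(true);
    }
}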
Not meant to be production-grade.
Then run three consumers independently, each given the consumer Properties (props):
new ConsumerThread(props, "t1").start();
new ConsumerThread(props, "t2").start();
new ConsumerThread(props, "t3").start();
Note: KafkaConsumer is not thread-safe, which is why each thread creates its own instance.

OneCricketeer
-
Thanks for the sample. Does the poll in each thread act independently? We have to process JSON, run validation checks and load the data into a database. If that takes longer than the maximum timeout, we cannot commit. In the scenario you describe, what happens when one thread fails with an error? Will it close the whole program? Instead of looping through each topic, I am now using the same code in a topic-specific folder and running it with a single input topic name parameter (instead of the three used earlier). – sra1 Sep 16 '18 at 23:39
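On the timeout concern: the relevant consumer settings are `enable.auto.commit`, `max.poll.records` (default 500) and `max.poll.interval.ms` (default 300000 ms). If the gap between two `poll()` calls exceeds `max.poll.interval.ms`, the consumer is removed from the group, the group rebalances, and a later `commitSync()` fails with `CommitFailedException`. A sketch of the `Properties` that could be passed to the `ConsumerThread` constructor follows; the values (50 records, 10 minutes) and the String deserializers are illustrative assumptions, not tuned recommendations.

```java
import java.util.Properties;

class ConsumerSettings {
    /** Builds consumer properties for one topic-specific worker (values are illustrative). */
    static Properties build(String bootstrapServers, String groupId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("group.id", groupId);
        // Commit manually, only after validation and the database load succeed
        p.setProperty("enable.auto.commit", "false");
        // Fewer records per poll, so each batch finishes sooner
        p.setProperty("max.poll.records", "50");
        // Allowed gap between polls before the consumer is kicked out of the group
        p.setProperty("max.poll.interval.ms", "600000");
        p.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return p;
    }
}
```

Lowering `max.poll.records` is usually the first lever; raising `max.poll.interval.ms` also delays detection of a genuinely stuck consumer.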
-
(cont. 2) So when we look at the processes on the Linux machine, we see three independent processes running on the same machine, using the same consumer group but different topics. How will this impact performance? With this setup I don't see any lag on the topics (I know this is best practice). – sra1 Sep 16 '18 at 23:41
-
I'm not entirely sure what you mean, but a thread stopping will not crash the entire app. Each thread must poll, and their polls won't block each other. You must use multiple threads (or processes, or instances) no matter what in order to parallelize a consumer group. Looping through a list of topics just does not make sense to me; it won't scale the way you hope. Run multiple instances of your application and consume from individual topics. As I mentioned before, use Kafka Streams or KSQL to perform joins across topics. – OneCricketeer Sep 17 '18 at 02:17
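The claim that one failing thread does not take down the others can be checked with plain JDK threads. In this sketch each `Callable` is a hypothetical stand-in for one topic's poll loop (a real one would own its own `KafkaConsumer`). With an `ExecutorService`, a task's exception is captured in that task's `Future` and surfaces as an `ExecutionException`; the other tasks finish normally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IsolatedWorkers {
    /** Runs each topic's task on its own thread; returns topic -> result or "failed: <message>". */
    static Map<String, String> runAll(Map<String, Callable<String>> perTopic) {
        // One thread per topic (perTopic must not be empty)
        ExecutorService pool = Executors.newFixedThreadPool(perTopic.size());
        try {
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<String>> e : perTopic.entrySet()) {
                futures.put(e.getKey(), pool.submit(e.getValue()));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : futures.entrySet()) {
                try {
                    results.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    // Only this topic's task failed; the others are unaffected
                    results.put(e.getKey(), "failed: " + ex.getCause().getMessage());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    results.put(e.getKey(), "interrupted");
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Running separate processes per topic, as described in the question's follow-up, gives the same isolation at the OS level.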