
I need to consume records from a Kafka partition in multiple threads, with each thread processing a unique set of records. I have the following code, but I can't tell what the mistake is:

import java.util.ArrayList;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ConsumerThread implements Runnable {
    public String name;
    public ConsumerThread(String name){
        this.name = name;
    }
    // Consumer configuration shared by every thread (same group.id for all)
    public Properties getDefaultProperty(){
        Properties prop = new Properties();
        prop.setProperty("group.id", "4");
        prop.setProperty("enable.auto.commit", "false");
        prop.setProperty("auto.offset.reset", "earliest");
        prop.setProperty("bootstrap.servers", "localhost:9092");
        prop.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty("max.poll.records","150");
        return prop;
    }
    public void run() {
        // Manually assign partition 0 of "my.topic" to this consumer
        TopicPartition tp = new TopicPartition("my.topic", 0);
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(getDefaultProperty());
        ArrayList<TopicPartition> tpList = new ArrayList<>();
        tpList.add(tp);
        consumer.assign(tpList);
        // Poll once (up to 1000 ms), commit the offsets, then print the batch
        ConsumerRecords<String, String> poll = consumer.poll(1000);
        consumer.commitAsync();
        for (ConsumerRecord<String, String> cr : poll) {
            System.out.println("From "+this.name+" : "+cr.value());
        }
        consumer.close();
        System.out.println("Thread Exiting "+this.name);
    }
}

Result

From Thread1 : produced_0
From Thread1 : produced_1
From Thread1 : produced_2
From Thread1 : produced_3
.
.
.
From Thread1 : produced_136
From Thread2 : produced_0
From Thread2 : produced_1
From Thread2 : produced_2
From Thread2 : produced_3
.
.
.


Expected:

From Thread1 : produced_0
From Thread1 : produced_1
From Thread1 : produced_2
From Thread1 : produced_3
.
.
.
From Thread1 : produced_136
From Thread2 : produced_4
From Thread2 : produced_5
From Thread2 : produced_6
From Thread2 : produced_137
Ravi
  • Looks like you have multiple threads subscribed to only partition 0 with no guarantee of thread consumption order – OneCricketeer Oct 14 '18 at 17:25
  • If you expect to be able to consume the same records from the same topics this should be two separate groups and have different `group.id` assignments. If this is not what you're trying to accomplish, please provide more information. – Chris Matta Oct 14 '18 at 18:29

2 Answers


Automatic assignment of partitions across a consumer group only happens when you use the `subscribe` method of the Kafka consumer. You, however, are using `assign` with a specific topic partition, so you take on the responsibility of assigning specific partitions to the different consumers. Since you always assign the same partition 0, all consumers are consuming from the same topic partition.
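
To make the ownership rule concrete, here is a minimal sketch in plain Java. It is a simplified model, not Kafka's real assignor: the class `PartitionAssignmentSketch` and its `assign` helper are hypothetical names invented for illustration. It only shows that under group-managed assignment every partition has exactly one owner within the group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of group-managed assignment (an assumption for illustration,
// NOT Kafka's actual assignor): each partition is owned by exactly one consumer.
class PartitionAssignmentSketch {

    // Distributes partitions 0..partitionCount-1 round-robin over the consumers.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String consumer : consumers) {
            owned.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = consumers.get(p % consumers.size());
            owned.get(owner).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        List<String> threads = Arrays.asList("Thread1", "Thread2");
        System.out.println(assign(threads, 1)); // one partition, two consumers
        System.out.println(assign(threads, 2)); // two partitions, two consumers
    }
}
```

In this model, a topic with a single partition leaves `Thread2` with nothing to read, while two partitions give each thread its own. With `assign`, by contrast, no such coordination happens, so nothing stops two consumers from reading the same partition.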

Lior Chaga
  • I used subscribe, but the threads seem to be synchronized somehow: either thread1 receives all the messages or thread2 receives all of them. I want both threads to pick up unique messages, not the same messages. ``` public void run() { ArrayList ar = new ArrayList(); KafkaConsumer consumer = new KafkaConsumer(getDefaultProperty()); ar.add("my.topic"); consumer.subscribe(ar); ConsumerRecords poll = consumer.poll(1000); ``` – Vigneshwaran Oct 15 '18 at 14:27
  • That is not possible, assuming the consumers in both threads are using the same `group.id` and you are committing within `max.poll.interval.ms` (https://kafka.apache.org/documentation/#newconsumerconfigs); failing to do so will lead to re-joins during consumption. Depending on the version you're using, there is a tiny chance you are affected by https://issues.apache.org/jira/browse/KAFKA-5430, but this scenario happens once in a blue moon and is not easily reproducible. – Lior Chaga Oct 17 '18 at 08:24

As Lior Chaga said, you're manually assigning topic-partitions to your consumers. That's not the recommended way to do this. On top of that, all your consumers seem to be using exactly the same group.id. When consumers that share a group.id use subscribe, each message is delivered to only one of them: if one consumer thread gets a particular message, none of the other threads will get it. If instead you want every consumer thread to receive its own full copy of the messages, without interfering with the others, then you need to give them different group.ids.

To subscribe to a topic so that Kafka handles rebalancing for you, and then consume, you should do something like this (taken from the KafkaConsumer javadoc linked below):

 consumer.subscribe(Arrays.asList("foo", "bar"));
 while (true) {
     ConsumerRecords<String, String> records = consumer.poll(100);
     for (ConsumerRecord<String, String> record : records)
         System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
 }

The official Kafka javadocs have much more detailed explanations: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
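
If the topic has only one partition, subscribing with a shared group.id still leaves a single thread doing all the reading. One common way to get unique records on each thread in that case is to let one consumer poll and hand every record to a pool of worker threads. The sketch below models only the hand-off in plain Java: `FanOutSketch` and `dispatch` are hypothetical names, and the list of strings stands in for a batch returned by `poll`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative model only: the batch is a plain list standing in for polled records.
class FanOutSketch {

    // Hands each record to exactly one worker; returns record -> worker thread name.
    static Map<String, String> dispatch(List<String> batch, int workerCount) {
        Map<String, String> processedBy = new ConcurrentHashMap<>();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (String record : batch) {
            workers.submit(() -> {
                // "Processing" here just notes which worker handled the record.
                processedBy.put(record, Thread.currentThread().getName());
            });
        }
        workers.shutdown();
        try {
            // Wait for the whole batch before returning (and before any offset commit).
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processedBy;
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            batch.add("produced_" + i);
        }
        dispatch(batch, 2).forEach((record, worker) ->
                System.out.println("From " + worker + " : " + record));
    }
}
```

Each record is handled by exactly one worker, but the order of processing across workers is no longer the partition order, and in a real consumer the offsets should be committed only after the workers have finished the batch.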

mjuarez
  • As you mentioned, I don't want the same message to be consumed by another thread. Subscribing to that topic with the same group.id will give unique messages to each thread, right? – Vigneshwaran Oct 15 '18 at 14:35