
I have a use case where we create multiple streams from a Kafka DStream. I would like to commit the offsets only after both streams have been processed successfully. Is this possible?

Current strategy:

1) Create DStream one.
2) Create DStream two.
3) Process the two streams in parallel, each in its own thread.
4) Wait for all threads to complete using a CountDownLatch.
5) Finally, commit all offsets.

One problem with this strategy is how to track the offsets of records that failed to be processed completely.
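One way to think about tracking partially failed work is per-partition bookkeeping. Each branch reports, for each Kafka partition, the offset up to which it has finished successfully, and the offset that is safe to commit is the minimum across branches. The sketch below uses a hypothetical `OffsetTracker` class (not part of Spark or Kafka). It treats each reported offset as an exclusive end, like the `untilOffset` of Spark's `OffsetRange`. That value is the next offset to consume, which is what Kafka expects as a committed offset.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Spark or Kafka API): tracks, per partition,
// how far each processing branch has got, and derives commit-safe offsets.
class OffsetTracker {
    private final int branchCount;
    // partition -> progress per branch; -1 means "branch has not reported yet"
    private final Map<Integer, long[]> progress = new HashMap<>();

    OffsetTracker(int branchCount) {
        this.branchCount = branchCount;
    }

    // Branch `branch` has successfully processed every record of `partition`
    // below `untilOffset` (exclusive end, i.e. the next offset to consume).
    synchronized void markProcessed(int branch, int partition, long untilOffset) {
        long[] perBranch = progress.computeIfAbsent(partition, p -> {
            long[] init = new long[branchCount];
            Arrays.fill(init, -1L);
            return init;
        });
        perBranch[branch] = Math.max(perBranch[branch], untilOffset);
    }

    // For each partition, the minimum progress over all branches: everything
    // below it was handled by every branch. Partitions that some branch has
    // not reported yet are omitted, so nothing is committed for them.
    synchronized Map<Integer, Long> safeOffsets() {
        Map<Integer, Long> result = new HashMap<>();
        for (Map.Entry<Integer, long[]> entry : progress.entrySet()) {
            long min = Long.MAX_VALUE;
            for (long offset : entry.getValue()) {
                min = Math.min(min, offset);
            }
            if (min >= 0) {
                result.put(entry.getKey(), min);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker(2);
        tracker.markProcessed(0, 0, 100); // branch 0 finished partition 0 up to 100
        tracker.markProcessed(1, 0, 80);  // branch 1 only got as far as 80
        tracker.markProcessed(0, 1, 50);  // branch 1 has not reported partition 1
        System.out.println(tracker.safeOffsets()); // prints {0=80}
    }
}
```

Taking the minimum means a record that failed in either branch is never covered by a commit. The cost is that records already handled by the faster branch may be seen again after a restart.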

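```java
import java.util.concurrent.CountDownLatch;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Direct stream from Kafka (spark-streaming-kafka-0-10)
JavaInputDStream<ConsumerRecord<String, String>> telemetryStream = KafkaUtils.createDirectStream(
        streamingContext, LocationStrategies.PreferConsistent(),
        ConsumerStrategies.Subscribe(topics, kafkaParams));

// Extract the record values; cached because both branches consume this stream
JavaDStream<String> telemetryDStream = telemetryStream.map(record -> record.value());

telemetryDStream.cache();

// One count per processing branch
CountDownLatch latch = new CountDownLatch(2);

Thread t1 = new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            // processing logic for stream one here
        } finally {
            latch.countDown(); // runs whether or not processing succeeded
        }
    }
});

t1.start();

Thread t2 = new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            // processing logic for stream two here
        } finally {
            latch.countDown(); // runs whether or not processing succeeded
        }
    }
});

t2.start();

latch.await(); // blocks until both threads have counted down

// now commit offsets here
```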
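In the code above, `latch.countDown()` sits in a `finally` block, so the latch is released whether or not processing succeeded. The main thread therefore cannot tell the two cases apart before committing. A minimal sketch swaps the latch for `CompletableFuture.allOf(...).join()`, which rethrows if either branch failed, so the commit step can be skipped. `BatchCoordinator`, `processAndCommit` and the three `Runnable` arguments are hypothetical placeholders. In Spark's Kafka 0.10 integration, a batch's ranges are usually read with `((HasOffsetRanges) rdd.rdd()).offsetRanges()` inside `foreachRDD` and committed with `((CanCommitOffsets) telemetryStream.inputDStream()).commitAsync(offsetRanges)`. The sketch leaves that out so it runs without Spark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: run two processing branches in parallel and commit only if both succeed.
// The Runnables are hypothetical stand-ins for the real per-batch processing
// and for the Kafka offset commit.
class BatchCoordinator {

    // Returns true if offsets were committed, false if the batch failed.
    static boolean processAndCommit(Runnable branchOne, Runnable branchTwo, Runnable commitOffsets) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<Void> first = CompletableFuture.runAsync(branchOne, pool);
            CompletableFuture<Void> second = CompletableFuture.runAsync(branchTwo, pool);
            // join() waits for both and throws CompletionException if either threw,
            // unlike a CountDownLatch that is counted down in a finally block.
            CompletableFuture.allOf(first, second).join();
            commitOffsets.run();
            return true;
        } catch (CompletionException e) {
            System.err.println("Batch failed, offsets not committed: " + e.getCause());
            return false;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = processAndCommit(() -> {}, () -> {}, () -> log.add("commit"));
        boolean failed = processAndCommit(
                () -> {},
                () -> { throw new IllegalStateException("boom"); },
                () -> log.add("commit"));
        System.out.println(ok + " " + failed + " " + log); // prints true false [commit]
    }
}
```

Skipping the commit only affects where a restarted consumer resumes. It does not by itself retry the failed batch, so records that are replayed later should be handled idempotently by both branches.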

Is there a better way to handle this?

wandermonk
