
I am writing a Kafka consumer. The consumer's job is primarily to create multiple DB entities and save them after processing the payload. I am trying to handle errors that can occur while consuming the data. For this I can think of 2 options (in the Spring ecosystem):

  1. Send the failed message to a dead-letter-kafka-topic
  2. Send the failed message to a new DB table (Error-table)

The failed messages need to be processed again.

In Case 1: I have to write another @KafkaListener that listens to the dead-letter topic and processes the message. The problem here is that I have less control over how to initiate the re-processing flow (e.g., with a scheduler), because the KafkaListener will start processing the data as soon as it is published to the dead-letter topic.

In Case 2: I have more control over the re-processing flow, as I can write a REST endpoint or a scheduler that tries to re-process the failed messages, roughly as in the sketch below. (Here I have a dilemma over which DB to use: relational or some key-value store.)
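
To make Case 2 concrete, this is roughly what I have in mind (a simplified sketch; FailedMessage, FailedMessageRepository, the getPayload accessor, and the five-minute delay are just illustrative names/values, and @EnableScheduling is assumed to be configured):

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
class FailedMessageRetrier {

  private final FailedMessageRepository repository; // a Spring Data CrudRepository (hypothetical)

  FailedMessageRetrier(FailedMessageRepository repository) {
    this.repository = repository;
  }

  @Scheduled(fixedDelay = 300_000) // retry every 5 minutes; value is an assumption
  public void retryFailedMessages() {
    for (FailedMessage failed : repository.findAll()) {
      try {
        reprocess(failed.getPayload()); // same entity-creation logic as the normal consumer
        repository.delete(failed);      // drop the row once it has been handled
      } catch (Exception e) {
        // leave the row in place for the next scheduled run
      }
    }
  }

  private void reprocess(String payload) {
    // create and save the DB entities, as in the normal consumer path
  }
}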

I basically have a design dilemma and am unable to determine which approach is better in the Spring ecosystem.

Appreciate the response.

Heisenberg

2 Answers


I think using Kafka is the best solution.

"Because KafkaListener will start processing the data as soon as the data is published in the dead letter topic."

You can control the behavior by setting autoStartup to false on that listener, then start/stop the listener using the KafkaListenerEndpointRegistry as needed:

registry.getListenerContainer(myListenerId).start();
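
For example (a minimal sketch; the listener id dlqListener, the topic name my-topic.DLT, and the hourly schedule are assumptions for illustration, and @EnableScheduling is assumed to be configured):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
class DlqReprocessor {

  @Autowired
  private KafkaListenerEndpointRegistry registry;

  // The listener does not start with the application; it is started on demand.
  @KafkaListener(id = "dlqListener", topics = "my-topic.DLT", autoStartup = "false")
  public void reprocess(String payload) {
    // re-processing logic goes here
  }

  @Scheduled(fixedDelay = 3_600_000) // e.g. once an hour; value is an assumption
  public void startReprocessing() {
    MessageListenerContainer container = registry.getListenerContainer("dlqListener");
    if (container != null && !container.isRunning()) {
      container.start();
    }
  }
}

You can stop the container again with container.stop() once the backlog is drained.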

Or you can use your own KafkaConsumer (created by the consumer factory), poll for as many records as you want, and close the consumer when you are done.
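
Something like this (a sketch; the group id, client-id suffix, topic name, poll timeout, and the reprocess method are assumptions):

import java.time.Duration;
import java.util.Collections;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.springframework.kafka.core.ConsumerFactory;

public void drainDlqOnce(ConsumerFactory<String, String> consumerFactory) {
  // Create a short-lived consumer, poll one batch, then close it.
  try (Consumer<String, String> consumer =
      consumerFactory.createConsumer("dlq-reprocess-group", "reprocessor")) {
    consumer.subscribe(Collections.singletonList("my-topic.DLT"));
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
    for (ConsumerRecord<String, String> record : records) {
      reprocess(record.value()); // your re-processing logic
    }
  }
}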

Gary Russell
  • Thanks a bunch! I have this dilemma because in our enterprise the Kafka message retention period is 72 hours max, and the entire Kafka platform is managed by a different team. Hence the thought of saving the failed messages to a DB came to my mind. – Heisenberg Oct 05 '19 at 14:55

I agree with Gary Russell's answer: you can create a KafkaConsumer instance and control its lifecycle yourself. The class comes from the org.apache.kafka:kafka-clients library.

In your particular case you can add Thread.sleep(schedulerDelay) to achieve scheduling. Here is a simplified example:

import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.springframework.stereotype.Component;

@Component
class Scheduler {

  private KafkaConsumer<String, String> consumer;
  private volatile boolean running = true;
  private long schedulerDelay = 60_000L; // pause between polling rounds, in milliseconds

  public void init() {
    // create a KafkaConsumer subscribed to your DLQ topic and assign it to `consumer`
  }

  public void run() {
    try {
      while (running) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> record : records) {
          processRecordLogicGoesHere(record);
        }
        Thread.sleep(schedulerDelay);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore interrupt status and exit the loop
    } finally {
      consumer.close();
    }
  }
}
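
Note that run() blocks, so you would start it on its own thread. One option (an assumption, using Spring's ApplicationReadyEvent) is to add something like this to the class above:

import java.util.concurrent.Executors;

import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;

// Inside the Scheduler class: launch the polling loop once the application is up.
@EventListener(ApplicationReadyEvent.class)
public void onReady() {
  init();
  Executors.newSingleThreadExecutor().submit(this::run);
}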

The schedulerDelay should be chosen carefully: the consumer must keep up with incoming messages so that they are not lost to Kafka's log cleanup (retention) policy.

There are plenty of tutorials on how to work with Kafka's official API; here is one of them: Introducing the Kafka Consumer

In addition, you may find some ideas here: Retrying consumer architecture in the Apache Kafka

Oiew
  • Conceptually, working with a bare `KafkaConsumer` is an implementation of the `PollingConsumer` pattern, while Spring's `KafkaListener` is an implementation of the `Event-Driven Consumer` pattern. Both come from the Enterprise Integration Patterns book. – Oiew Oct 05 '19 at 13:43