I'm new to Kafka and have been studying how it behaves when messages are sent to it while it is stopped.

My scenario: I stop Kafka using 'kubectl delete StatefulSet kafka_kf', then send a number of messages to it using the Java Kafka producer. When I start Kafka again, the messages that were sent while it was down appear in the consumer immediately. What happens inside Kafka in this case, and how can I prevent these messages from appearing in the consumer? They cause a duplication issue later, which is why I need them not to appear.

I see the messages appear in a console consumer started with this command:

kubectl exec -ti test -- ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --isolation-level read_committed --topic testtopic

The piece of code used to send messages to Kafka is: producer.send(message)

Michael Heil
Omar Smri

1 Answer

First, I think it is important to understand that producer.send() is an asynchronous call, so it does not block. Second, the send() method does not actually push the message to the brokers but instead places the message in a binary queue in local memory. There is a separate binary queue for each partition in the topics that the producer communicates with. The records are actually pushed to the brokers by an internal background thread on the producer side that will be triggered by configurable batching thresholds. It is this action that is waiting for the acks from the brokers (as configured by the acks setting), not the send() method.

[Source: Confluent Training - Developer Skills for Building Apache Kafka]
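
To make the quoted description concrete, here is a toy model in plain Java. It is not Kafka code: the class BufferingSenderSketch, its methods, and the in-memory brokerLog are hypothetical stand-ins, and a real producer runs the drain step on a background thread with per-partition batches. The sketch only illustrates why records handed to send() while the broker is down can still be delivered once it is reachable again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Toy, single-threaded model of "send() only enqueues, a sender drains later".
// Hypothetical names throughout; this is NOT how KafkaProducer is implemented.
final class BufferingSenderSketch {
    // One buffered record plus the future completed when it is "acknowledged".
    private static final class Pending {
        final String value;
        final CompletableFuture<Integer> ack = new CompletableFuture<>();
        Pending(String value) { this.value = value; }
    }

    private final Queue<Pending> buffer = new ArrayDeque<>();
    private final List<String> brokerLog = new ArrayList<>(); // stands in for the topic
    private boolean brokerUp = false;

    // Like send(): returns immediately, whether or not the broker is reachable.
    CompletableFuture<Integer> send(String value) {
        Pending p = new Pending(value);
        buffer.add(p);
        return p.ack;
    }

    void setBrokerUp(boolean up) { brokerUp = up; }

    // One pass of the sender loop: delivers everything buffered if the broker is up.
    void drainOnce() {
        if (!brokerUp) {
            return; // nothing can be delivered, records stay in the buffer
        }
        Pending p;
        while ((p = buffer.poll()) != null) {
            brokerLog.add(p.value);
            p.ack.complete(brokerLog.size() - 1); // pretend offset
        }
    }

    List<String> brokerLog() { return brokerLog; }

    public static void main(String[] args) {
        BufferingSenderSketch producer = new BufferingSenderSketch();
        CompletableFuture<Integer> ack = producer.send("m1"); // broker is still down
        producer.drainOnce();
        System.out.println("acked while down: " + ack.isDone());
        producer.setBrokerUp(true); // broker restarted
        producer.drainOnce();
        System.out.println("acked after restart: " + ack.isDone());
    }
}
```

In this model, nothing is lost while the broker is down; the records simply wait in the buffer, which matches the behaviour described in the question.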

When Kafka is unavailable, the producer eventually fails with a TimeoutException. That exception is retriable, however, and the producer configuration retries defaults to 2147483647 (Integer.MAX_VALUE), so the producer keeps trying.

As soon as Kafka is available again, the producer is able to send the buffered messages and your consumer receives them.

If you do not want to receive those messages you need to set the KafkaProducer configuration retries=0.
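
A minimal sketch of such a configuration follows, written with java.util.Properties and plain string keys so that it needs no Kafka dependency. The class name NoRetryProducerConfig and the timeout values are assumptions for illustration, not recommendations. One caveat, as far as I can tell from the producer documentation: retries=0 only disables re-sending after a failed attempt. A record still waiting in the buffer can be sent for the first time at any point within delivery.timeout.ms (2 minutes by default), so a broker that comes back quickly may still receive it. The sketch therefore also shortens the timeouts; delivery.timeout.ms must be at least linger.ms + request.timeout.ms.

```java
import java.util.Properties;

// Hypothetical helper: builds producer settings that give up quickly instead of retrying.
final class NoRetryProducerConfig {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Do not re-send a record after a failed attempt.
        props.put("retries", "0");
        // Illustrative values (assumptions): bound how long a record may wait overall.
        props.put("request.timeout.ms", "3000");
        props.put("delivery.timeout.ms", "5000");
        // Bound how long send() itself may block, e.g. while waiting for metadata.
        props.put("max.block.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        // Prints the settings; sorted output is not guaranteed.
        build("localhost:9092").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting Properties object would then be passed to the KafkaProducer constructor. Records that expire this way are reported as failed through the callback or the returned future, so the application still has to decide what to do with them.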

To learn more about producer callback exceptions, you could look at another answer of mine.

Edit for new question in comment:

Is there any way to find whether a message (or all the messages) was successfully sent or not?

You can define a custom Callback class like the one below and pass it when sending the data. Its onCompletion method is invoked once the send completes; the exception argument is non-null if something went wrong while producing the message.

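import org.apache.kafka.clients.producer.{Callback, RecordMetadata}

class ProducerCallback extends Callback {

  // Invoked by the producer once the send has completed or failed.
  override def onCompletion(recordMetadata: RecordMetadata, e: Exception): Unit = {
    if (e != null) {
      // The record was not acknowledged; e describes why.
      e.printStackTrace()
    }
  }

}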

producer.send(message, new ProducerCallback)

As an alternative you could simply call

producer.send(message).get()

as this blocks until the Kafka broker has acknowledged the record according to the KafkaProducer configuration acks, and throws an exception if the send failed.
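
Since send() returns a java.util.concurrent.Future, waiting with an upper bound is also possible. The sketch below uses only the JDK, so it works on any Future, including the one returned by producer.send(message). The class AckWaiter and the method awaitAck are hypothetical names, and the string in main is just a stand-in for RecordMetadata.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: waits for a send result, but never longer than timeoutMs.
final class AckWaiter {
    // Returns the result if it arrived in time, or empty if the send failed or timed out.
    static <T> Optional<T> awaitAck(Future<T> pending, long timeoutMs) {
        try {
            return Optional.ofNullable(pending.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty(); // no answer within the time limit
        } catch (ExecutionException e) {
            return Optional.empty(); // the send itself failed; e.getCause() says why
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the Future returned by producer.send(message).
        Future<String> acked = CompletableFuture.completedFuture("testtopic-0@42");
        System.out.println("acknowledged: " + awaitAck(acked, 1000).isPresent());
    }
}
```

Note that a timeout here only stops the waiting. The record stays in the producer's buffer and may still be delivered later, so an empty result does not by itself prove that the message never reached Kafka.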

Michael Heil
  • Hi Mike, thanks for your help. Is there any way to find out whether a message (or all the messages) was successfully sent or not, so that I can proceed in the code based on the result of `producer.send(message)`? Thank you – Omar Smri Sep 09 '20 at 11:30
  • Thank you, done. Could you offer some help here: https://stackoverflow.com/questions/63829279/how-to-check-if-there-are-transactions-in-flight-in-kafka-and-how-to-clear-them ? – Omar Smri Sep 10 '20 at 12:04