We are facing a peculiar issue: messages that we produce to Kafka are sometimes not found at the consumer end. To debug this, we enabled the onSuccess() and onFailure() callbacks and found that the main failure was:
org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
To address this, we increased retries to 10, which fixed the issue almost completely.
However, we found 3 messages (each at a different time) for which we received neither an onSuccess() nor an onFailure() callback. They were simply lost in transit, so to speak!
Now, this happened just before the application was taken down for redeployment. As I understand from the Kafka producer configs, the default batch.size is 16 KB, and the producer buffers records into batches before actually sending them to the broker (I have deliberately left linger.ms out of consideration for simplicity).
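For reference, a minimal sketch of the producer settings discussed above, using plain Kafka property names. Only retries=10 and the 16 KB batch size come from the description; the broker address, the linger.ms value, and the class name ProducerSettingsSketch are placeholders for illustration, not our actual configuration.

```java
import java.util.Properties;

// Sketch only: keys are standard Kafka producer config names,
// values are taken from the question where stated and assumed otherwise.
final class ProducerSettingsSketch {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers); // placeholder, supplied by the caller
        props.put("retries", "10");       // the value we set after seeing NotLeaderForPartitionException
        props.put("batch.size", "16384"); // 16 KB, the default mentioned above
        props.put("linger.ms", "0");      // assumption: no lingering, left out of the question for simplicity
        return props;
    }

    public static void main(String[] args) {
        // Print the effective settings for a hypothetical local broker.
        build("localhost:9092").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```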
My question is: can all the messages in a Kafka batch be lost when the application is forcefully shut down for deployment? If yes, how do we address this?
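To make the scenario concrete, here is a rough sketch of the kind of shutdown handling this question is about. It is not a verified fix. FlushableProducer is a hypothetical stand-in that mirrors only the flush() and close(Duration) methods of KafkaProducer, and GracefulShutdownSketch and the timeout are made up for illustration. Note that JVM shutdown hooks run on a normal termination signal but not when the process is killed outright.

```java
import java.time.Duration;

// Hypothetical stand-in for the two KafkaProducer methods that matter here.
interface FlushableProducer {
    void flush();                 // blocks until buffered records have been sent
    void close(Duration timeout); // waits up to the timeout for incomplete sends
}

final class GracefulShutdownSketch {

    // Flushes buffered records, then closes the producer even if flush() throws.
    static void drain(FlushableProducer producer, Duration timeout) {
        try {
            producer.flush();
        } finally {
            producer.close(timeout);
        }
    }

    // Registers drain() as a JVM shutdown hook and returns the hook thread.
    static Thread register(FlushableProducer producer, Duration timeout) {
        Thread hook = new Thread(() -> drain(producer, timeout), "producer-drain");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

If the deployment process kills the JVM without giving hooks a chance to run, a sketch like this would not help, which is part of what I am trying to understand.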
Please help, as this is an issue we are facing in production.
Many thanks in advance!