Operating system: Linux 7.9. librdkafka (v1.6) producer configs:
enable.idempotence=true # messages must be written to Kafka in order and without gaps
message.timeout.ms=30000 # lowered to 30 seconds for testing purposes
enable.gapless.guarantee=true
The Kafka version is kafka_2.12-2.7.0 and the cluster has 3 brokers. The topic replication factor is 2.
Please assume the test scenario below:
1. The producer produces messages A, B, C (in that order) to the same topic partition.
2. One of the Kafka broker processes (one that hosts the above topic partition) is frozen (in bash: kill -s stop "pid of kafka broker process").
3. The producer produces messages D, E, F (i.e. calls RdKafka::Producer::produce for each).
4. Since message.timeout.ms is set to 30 seconds, librdkafka reports a delivery failure after 30 seconds.
5. The fatal error handling step runs (please see below).
6. The frozen broker process is resumed (kill -s cont).
I tried each of the following fatal error handling steps:
- Call RdKafka::Producer::flush with a 5-minute timeout, then delete the RdKafka::Producer object (it is re-created later)
- Call RdKafka::Producer::purge(PURGE_QUEUE), then delete the RdKafka::Producer object (it is re-created later)
- Kill my C++ process with kill(getpid(), SIGKILL) or exit()
However, all of them end in a state where some of the messages in the middle of step 3 above (produced while the broker was frozen) are NOT written to Kafka.
For example, messages D and F are written, but message E is NOT written.
How can I resolve this, i.e. avoid losing messages in the middle? (Then I can restart writing from the last written message onwards.)
My goal is to handle Kafka FATAL errors (delivery failures, produce failures other than QUEUE_FULL, and fatal errors reported through the error callback) gracefully.