
Consider this event loop:

  1. while there are more messages:
  2.     msg_in = consumer.Poll()
  3.     msg_out = transform(msg_in)
  4.     service.Publish(msg_out)

Assume a single partition and focus on line 4. Say the loop crashes while 5 messages, sent in the order 1, 2, 3, 4, 5, are en route to Kafka, and Kafka receives only N ≤ 5 of them; the rest are lost.

If there are no retries, what can we say? That Kafka got 1, or 1,2, or 1,2,3, or 1,2,3,4, or 1,2,3,4,5, i.e. always a prefix of the sent order? Kafka does guarantee per-partition ordering.
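The question's hypothesis is that, without retries, the partition always holds a prefix of the sent sequence. A small Go helper (hypothetical, not part of librdkafka or confluent-kafka-go) makes the three outcomes discussed in this thread concrete: a clean prefix, an in-order sequence with gaps, and a reordering.

```go
package main

import "fmt"

// classify describes how the records found in the partition relate to the
// order they were sent in. Records are identified by send position 1..n.
//   "prefix":    1..k with nothing missing (k may be 0)
//   "gapped":    strictly increasing but with holes, e.g. 1,2,4
//   "reordered": not strictly increasing, e.g. 1,3,2 (duplicates land here too)
// This is an illustrative helper, not part of librdkafka.
func classify(received []int) string {
	for i := 1; i < len(received); i++ {
		if received[i] <= received[i-1] {
			return "reordered"
		}
	}
	for i, m := range received {
		if m != i+1 {
			return "gapped"
		}
	}
	return "prefix"
}

func main() {
	for _, r := range [][]int{{1, 2, 3}, {1, 2, 4}, {1, 3, 2}} {
		fmt.Println(r, classify(r))
	}
}
```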

With retries, ordering can of course be lost: Kafka may end up with any m of the messages (0 ≤ m ≤ N), in any order. That's understandable.

I'm using Confluent's Go wrapper around librdkafka (confluent-kafka-go), but let's focus on librdkafka itself.

ecwdw 23e3e23e

1 Answer


It's even more complicated than you think :-)

librdkafka supports max.in.flight.requests.per.connection. If you set that to 1, retries should be safe to enable under some circumstances (namely, if they are infinite: any setting that eventually discards undelivered data may reorder messages or leave gaps).

On newer versions, enable.idempotence extends this guarantee to up to 5 in-flight requests per connection.

Another "interesting" scenario: records 1, 2, 3 and 4 are delivered, the leader broker crashes, an unclean leader is elected, record 4 is dropped, and then 5 is delivered, resulting in 1, 2, 3, 5 in the partition.

Or maybe the topic in question is log-compacted and some of these records have the exact same key, so compaction eventually keeps only the latest record for each key?

radai