
So, as far as I understand from Transactions in Apache Kafka, a read_committed consumer will not return messages that are part of an ongoing transaction. I therefore assume the consumer either has the option to commit its offset past those ongoing-transaction messages (e.g. to read non-transactional messages) or must make no further advancement until the encountered transaction is committed/aborted. I suppose Kafka will allow it to skip those pending-transaction records, but then how will the consumer read them once they are committed, considering that its offset might already be far past them?

UPDATE

Consider that the topic could contain a mix of records (aka messages) coming from both non-transactional producers and transactional ones. E.g., consider this partition from a topic:

non-transact-Xmsg, from-transact-producer1-msg, from-transact-producer2-msg, non-transact-Ymsg

If the consumer encounters from-transact-producer1-msg, will it skip that message and read non-transact-Ymsg, or will it just hang before the not-yet-committed from-transact-producer1-msg and thereby not read non-transact-Ymsg either?

Consider also that there could be many transactional producers, and thus many equivalents of from-transact-producer1-msg, some committed and some not. For example, from-transact-producer2-msg could already be committed by the time the consumer reaches non-transact-Xmsg.

Adrian
  • In the same Confluent blog it is stated _"...the consumer does not need to [sic] any buffering to wait for transactions to complete. Instead, **the broker does not allow it to advance to offsets which include open transactions.**"_ – mazaneicha Feb 10 '20 at 15:38
  • And that part puzzles me because it doesn't say that the consumer is not allowed to skip *pending transaction records* in order to read e.g. *non-transaction records*. But then what will happen when the *pending transaction records* become committed? Will they remain unread because they were skipped while *pending*? Will the consumer somehow go back automatically? None of these make sense. – Adrian Feb 10 '20 at 22:21

2 Answers


From the docs about isolation.level:

Messages will always be returned in offset order. Hence, in read_committed mode, consumer.poll() will only return messages up to the last stable offset (LSO), which is the one less than the offset of the first open transaction. In particular any messages appearing after messages belonging to ongoing transactions will be withheld until the relevant transaction has been completed. As a result, read_committed consumers will not be able to read up to the high watermark when there are in flight transactions.
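The documented rule can be illustrated with a toy model. The sketch below is **not** Kafka's broker code — it is a minimal simulation of the last stable offset (LSO) rule applied to the partition from the question: because the first open transaction sits at offset 1, everything at or after it is withheld, including producer2's already-committed record and the non-transactional non-transact-Ymsg. No records are skipped; the consumer simply stops advancing at the LSO and receives the rest, in offset order, once the open transaction completes.

```java
import java.util.ArrayList;
import java.util.List;

public class LsoDemo {
    // Toy model of a partition record: offset, label, and whether it belongs
    // to a still-open transaction. Illustration only, not Kafka internals.
    record Msg(long offset, String label, boolean inOpenTxn) {}

    // LSO = offset of the first record belonging to an open transaction,
    // or one past the last record if there is no open transaction.
    static long lastStableOffset(List<Msg> partition) {
        for (Msg m : partition) {
            if (m.inOpenTxn()) return m.offset();
        }
        return partition.isEmpty() ? 0 : partition.get(partition.size() - 1).offset() + 1;
    }

    // A read_committed poll() only returns records strictly below the LSO.
    static List<String> visibleToReadCommitted(List<Msg> partition) {
        long lso = lastStableOffset(partition);
        List<String> out = new ArrayList<>();
        for (Msg m : partition) {
            if (m.offset() < lso) out.add(m.label());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Msg> partition = List.of(
            new Msg(0, "non-transact-Xmsg", false),
            new Msg(1, "from-transact-producer1-msg", true),   // open transaction
            new Msg(2, "from-transact-producer2-msg", false),  // committed transaction
            new Msg(3, "non-transact-Ymsg", false));
        // Only offset 0 is visible: everything at or after the first open
        // transaction is withheld, even committed or non-transactional records.
        System.out.println(visibleToReadCommitted(partition)); // prints [non-transact-Xmsg]
    }
}
```

Once producer1 commits or aborts, the LSO moves forward and the withheld records become visible in offset order, so the consumer never has to "go back".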

Iskuskov Alexander
  • Please see the UPDATE which highlights the possibility of having a mix of transaction-based or non-transaction-based records (aka messages) in a partition from a topic plus many transactional producers with their records also mixed. – Adrian Feb 10 '20 at 15:20

Your requirement is not 100% clear, but if I understand it correctly, you want to be able to re-process some consumed messages that, for some reason, you could not process successfully the first time. And you don't want to get "stuck" on those messages; you'd prefer to move on and handle them later. In that case, the best option would probably be to write them to a different queue and have another consumer read those "failed" messages and retry as many times as you'd like.
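The retry-queue pattern described above can be sketched as follows. This is a deliberately simplified, in-memory stand-in — the `retryTopic` deque and the `tryProcess` failure rule are hypothetical; in a real deployment they would be a second Kafka topic written to with a producer and drained by a separate consumer:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RetryQueueSketch {
    // In-memory stand-ins for a "failed messages" topic and a results sink;
    // with Kafka these would be a real topic plus a second consumer group.
    static final Deque<String> retryTopic = new ArrayDeque<>();
    static final List<String> processed = new ArrayList<>();

    // Hypothetical processing step that fails for some messages.
    static boolean tryProcess(String msg) {
        if (msg.startsWith("bad")) return false;
        processed.add(msg);
        return true;
    }

    public static void main(String[] args) {
        // Main consumer: never blocks on a failing message, just parks it
        // on the retry queue and keeps advancing its offset.
        for (String msg : List.of("m1", "bad-m2", "m3")) {
            if (!tryProcess(msg)) retryTopic.add(msg);
        }
        // Separate retry consumer drains the failed messages later.
        while (!retryTopic.isEmpty()) {
            processed.add(retryTopic.poll() + " (retried)");
        }
        System.out.println(processed); // [m1, m3, bad-m2 (retried)]
    }
}
```

The design point is that the main consumer's progress is decoupled from the fate of individual problem messages, at the cost of losing strict ordering for the retried ones.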

Lior Chaga
  • Thanks, but the question is about Kafka transactions and the consumer filtering uncommitted ones while still advancing. At some point the consumer should return the committed transactions too, but how? It has already advanced its Kafka offset past the now-committed transactions. – Adrian Feb 10 '20 at 11:54