
I came across this scenario when implementing a chained transaction manager in our Spring Boot application, which consumes messages from JMS and then publishes them to a Kafka topic. My testing strategy is explained here: Unable to synchronise Kafka and MQ transactions usingChainedKafkaTransaction

In short, I deliberately threw a RuntimeException after consuming messages from MQ and writing them to Kafka, just to test the transaction behaviour.

Although the rollback itself worked fine, I could see the number of uncommitted messages in the Kafka topic growing without bound, even though a rollback happened on every processing attempt. Within a few seconds I ended up with hundreds of uncommitted messages in the topic.

Naturally, I asked myself: if a message is rolled back, why is it still there taking up storage? I understand that with the isolation level set to read_committed these messages will never be consumed, but the idea of a poison message being rolled back again and again, eating up storage, does not sound right to me.

So my question is: am I missing something? Is there a configuration for a "time to live" or similar for a message that was rolled back? I tried to read the Kafka docs on this subject but could not find anything. If no such setting exists, what would be good practice for dealing with situations like this and avoiding wasted storage?

Thank you in advance for your input.

Julian

1 Answer


That's just the way Kafka works.

Publishing a record always takes a slot in the partition log. Whether or not a consumer can see that record depends on whether it is committed or not (assuming the isolation level is read_committed).
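
To make that concrete, here is a toy model in plain Java. It is only a sketch of the idea, not Kafka's implementation or API: the class `ToyPartitionLog` and all of its methods are made up for illustration, and real Kafka additionally writes transaction markers to the log, which this model leaves out. It shows that every append takes an offset, that an abort removes nothing, and that a read_committed-style reader simply skips what was never committed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, NOT Kafka's real implementation or API.
final class ToyPartitionLog {

    // One appended record: its offset, the transaction that wrote it, and the payload.
    private static final class Entry {
        final long offset;
        final long txnId;
        final String value;

        Entry(long offset, long txnId, String value) {
            this.offset = offset;
            this.txnId = txnId;
            this.value = value;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final Set<Long> committedTxns = new HashSet<>();

    // Appending always takes the next offset, before the outcome is known.
    long append(long txnId, String value) {
        long offset = entries.size();
        entries.add(new Entry(offset, txnId, value));
        return offset;
    }

    void commit(long txnId) {
        committedTxns.add(txnId);
    }

    // Aborting deletes nothing: the entries stay in the log.
    void abort(long txnId) {
        committedTxns.remove(txnId);
    }

    // Number of records physically stored, committed or not.
    int storedCount() {
        return entries.size();
    }

    // What a read_committed-style reader would be handed.
    List<String> readCommitted() {
        List<String> visible = new ArrayList<>();
        for (Entry e : entries) {
            if (committedTxns.contains(e.txnId)) {
                visible.add(e.value);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        ToyPartitionLog log = new ToyPartitionLog();
        // A poison message that is retried and rolled back three times.
        for (long txn = 1; txn <= 3; txn++) {
            log.append(txn, "poison");
            log.abort(txn);
        }
        log.append(4, "good");
        log.commit(4);
        System.out.println("stored=" + log.storedCount());    // stored=4
        System.out.println("visible=" + log.readCommitted()); // visible=[good]
    }
}
```

Three rolled-back attempts still occupy three slots, which matches the growth described in the question.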

Kafka achieves its extraordinary throughput because of its simple log architecture.

Rollback is assumed to be somewhat rare.

If you are getting so many rollbacks then your application architecture is probably at fault.

You should probably shut things down for a while if you keep rolling back.
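
One way to "shut things down" could be to count consecutive rollbacks and stop consuming once a threshold is reached. The sketch below is plain Java and entirely hypothetical: `RollbackBreaker` is not a Spring or Kafka class, and the threshold of 3 is an arbitrary choice. What "pausing" means (for example stopping a listener container) is left to the application.

```java
// Hypothetical helper, not part of Spring or Kafka.
final class RollbackBreaker {
    private final int maxConsecutiveRollbacks;
    private int consecutiveRollbacks;

    RollbackBreaker(int maxConsecutiveRollbacks) {
        if (maxConsecutiveRollbacks < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.maxConsecutiveRollbacks = maxConsecutiveRollbacks;
    }

    // A successful commit resets the streak.
    void recordCommit() {
        consecutiveRollbacks = 0;
    }

    void recordRollback() {
        consecutiveRollbacks++;
    }

    // True once the streak of rollbacks has reached the threshold.
    boolean shouldPause() {
        return consecutiveRollbacks >= maxConsecutiveRollbacks;
    }

    public static void main(String[] args) {
        RollbackBreaker breaker = new RollbackBreaker(3);
        int attempts = 0;
        while (!breaker.shouldPause()) {
            attempts++;
            breaker.recordRollback(); // stand-in for a transaction that failed
        }
        System.out.println("pausing after " + attempts + " consecutive rollbacks");
    }
}
```

Each attempt that is never made is one record that never lands in the partition log.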

To specifically answer your question, see the broker property `log.retention.hours`.

The uncommitted records are kept for a week by default.
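
For reference, the one-week default corresponds to 168 hours. If you would rather shorten retention for a single topic, the topic-level `retention.ms` property takes the value in milliseconds. The snippet below is only a conversion sketch in plain Java; the class name `RetentionExample` and the 24-hour figure are my own illustrative choices, and applying the value to a topic is not shown. Note that retention applies to the log as a whole, so shortening it also shortens how long committed records are kept.

```java
import java.time.Duration;

// Illustrative conversion only; the class name is hypothetical.
final class RetentionExample {
    // Broker default for log.retention.hours: 168 hours, i.e. one week.
    static final int DEFAULT_LOG_RETENTION_HOURS = 168;

    // Converts a retention period in hours to milliseconds, the unit used by retention.ms.
    static long toRetentionMs(int hours) {
        return Duration.ofHours(hours).toMillis();
    }

    public static void main(String[] args) {
        System.out.println("default retention.ms=" + toRetentionMs(DEFAULT_LOG_RETENTION_HOURS)); // 604800000
        System.out.println("24h retention.ms=" + toRetentionMs(24));                              // 86400000
    }
}
```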

Gary Russell
  • Not what I wanted to hear, but it confirms what I suspected: rolled-back messages are kept in the log like any other message. I agree the architecture should deal with that, but I still think an uncommitted message should be kept in a transaction log rather than in the persistent log. Even with this design, where the persistent log holds uncommitted data (which in my opinion is bad), I would still add support for something like `rollbacked.cleanup.interval`. On top of that, given that my producer was idempotent, I would have expected just one message and not heaps of them. But your answer is valid. – Julian Jun 02 '20 at 05:49