Is it necessary to use transactions explicitly in Kafka Streams to get "effectively once" behaviour?

Question

Stream processing applications written in the Kafka Streams library can turn on exactly-once semantics by simply making a single config change, to set the config named “processing.guarantee” to “exactly_once” (default value is “at_least_once”), with no code change required.

But as transactions are said to be used, I would like to know: Are transactions used implicitly by Kafka Streams, or do I have to use them explicitly?

In other words, do I have to call something like .beginTransaction() and .commitTransaction() myself, or is all of this really taken care of under the hood, so that all that remains for me to do is fine-tune commit.interval.ms and cache.max.bytes.buffering?

Michael Heil · Answer 1 · 2020-10-02T05:40:07.643

Kafka Streams uses the transactions API implicitly to achieve exactly-once semantics, so you do not need to call beginTransaction() or commitTransaction() yourself, and you do not need to set any other configuration.
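As a minimal sketch of what that looks like from the application side: the class name, application id, and broker address below are placeholders I made up, and the config is written with plain string keys so the snippet compiles without Kafka on the classpath. Note that newer Kafka releases name the value exactly_once_v2; the value here is the one from the question.

```java
import java.util.Properties;

// Hypothetical class name, for illustration only.
class EosStreamsConfigSketch {

    // Builds the configuration a Kafka Streams application would be started with.
    static Properties buildProps() {
        Properties props = new Properties();
        // Placeholder values; replace with your own application id and brokers.
        props.put("application.id", "my-streams-app");
        props.put("bootstrap.servers", "localhost:9092");
        // The single change that switches on exactly-once processing.
        props.put("processing.guarantee", "exactly_once");
        // Optional tuning knob mentioned in the question; the value is only an example.
        props.put("commit.interval.ms", "100");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps();
        // In a real application you would hand these to
        // new KafkaStreams(topology, props) and call start().
        // There is no beginTransaction()/commitTransaction() call anywhere.
        System.out.println(props.getProperty("processing.guarantee"));
    }
}
```

The topology code itself stays exactly as it would be with at_least_once.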

If you continue reading the blog, it says:

"More specifically, when processing.guarantee is configured to exactly_once, Kafka Streams sets the internal embedded producer client with a transaction id to enable the idempotence and transactional messaging features, and also sets its consumer client with the read-committed mode to only fetch messages from committed transactions from the upstream producers."
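To make the quoted passage a bit more concrete, here is a rough sketch of the client settings it refers to, written out as plain properties. Kafka Streams sets these on its embedded clients itself; you would only write them by hand if you used the plain producer and consumer clients. The class name and the transactional id value are invented for illustration, and Streams generates its own ids.

```java
import java.util.Properties;

// Hypothetical class name, for illustration only.
class EosClientSettingsSketch {

    // Producer-side settings corresponding to "idempotence and transactional messaging".
    static Properties producerSide() {
        Properties p = new Properties();
        p.put("enable.idempotence", "true");
        // Placeholder id; Kafka Streams derives its own transactional ids.
        p.put("transactional.id", "example-transactional-id");
        return p;
    }

    // Consumer-side setting corresponding to "read-committed mode".
    static Properties consumerSide() {
        Properties c = new Properties();
        // Only fetch messages from committed transactions.
        c.put("isolation.level", "read_committed");
        return c;
    }

    public static void main(String[] args) {
        System.out.println(producerSide());
        System.out.println(consumerSide());
    }
}
```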

More details can be found in KIP-129: Streams Exactly-Once Semantics.
