Kafka transactions and Producer buffering mechanism

Question

I need exactly-once semantics, so I use the Kafka transactional API, and I'm trying to understand how to use the Producer efficiently. From what I've read in some articles, it is more efficient to use only one Producer per application instance (within one TCP connection) because of its buffering mechanism. On the other hand, when I call producer.commitTransaction() for a single message, the message is flushed immediately and the buffer gives no benefit. Do I need to implement a buffer manually and call producer.commitTransaction() once per batch of buffered messages? Or is there another way to use buffering with transactions?

I know that Spring caches producers when transactions are enabled. But I don't use Spring, and I'm not sure how Spring's producer cache actually works. Maybe I should implement something similar and create a new Producer if the current one is busy?

Example of my method:

public void produce(@NotNull T payload) {
    var key = UUID.randomUUID();
    var value = JsonUtils.toJson(payload);

    try {
        ProducerRecord<UUID, String> record = new ProducerRecord<>(topic, key, value);
        producer.beginTransaction();
        producer.send(record);
        producer.commitTransaction();
        // Log success only once the commit has gone through.
        log.info("Message with key {} produced to topic {}", key, topic);
    } catch (ProducerFencedException e) {
        // Another producer with the same transactional.id took over; this instance is unusable.
        log.error("Producer with the same transactional id already exists", e);
        producer = KafkaProducerFactory.getInstance().recreateProducer();
    } catch (KafkaException e) {
        // Any other Kafka error: abort so nothing from this transaction becomes visible.
        log.error("Failed to produce to kafka", e);
        producer.abortTransaction();
    }
}

Can you share how the KafkaTemplate bean is configured? Check whether autoflush is true. — kus, May 10 '22 at 16:11
I don't use Spring and KafkaTemplate. I use KafkaProducer and call KafkaProducer.commitTransaction(). Docs of this method say: "this method will flush any unsent records before actually committing the transaction". — Singularis24, May 10 '22 at 16:52

score 1 · Answer 1 · answered May 10 '22 at 18:56

Let's begin with the non-transactional Kafka producer. A set of configurable properties controls its buffering mechanism:

batch.size
linger.ms
buffer.memory

Kafka batches records internally according to this configuration. With linger.ms=0, the producer sends a batch as soon as the sender thread can, even if the batch is not full; a non-zero value makes it wait up to that many milliseconds for the batch to fill before sending.
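
As a minimal sketch of how these three settings might be supplied, the snippet below builds a plain java.util.Properties object. The class name ProducerBufferingConfig and the chosen values are illustrative assumptions, not recommendations; the keys are written as plain strings so the snippet compiles without kafka-clients (the same keys are also exposed as constants on ProducerConfig).

```java
import java.util.Properties;

// Hypothetical helper class; the values below are examples only, not tuned settings.
class ProducerBufferingConfig {

    // Returns producer settings that give the sender a chance to fill batches.
    static Properties build() {
        Properties props = new Properties();
        // Upper bound, in bytes, on one per-partition batch (client default: 16384).
        props.setProperty("batch.size", "32768");
        // How long the sender may wait for more records before sending a batch that is not full.
        props.setProperty("linger.ms", "20");
        // Total bytes available for records that have not been sent yet (client default: 33554432).
        props.setProperty("buffer.memory", "33554432");
        return props;
    }
}
```

The resulting Properties would be passed to the KafkaProducer constructor together with the usual bootstrap.servers and serializer settings.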

When it comes to the transactional producer, there are some differences:

commitTransaction() flushes any unsent records in the transaction immediately; it does not wait for the batch to fill. This is why the single message in the example above is sent right away.
If there are multiple producer.send() calls within the transaction boundary, all of them are part of a single transaction and are committed or aborted together. A non-transactional producer gives no such guarantee: its records are sent independently, batch by batch, according to the configuration.

When commitTransaction() is called, it essentially wakes up the sender thread to send the buffered messages.
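
The point about several send() calls sharing one transaction could be used to buffer at the application level. Below is a rough sketch under stated assumptions: TxProducer is a hypothetical stand-in interface that mirrors four KafkaProducer method names so the code compiles without kafka-clients, and TransactionalBatcher and maxBatch are likewise made-up names. It is one possible shape, not an established pattern from the Kafka client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the KafkaProducer methods used below (assumption, not a Kafka type).
interface TxProducer<R> {
    void beginTransaction();
    void send(R record);
    void commitTransaction();
    void abortTransaction();
}

// Collects records and publishes each batch inside a single transaction.
class TransactionalBatcher<R> {
    private final TxProducer<R> producer;
    private final int maxBatch;
    private final List<R> pending = new ArrayList<>();

    TransactionalBatcher(TxProducer<R> producer, int maxBatch) {
        this.producer = producer;
        this.maxBatch = maxBatch;
    }

    // Queues a record and commits once maxBatch records have accumulated.
    synchronized void add(R record) {
        pending.add(record);
        if (pending.size() >= maxBatch) {
            flush();
        }
    }

    // Sends everything queued so far in one transaction.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        producer.beginTransaction();
        try {
            for (R record : pending) {
                producer.send(record);    // handed to the producer, not yet committed
            }
            producer.commitTransaction(); // flushes unsent records, then commits
            pending.clear();
        } catch (RuntimeException e) {
            producer.abortTransaction();  // the whole batch is rolled back together
            throw e;                      // pending is kept so the caller can retry
        }
    }
}
```

With the real client, fatal errors such as ProducerFencedException call for closing the producer rather than aborting, and a timer-driven flush() would likely be needed so a partially filled batch does not wait indefinitely.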

Thanks. Yes, I understand that, and I'm trying to find the optimal solution, but as I understand it, we have to implement something ourselves, for example a buffer for the messages we send in one transaction. — Singularis24, May 12 '22 at 06:29
