I am trying to publish, in a single transaction, a message to each of 16 Kafka partitions on a cluster of 7 brokers.

The flow is like this:

  1. open transaction
  2. write a message to 16 partitions
  3. commit transaction
  4. sleep 25 ms
  5. repeat

A transaction takes 50 ms on average, but sometimes it takes over 1 second to complete. After enabling trace logging on the producer side, I noticed the following error:

TRACE internals.TransactionManager [kafka-producer-network-thread | producer-1] - [Producer clientId=producer-1, transactionalId=cma-2] 
Received transactional response AddPartitionsToTxnResponse(errors={modelapp-ecb-0=CONCURRENT_TRANSACTIONS, modelapp-ecb-9=CONCURRENT_TRANSACTIONS, modelapp-ecb-10=CONCURRENT_TRANSACTIONS, modelapp-ecb-11=CONCURRENT_TRANSACTIONS, modelapp-ecb-12=CONCURRENT_TRANSACTIONS, modelapp-ecb-13=CONCURRENT_TRANSACTIONS, modelapp-ecb-14=CONCURRENT_TRANSACTIONS, modelapp-ecb-15=CONCURRENT_TRANSACTIONS, modelapp-ecb-1=CONCURRENT_TRANSACTIONS, modelapp-ecb-2=CONCURRENT_TRANSACTIONS, modelapp-ecb-3=CONCURRENT_TRANSACTIONS, modelapp-ecb-4=CONCURRENT_TRANSACTIONS, modelapp-ecb-5=CONCURRENT_TRANSACTIONS, modelapp-ecb-6=CONCURRENT_TRANSACTIONS, modelapp-ecb-7=CONCURRENT_TRANSACTIONS, modelapp-ecb-8=CONCURRENT_TRANSACTIONS}, throttleTimeMs=0) 
for request (type=AddPartitionsToTxnRequest, transactionalId=cma-2, producerId=59003, producerEpoch=0, partitions=[modelapp-ecb-0, modelapp-ecb-9, modelapp-ecb-10, modelapp-ecb-11, modelapp-ecb-12, modelapp-ecb-13, modelapp-ecb-14, modelapp-ecb-15, modelapp-ecb-1, modelapp-ecb-2, modelapp-ecb-3, modelapp-ecb-4, modelapp-ecb-5, modelapp-ecb-6, modelapp-ecb-7, modelapp-ecb-8])

The Kafka producer retries sending AddPartitionsToTxnRequest(s) several times until it succeeds, but this leads to delays.
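To illustrate how these retries turn into latency, here is a rough sketch (not Kafka code) that assumes a fixed backoff between attempts and a coordinator that starts accepting the request at some point in time. The class and method names are hypothetical, and whether the producer's `retry.backoff.ms` setting is what governs this particular retry is an assumption that has not been verified here.

```java
// Hypothetical model of a fixed-backoff retry loop; not part of the Kafka client.
final class RetryDelayModel {

    // Attempts are made at t = 0, backoffMs, 2 * backoffMs, ...
    // Returns how many of them are rejected before one lands at or after readyAtMs.
    static long rejectedAttempts(long readyAtMs, long backoffMs) {
        if (backoffMs <= 0) {
            throw new IllegalArgumentException("backoffMs must be positive");
        }
        if (readyAtMs <= 0) {
            return 0; // the very first attempt is accepted
        }
        return (readyAtMs + backoffMs - 1) / backoffMs; // ceiling division
    }

    // Time at which the first accepted attempt is sent, measured from the first attempt.
    static long addedDelayMs(long readyAtMs, long backoffMs) {
        return rejectedAttempts(readyAtMs, backoffMs) * backoffMs;
    }

    public static void main(String[] args) {
        // Example: coordinator becomes ready 250 ms after the first attempt, 100 ms backoff.
        System.out.println("rejected attempts: " + rejectedAttempts(250, 100));
        System.out.println("added delay (ms):  " + addedDelayMs(250, 100));
    }
}
```

Under this model, even a short window in which the coordinator rejects the request costs at least one full backoff interval, so a few consecutive rejections would already dominate a 50 ms average.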

The code looks like this:

Properties producerProperties = PropertiesUtil.readPropertyFile(_producerPropertiesFile);
_producer = new KafkaProducer<>(producerProperties);
_producer.initTransactions();

_producerService = Executors.newSingleThreadExecutor(new NamedThreadFactory(getClass().getSimpleName()));
_producerService.submit(() -> {
    while (!Thread.currentThread().isInterrupted()) {

        try {
            _producer.beginTransaction();
            // send one control message to every partition inside the same transaction
            for (int partition = 0; partition < _numberOfPartitions; partition++) {
                _producer.send(new ProducerRecord<>(_producerTopic, partition, KafkaRecordKeyFormatter.formatControlMessageKey(_messageNumber, token), EMPTY_BYTE_ARRAY));
            }

            _producer.commitTransaction();
            _messageNumber++;
            Thread.sleep(_timeBetweenProducedMessagesInMillis);
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException | UnsupportedVersionException e) {
            closeProducer();
            break;
        } catch (KafkaException e) {
            _producer.abortTransaction();
        } catch (InterruptedException e) {...} 
    }
});
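The average and worst-case durations quoted above could be collected with a small helper like the one below. This is only a sketch: `TxnLatencyStats` is a hypothetical class, not something from the original code, and it assumes the whole begin/send/commit sequence is passed in as a `Runnable`.

```java
// Hypothetical helper for tracking per-transaction latency; not part of the original code.
final class TxnLatencyStats {
    private long count;
    private long totalNanos;
    private long maxNanos;

    // Records one measured duration.
    void record(long nanos) {
        count++;
        totalNanos += nanos;
        maxNanos = Math.max(maxNanos, nanos);
    }

    // Runs one transaction body and records how long it took, even if it throws.
    void time(Runnable txn) {
        long start = System.nanoTime();
        try {
            txn.run();
        } finally {
            record(System.nanoTime() - start);
        }
    }

    long count() {
        return count;
    }

    double averageMillis() {
        return count == 0 ? 0.0 : totalNanos / 1e6 / count;
    }

    double maxMillis() {
        return maxNanos / 1e6;
    }
}
```

Wrapping the body of the loop (from `beginTransaction()` to `commitTransaction()`, excluding the sleep) in `time(...)` would make it easy to log the slow iterations next to the trace output.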

Looking at the broker's code, it seems there are 2 cases in which this error is returned, but I cannot tell why I end up in either of them:

object TransactionCoordinator {
...
    def handleAddPartitionsToTransaction(...): Unit = {
    ...
        if (txnMetadata.pendingTransitionInProgress) {
            // return a retriable exception to let the client backoff and retry
            Left(Errors.CONCURRENT_TRANSACTIONS)
        } else if (txnMetadata.state == PrepareCommit || txnMetadata.state == PrepareAbort) {
            Left(Errors.CONCURRENT_TRANSACTIONS)
        }
    ...
    }
...
}

Thanks in advance for help!

Later edit:

After enabling trace logging on the broker, we could see that the broker sends the END_TXN response to the producer before the transaction reaches the CompleteCommit state. The producer is then able to start a new transaction, which the broker rejects because the previous one is still in the PrepareCommit -> CompleteCommit transition.
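The race can be reproduced with a toy model of the coordinator check quoted from the broker code. This is a simplified sketch, not the real `TransactionCoordinator`: the class, enum, and method names are hypothetical, and it assumes (as the broker trace suggests) that the commit is acknowledged to the producer as soon as the PrepareCommit state is reached.

```java
// Toy model of the state check in handleAddPartitionsToTransaction; not real broker code.
final class TxnCoordinatorModel {
    enum State { EMPTY, ONGOING, PREPARE_COMMIT, PREPARE_ABORT, COMPLETE_COMMIT }
    enum Result { NONE, CONCURRENT_TRANSACTIONS }

    private State state = State.EMPTY;

    // Mirrors the second branch of the quoted check: reject while a commit
    // or abort has been prepared but not yet completed.
    Result addPartitions() {
        if (state == State.PREPARE_COMMIT || state == State.PREPARE_ABORT) {
            return Result.CONCURRENT_TRANSACTIONS;
        }
        state = State.ONGOING;
        return Result.NONE;
    }

    // Assumption: the END_TXN response goes back to the producer at this point.
    void endTxnCommit() {
        if (state != State.ONGOING) {
            throw new IllegalStateException("no ongoing transaction");
        }
        state = State.PREPARE_COMMIT;
    }

    // The transition to CompleteCommit finishes some time later.
    void completeCommit() {
        if (state != State.PREPARE_COMMIT) {
            throw new IllegalStateException("commit was not prepared");
        }
        state = State.COMPLETE_COMMIT;
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        TxnCoordinatorModel coordinator = new TxnCoordinatorModel();
        coordinator.addPartitions();   // first transaction
        coordinator.endTxnCommit();    // producer sees the commit as done here
        System.out.println("next txn before completion: " + coordinator.addPartitions());
        coordinator.completeCommit();
        System.out.println("next txn after completion:  " + coordinator.addPartitions());
    }
}
```

In this model the second `addPartitions()` call is rejected only because it arrives inside the PrepareCommit window, which matches the sequence seen in the broker trace.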
