I am trying to publish, within a single transaction, one message to each of 16 Kafka partitions spread across 7 brokers.
The flow is like this:
- open transaction
- write a message to each of the 16 partitions
- commit transaction
- sleep 25 ms
- repeat
The transaction takes 50 ms on average, but sometimes it takes over 1 second to complete. After enabling trace logging on the producer side, I noticed the following error:
TRACE internals.TransactionManager [kafka-producer-network-thread | producer-1] - [Producer clientId=producer-1, transactionalId=cma-2]
Received transactional response AddPartitionsToTxnResponse(errors={modelapp-ecb-0=CONCURRENT_TRANSACTIONS, modelapp-ecb-9=CONCURRENT_TRANSACTIONS, modelapp-ecb-10=CONCURRENT_TRANSACTIONS, modelapp-ecb-11=CONCURRENT_TRANSACTIONS, modelapp-ecb-12=CONCURRENT_TRANSACTIONS, modelapp-ecb-13=CONCURRENT_TRANSACTIONS, modelapp-ecb-14=CONCURRENT_TRANSACTIONS, modelapp-ecb-15=CONCURRENT_TRANSACTIONS, modelapp-ecb-1=CONCURRENT_TRANSACTIONS, modelapp-ecb-2=CONCURRENT_TRANSACTIONS, modelapp-ecb-3=CONCURRENT_TRANSACTIONS, modelapp-ecb-4=CONCURRENT_TRANSACTIONS, modelapp-ecb-5=CONCURRENT_TRANSACTIONS, modelapp-ecb-6=CONCURRENT_TRANSACTIONS, modelapp-ecb-7=CONCURRENT_TRANSACTIONS, modelapp-ecb-8=CONCURRENT_TRANSACTIONS}, throttleTimeMs=0)
for request (type=AddPartitionsToTxnRequest, transactionalId=cma-2, producerId=59003, producerEpoch=0, partitions=[modelapp-ecb-0, modelapp-ecb-9, modelapp-ecb-10, modelapp-ecb-11, modelapp-ecb-12, modelapp-ecb-13, modelapp-ecb-14, modelapp-ecb-15, modelapp-ecb-1, modelapp-ecb-2, modelapp-ecb-3, modelapp-ecb-4, modelapp-ecb-5, modelapp-ecb-6, modelapp-ecb-7, modelapp-ecb-8])
The Kafka producer retries the AddPartitionsToTxnRequest several times until it succeeds, and these retries add up to the delays I am seeing.
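A rough way to reason about the size of these delays: if each rejected attempt costs about one backoff interval, the added latency grows linearly with the number of rejections. The sketch below assumes the client waits `retry.backoff.ms` (a real producer setting, 100 ms by default) between attempts. Whether that setting governs this particular retry depends on the client version, and the class name, method names and the 20 ms value are made up for illustration.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of the Kafka client.
class RetryBackoffSketch {

    // Copies the given producer properties and overrides retry.backoff.ms.
    static Properties withRetryBackoff(Properties base, long backoffMs) {
        Properties copy = new Properties();
        copy.putAll(base);
        copy.setProperty("retry.backoff.ms", Long.toString(backoffMs));
        return copy;
    }

    // Assumption: every rejected attempt adds one full backoff interval.
    static long estimatedDelayMs(int rejectedAttempts, long backoffMs) {
        return rejectedAttempts * backoffMs;
    }

    public static void main(String[] args) {
        Properties tuned = withRetryBackoff(new Properties(), 20L);
        System.out.println("retry.backoff.ms = " + tuned.getProperty("retry.backoff.ms"));
        System.out.println("10 rejections at 100 ms: " + estimatedDelayMs(10, 100L) + " ms");
        System.out.println("10 rejections at 20 ms: " + estimatedDelayMs(10, 20L) + " ms");
    }
}
```

Under that assumption, roughly ten rejections at the default backoff would be enough to add about a second, which would be consistent with the slowest transactions.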
The producer code looks like this:
Properties producerProperties = PropertiesUtil.readPropertyFile(_producerPropertiesFile);
_producer = new KafkaProducer<>(producerProperties);
_producer.initTransactions();
_producerService = Executors.newSingleThreadExecutor(new NamedThreadFactory(getClass().getSimpleName()));
_producerService.submit(() -> {
    while (!Thread.currentThread().isInterrupted()) {
        try {
            _producer.beginTransaction();
            // one record per partition, all inside the same transaction
            for (int partition = 0; partition < _numberOfPartitions; partition++)
                _producer.send(new ProducerRecord<>(_producerTopic, partition, KafkaRecordKeyFormatter.formatControlMessageKey(_messageNumber, token), EMPTY_BYTE_ARRAY));
            _producer.commitTransaction();
            _messageNumber++;
            Thread.sleep(_timeBetweenProducedMessagesInMillis);
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException | UnsupportedVersionException e) {
            // unrecoverable: close the producer and stop the loop
            closeProducer();
            break;
        } catch (KafkaException e) {
            // any other Kafka error: abort and retry on the next iteration
            _producer.abortTransaction();
        } catch (InterruptedException e) {...}
    }
});
Looking at the broker's code, it seems there are two cases in which this error is returned, but I cannot tell why I end up in either of them:
object TransactionCoordinator {
  ...
  def handleAddPartitionsToTransaction(...): Unit = {
    ...
    if (txnMetadata.pendingTransitionInProgress) {
      // return a retriable exception to let the client backoff and retry
      Left(Errors.CONCURRENT_TRANSACTIONS)
    } else if (txnMetadata.state == PrepareCommit || txnMetadata.state == PrepareAbort) {
      Left(Errors.CONCURRENT_TRANSACTIONS)
    }
    ...
  }
  ...
}
Thanks in advance for help!
Later edit:
After enabling trace logging on the broker, we were able to see that the broker sends the END_TXN response to the producer before the transaction reaches the CompleteCommit state. The producer is therefore able to start a new transaction, which the broker rejects while the previous one is still in the PrepareCommit -> CompleteCommit transition.
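To make that sequence concrete, here is a toy model of the race in plain Java. It is only a sketch of what the broker logs suggest, not broker code: the class `ToyTxnCoordinator`, its methods and the simplified state set are hypothetical, and the real coordinator has more states and checks.

```java
// Toy model only; state names mirror the broker's, but this is not broker code.
class ToyTxnCoordinator {
    enum State { EMPTY, ONGOING, PREPARE_COMMIT, COMPLETE_COMMIT }
    enum Result { NONE, CONCURRENT_TRANSACTIONS }

    private State state = State.EMPTY;

    State state() {
        return state;
    }

    // Mirrors the PrepareCommit branch of the broker check quoted earlier.
    Result addPartitions() {
        if (state == State.PREPARE_COMMIT) {
            return Result.CONCURRENT_TRANSACTIONS;
        }
        state = State.ONGOING;
        return Result.NONE;
    }

    // Assumption from the logs: the END_TXN response is sent once
    // PrepareCommit is reached, before the commit has completed.
    Result endTxn() {
        if (state != State.ONGOING) {
            throw new IllegalStateException("no ongoing transaction");
        }
        state = State.PREPARE_COMMIT;
        return Result.NONE;
    }

    // Happens later on the broker, after the END_TXN response went out.
    void completeCommit() {
        if (state != State.PREPARE_COMMIT) {
            throw new IllegalStateException("commit was not prepared");
        }
        state = State.COMPLETE_COMMIT;
    }

    public static void main(String[] args) {
        ToyTxnCoordinator coordinator = new ToyTxnCoordinator();
        coordinator.addPartitions();   // first transaction starts
        coordinator.endTxn();          // producer believes the commit is done
        System.out.println(coordinator.addPartitions()); // prints CONCURRENT_TRANSACTIONS
        coordinator.completeCommit();  // broker finishes the first commit
        System.out.println(coordinator.addPartitions()); // prints NONE
    }
}
```

In this model the second `addPartitions()` call is rejected only because it arrives between `endTxn()` and `completeCommit()`, which matches the window observed in the broker trace.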