
We have a requirement where we consume messages from one topic, do some enrichment, and then publish the message to another topic. The steps are:

  1. Consumer - consume the message
  2. Enrichment - enrich the consumed message
  3. Producer - publish the enriched message to the other topic

I am using the Spring Cloud Stream Kafka binder and things were working fine. Recently we introduced an idempotent producer and added the transactionIdPrefix property, and we observed that the outbound channel started sending 2 messages to the topic where it should have sent only one: one message with the actual JSON value and another message with the value b'\x00\x00\x00\x00\x06&'. The code and config are below. If I remove transactionIdPrefix then I see only one message sent to the outbound topic.

@StreamListener("INPUT")
@SendTo("OUTPUT")
public void consumer(Message message){
Acknowledgement ack = messge.getHeaders().get(KafkaHeaders.ACKNOWLEDGEMENT,Acknowledgement.class))
try{
    String inputMessage = message.getPayload.toString();
    String enrichMessage = // Enrichment on inputMessage
    ack.acknowledgement()   
    return enrichMessage;
}catch( Exception exp){
    ack.acknowledgement();
    throw exp;  
}
}

The configs are:

  1. spring.cloud.stream.kafka.binder.transaction.transactionIdPrefix=TX-
  2. spring.cloud.stream.kafka.binder.transaction.producer.configuration.ack=all
  3. spring.cloud.stream.kafka.binder.transaction.producer.configuration.retries=10
  4. spring.cloud.stream.kafka.bindings.input.consumer.autoCommitOffset=false
  5. spring.cloud.stream.kafka.bindings.input.consumer.enableDlq=true
  6. spring.cloud.stream.kafka.bindings.input.consumer.dlqName=error.topic
  7. spring.cloud.stream.kafka.bindings.input.consumer.autoCommitOnError=true
  8. spring.cloud.stream.kafka.bindings.input.consumer.maxAttempt=3
  9. spring.cloud.stream.kafka.binder.transaction.producer.configuration.enable.idempotence=true

The messages sent to the outbound topic are below.

  1. ConsumerRecord(topic = test, partition = 1, offset = 158, CreateTime = 1574297903694, timestamp= 1238776543, time_stamp_type=0, key=None value=b'{"name":"abc","age":"20"}',checksum=None,serialized_key_size=-1,serialized_value_size=159)

  2. ConsumerRecord(topic = test, partition = 1, offset = 158, CreateTime = 1574297903694, timestamp= 1238776543, time_stamp_type=0, key=b'\x00\x00\x00\x01' value=b'\x00\x00\x00\x00\x06&',checksum=None,serialized_key_size=-1,serialized_value_size=159)

Even in the DLQ topic the message goes in twice.

I'd appreciate it if anybody could provide pointers on this issue we are facing.

Cheers

Sach
  • Here is a sample that works with transactions: https://github.com/spring-cloud/spring-cloud-stream-samples/tree/main/transaction-kafka-samples. – sobychacko Oct 01 '21 at 20:25
  • You may want to upgrade your code to the latest functional model in Spring Cloud Stream, although I doubt that is the issue here. Are you using the latest release? – sobychacko Oct 01 '21 at 20:26
  • I am using version 3.0.0.RELEASE of the spring-cloud-stream and spring-cloud-starter-stream-kafka projects. I have also tried version 3.1.3 but I was still getting the same issue. – Sach Oct 02 '21 at 05:53
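
For reference, a minimal sketch of the functional-style equivalent suggested in the comment above; the configuration class name, function name, and String payload are illustrative assumptions, not taken from the question:

import java.util.function.Function;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EnrichmentConfig {

    // Consumes from the function's input binding and publishes the returned
    // value to its output binding; no StreamListener/SendTo annotations needed.
    @Bean
    public Function<String, String> enrich() {
        return inputMessage -> {
            String enrichedMessage = inputMessage; // enrichment on inputMessage goes here
            return enrichedMessage;
        };
    }
}

With the functional model the bindings default to enrich-in-0 and enrich-out-0, so the binding-specific properties above would need to target those names.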

1 Answer


I believe your code is working fine. A transactional producer technically writes more than just the data record: the record itself is uncommitted at first, and once the transaction completes a transaction marker is written to the same partition to flag it as committed (or aborted). The second record above, with key b'\x00\x00\x00\x01' and a non-JSON value, looks like that commit marker. In other words, you should check whether the consumers reading from the transactional topic (the ones consuming it in your app) have their isolation.level set to read_committed, so that markers and aborted records are filtered out.

consume records -> producer initiates the transaction -> enrichment/processing -> producer sends the message (UNCOMMITTED) -> finish processing the rest of the records from the last .poll() batch -> commit / abort the transaction (COMMITTED) -> repeat
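
As a quick check, here is a minimal standalone consumer sketch with read_committed; the broker address is an assumption and the topic name is taken from the records in the question. With this isolation level the extra marker record should not show up:

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadCommittedCheck {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: adjust to your broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "read-committed-check");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Only committed transactional records are returned; transaction markers
        // and aborted records are filtered out by the consumer.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("test"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}

If the downstream consumer is also a Spring Cloud Stream app, the same setting can be passed through the binder's per-binding consumer configuration, e.g. spring.cloud.stream.kafka.bindings.<binding>.consumer.configuration.isolation.level=read_committed, for whichever binding consumes the transactional topic.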

Nerm
  • My consumer is read_committed so it is reading one message only. What is the reason for having the uncommitted transactional record on the topic? Is there any way to stop that message from being pushed to the topic? – Sach Oct 03 '21 at 07:03
  • If it's reading one message only then your problem is solved, correct? The only way to ignore uncommitted records is that consumer-specific config property. The reason has to do with the design of exactly-once, which introduces a transaction log (an internal topic called __transaction_state) and a designated transaction coordinator for a transactional producer. This log works together with the original log partition to accomplish transactions. – Nerm Oct 03 '21 at 16:49
  • The real issue is that you shouldn't be manually acknowledging offsets. To enable exactly-once guarantees for a Kafka Streams application, simply set the `processing.guarantee` configuration to either `exactly_once` or `exactly_once_beta`. That's it. You can remove the manual acknowledgment. Forgive me for not including this in the answer. – Nerm Oct 03 '21 at 16:54
  • I am using the Spring Cloud Stream Kafka binder. Can I use the processing.guarantee property with the Kafka binder? When you said "The only way to ignore uncommitted records is by using that consumer specific config property", were you talking about the read_committed property? I purposely added the manual acknowledgement because when a message went to the DLQ topic due to some error, the offset was somehow not getting committed, and because of that, when I restarted the instance the same message that had already been sent to the DLQ topic was consumed again. – Sach Oct 04 '21 at 08:16
  • When using transactions, the acknowledgment causes the listener container to send the offset to the transaction, according to exactly-once semantics; see https://docs.spring.io/spring-kafka/docs/current/reference/html/#exactly-once. The consumer should never see the "internal" log entry for the commit/rollback indication. – Gary Russell Oct 04 '21 at 13:40
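
To illustrate the mechanism the last two comments describe, here is a rough sketch of the plain Kafka producer API that consume-process-produce transactions build on (not the Spring container's actual code); it assumes the producer was created with a transactional.id and that initTransactions() has already been called:

import java.util.Map;

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class SendOffsetsSketch {

    public static void process(KafkaProducer<String, String> producer,
                               String enrichedMessage,
                               TopicPartition sourcePartition,
                               long consumedOffset,
                               String consumerGroupId) {
        producer.beginTransaction();
        try {
            // Outbound record and consumed offset are committed atomically.
            producer.send(new ProducerRecord<>("test", enrichedMessage));
            producer.sendOffsetsToTransaction(
                    Map.of(sourcePartition, new OffsetAndMetadata(consumedOffset + 1)),
                    consumerGroupId);
            producer.commitTransaction();
        } catch (RuntimeException e) {
            producer.abortTransaction();
            throw e;
        }
    }
}

The point is that the consumed offset is committed as part of the producer's transaction, together with the outbound record, rather than through a separate consumer commit.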