10

I am creating KTable processing data from KStream. But when I trigger a tombstone messages with key and null payload, it is not removing message from KTable.

sample -

public KStream<String, GenericRecord> processRecord(@Input(Channel.TEST) KStream<GenericRecord, GenericRecord> testStream,
KTable<String, GenericRecord> table = testStream
                .map((genericRecord, genericRecord2) -> KeyValue.pair(genericRecord.get("field1") + "", genericRecord2))
                .groupByKey()
                reduce((genericRecord, v1) -> v1, Materialized.as("test-store"));


GenericRecord genericRecord = new GenericData.Record(getAvroSchema(keySchema));
genericRecord.put("field1", Long.parseLong(test.getField1()));
ProducerRecord record = new ProducerRecord(Channel.TEST, genericRecord, null);
kafkaTemplate.send(record);

Upon triggering a message with null value, I can debug in testStream map function with null payload, but it doesn't remove record on KTable change log "test-store". Looks like it doesn't even reach reduce method, not sure what I am missing here.

Appreciate any help on this!

Thanks.

miguno
  • 14,498
  • 3
  • 47
  • 63
R K
  • 382
  • 5
  • 25

2 Answers2

7

As documented in the JavaDocs of reduce()

Records with {@code null} key or value are ignored.

Because, the <key,null> record is dropped and thus (genericRecord, v1) -> v1 is never executed, no tombstone is written to the store or changelog topic.

For the use case you have in mind, you need to use a surrogate value that indicates "delete", for example a boolean flag within your Avro record. Your reduce function needs to check for the flag and return null if the flag is set; otherwise, it must process the record regularly.

Update:

Apache Kafka 2.6 adds the KStream#toTable() operator (via KIP-523) that allows to transform a KStream into a KTable.

Matthias J. Sax
  • 59,682
  • 7
  • 117
  • 137
0

An addition to the above answer by Matthias:

Reduce ignores the first record on the stream, so the mapped and grouped value will be stored as-is in the KTable, never passing through the reduce method for tombstoning. This means that it will not be possible to just join another stream on that table, the value itself also needs to be evaluated.

I hope KIP-523 solves this.