I have a KStream<String, X> that I essentially want to convert to a KTable<String, Y>.

The only way I could find to achieve this using the DSL is with a map, group then reduce.

val stream: KStream<String, X> = ...
val table: KTable<String, Y> = stream
  .mapValues({ value -> toYOrNull(value)})
  .groupByKey(Grouped.with(Serdes.String(), ySerde))
  .reduce(
    {old: Y?, updated: Y? -> updated},
    Materialized.`as`<String, Y, KeyValueStore<Bytes, ByteArray>>("y-store")
      .withKeySerde(Serdes.String())
      .withValueSerde(ySerde)
  )

I would expect this to handle the case where the value of updated in the reduce is null. However, when I inspect the store using the TopologyTestDriver, it still seems to hold the old version. What am I doing wrong?

This is my test:

@Test
fun shouldDeleteFromTableWhenNull() {
  val store = testDriver.getKeyValueStore<String, Y?>("y-store")
  store.put("key", Y())

  inputTopic.pipeInput("key", anXThatMapsToANullY)

  assertThat(store.get("key")).isNull() // Fails as the old entry is still there
}
Eduardo

2 Answers

Records with value null are ignored.

This is expected behaviour according to the documentation, see the KGroupedStream::reduce(...) Java Doc:

Combine the values of records in this stream by the grouped key. Records with null key or value are ignored
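
To make the consequence for the question concrete, here is a minimal plain-Kotlin model of that documented rule. It is not Kafka Streams code: `reduceIgnoringNulls` and the in-memory map are hypothetical stand-ins, and the handling of a missing old value is a simplifying assumption. The point is only that a null produced by mapValues is dropped before the reducer runs, so the old entry stays in the store.

```kotlin
// Simplified, hypothetical model of "records with null key or value are ignored".
// This is NOT the Kafka Streams implementation, only an illustration of the rule.
fun <K : Any, V : Any> reduceIgnoringNulls(
    store: MutableMap<K, V>,
    key: K?,
    value: V?,
    reducer: (V, V) -> V?
) {
    // The record is dropped here, so the reducer never sees the null.
    if (key == null || value == null) return

    val old = store[key]
    // Assumption for this model: with no old value, the new value is stored directly.
    val next = if (old == null) value else reducer(old, value)
    if (next == null) store.remove(key) else store[key] = next
}

fun main() {
    val store = mutableMapOf("key" to "old-y")
    // Mirrors the failing test: a null value arrives for an existing key.
    reduceIgnoringNulls(store, "key", null) { _, updated -> updated }
    println(store["key"]) // prints "old-y": the entry was not deleted
}
```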

Bartosz Wardziński

In the upcoming Apache Kafka 2.5 release, a new operator KStream#toTable() is added to address this use case (cf. https://issues.apache.org/jira/browse/KAFKA-7658).
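
A rough sketch of what the toTable() route could look like on Kafka Streams 2.5 or later. Several things here are assumptions made so the snippet is self-contained: String stands in for both X and Y, the topic name `x-topic` is made up, and `toYOrNull` is a hypothetical mapping that returns null for blank input.

```kotlin
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.common.utils.Bytes
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.Topology
import org.apache.kafka.streams.kstream.Consumed
import org.apache.kafka.streams.kstream.KTable
import org.apache.kafka.streams.kstream.Materialized
import org.apache.kafka.streams.state.KeyValueStore

// Hypothetical stand-in for the question's X -> Y mapping: blank input maps to null.
fun toYOrNull(value: String): String? = value.takeIf { it.isNotBlank() }

fun buildToTableTopology(): Topology {
    val builder = StreamsBuilder()
    val table: KTable<String, String?> = builder
        // Assumed topic name and String serdes, for illustration only.
        .stream("x-topic", Consumed.with(Serdes.String(), Serdes.String()))
        .mapValues { value -> toYOrNull(value) }
        // Requires Kafka Streams 2.5+: turns the stream into a table backed by "y-store".
        .toTable(
            Materialized.`as`<String, String?, KeyValueStore<Bytes, ByteArray>>("y-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String())
        )
    return builder.build()
}
```

With toTable(), a record whose value is null is treated as a tombstone for its key, so no grouping or reducer is needed to express the delete.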

In older versions, you need to use a non-null "surrogate delete value" so that the record is not dropped, and let your reduce function return null when it sees that "surrogate delete value".
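
The surrogate approach could be sketched as follows. As assumptions for the sake of a self-contained snippet: String stands in for X and Y, `DELETE_MARKER`, `keepLatestOrDelete`, the topic name `x-topic`, and the body of `toYOrNull` are all hypothetical, and the marker must be a value that can never occur as real data.

```kotlin
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.common.utils.Bytes
import org.apache.kafka.streams.StreamsBuilder
import org.apache.kafka.streams.Topology
import org.apache.kafka.streams.kstream.Consumed
import org.apache.kafka.streams.kstream.Grouped
import org.apache.kafka.streams.kstream.Materialized
import org.apache.kafka.streams.kstream.Reducer
import org.apache.kafka.streams.state.KeyValueStore

// Hypothetical sentinel; pick something that cannot collide with real values.
const val DELETE_MARKER = "__deleted__"

// Hypothetical stand-in for the question's X -> Y mapping: blank input maps to null.
fun toYOrNull(value: String): String? = value.takeIf { it.isNotBlank() }

// Keeps the latest value, or returns null (a delete) when the marker arrives.
val keepLatestOrDelete = Reducer<String> { _, updated ->
    if (updated == DELETE_MARKER) null else updated
}

fun buildSurrogateTopology(): Topology {
    val builder = StreamsBuilder()
    builder
        .stream("x-topic", Consumed.with(Serdes.String(), Serdes.String()))
        // Replace null with the marker so the record is not dropped before reduce().
        .mapValues { value -> toYOrNull(value) ?: DELETE_MARKER }
        .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
        .reduce(
            keepLatestOrDelete,
            Materialized.`as`<String, String, KeyValueStore<Bytes, ByteArray>>("y-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String())
        )
    return builder.build()
}
```

One caveat worth verifying against your version: as far as I can tell, the reducer is only invoked when the store already holds a value for the key, so a marker arriving for an absent key may be stored as-is and would then need to be filtered downstream.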

Matthias J. Sax
  • Thanks for the response, do you have any idea when that will be released? – Eduardo Mar 10 '20 at 15:53
  • The release plan can be found in the Apache Kafka wiki: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=143428858 -- development is closed and voting on RCs has already started. Hence, it should be released soon, assuming no more blockers are found during testing. – Matthias J. Sax Mar 10 '20 at 16:39
  • @Eduardo in older versions, you can also use `transformValues` to access and update the store `y-store` directly, by getting a `KeyValueStore` from the `ProcessorContext` – Tuyen Luong Mar 18 '20 at 01:43
  • @TuyenLuong You can do that, but you don't get a `KTable` back -- thus it depends what other operations you want to perform -- for "interactive queries" your suggestion would work. However, if you want to process the data further, for example in a table-table or stream-table join, using `transformValues()` won't help. – Matthias J. Sax Mar 18 '20 at 04:38