I've created a KTable for a topic using streamsBuilder.table("myTopic"), which I materialize to a state store so that I can use interactive queries.

Every hour, I want to remove records from this state store (and associated changelog topic) whose values haven't been updated in the past hour.

I believe this may be possible using a punctuator, but I've only used the DSL so far, so I am not sure how to proceed. I would be very grateful if somebody could provide an example.

Thanks,

Jack

1 Answer

It is possible to mix and match the Processor API with the DSL, but you can't attach a Processor directly to a KTable. You would need to convert it to a KStream first (for example with toStream()). Alternatively, you could create a new topology with a Processor that interacts with the state store.

You will also need some way to determine whether a record is older than one hour, and that information has to be stored somewhere. One option is to store a last-updated timestamp alongside each value in the state store.
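
To make the timestamp idea concrete, here is a minimal plain-Java sketch of the expiry check on its own, without any Kafka dependency. A HashMap stands in for the state store, and the names StaleRecordSweep, Stamped and evictOlderThan are hypothetical, chosen only for illustration. In a real topology the same comparison would run inside the punctuator against the actual store.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class StaleRecordSweep {

    // Hypothetical wrapper: the payload plus the time it was last updated.
    static final class Stamped {
        final String value;
        final long updatedAtMs;

        Stamped(String value, long updatedAtMs) {
            this.value = value;
            this.updatedAtMs = updatedAtMs;
        }
    }

    // Removes every entry last updated before (nowMs - maxAge) and
    // returns how many entries were removed.
    static int evictOlderThan(Map<String, Stamped> store, long nowMs, Duration maxAge) {
        long cutoffMs = nowMs - maxAge.toMillis();
        int removed = 0;
        Iterator<Map.Entry<String, Stamped>> it = store.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().updatedAtMs < cutoffMs) {
                it.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // The map is an assumption standing in for the real state store.
        Map<String, Stamped> store = new HashMap<>();
        long now = 10_000_000L;
        store.put("fresh", new Stamped("a", now - Duration.ofMinutes(5).toMillis()));
        store.put("stale", new Stamped("b", now - Duration.ofMinutes(90).toMillis()));

        int removed = evictOlderThan(store, now, Duration.ofHours(1));
        // prints "1 removed, remaining keys: [fresh]"
        System.out.println(removed + " removed, remaining keys: " + store.keySet());
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the method, keeps the check easy to test and mirrors how a punctuator receives its timestamp as an argument.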

In the init method of a Processor you could call schedule (punctuate) to iterate records in the state store and remove old ones:

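// Assumes the store holds the last-update time in epoch millis: KeyValueStore<String, Long>.
context.schedule(Duration.ofHours(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
    Instant cutoff = Instant.ofEpochMilli(timestamp).minus(Duration.ofHours(1));
    // Close the iterator when done to release the underlying resources.
    try (KeyValueIterator<String, Long> iterator = myStateStore.all()) {
        iterator.forEachRemaining(keyValue -> {
            if (Instant.ofEpochMilli(keyValue.value).isBefore(cutoff)) {
                myStateStore.delete(keyValue.key);
            }
        });
    }
});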
Nic Pegg
  • Thank you for your answer. It's very helpful. Must every processor process a stream? I ask because I just want to periodically delete old records from a state store, and don't want to process any particular stream. But all the examples I've seen with punctuators also process a stream via the `process` method. –  Jul 21 '20 at 08:19
  • I haven't tried this in any project yet, but you could call streamsBuilder.build().addProcessor(...). If you set the source to the KTable, this may be what you're looking for. It might not work, though, as the source of a Processor is usually a topic. – Nic Pegg Jul 21 '20 at 16:45
  • Yes, every `Processor` requires an input stream. Given that you only call `builder.table()`, you could instead call `builder.stream().process()` and add a KeyValueStore manually to the `process` operator: on each input record you just do a `store.put()` to maintain the state (a `KTable` does not do anything else either). Additionally, you can register a punctuation on the `Processor`, which gives you easy access to the state store. Btw: newer versions of Kafka Streams ship with a `TimestampedKeyValueStore`, so you don't need to build anything custom to store a timestamp next to the value. – Matthias J. Sax Aug 05 '20 at 05:42