2

I want to materialize a KTable from a KStream, and I want the underlying KeyValueStore to be sorted by key.

I looked through the KTable API spec (https://kafka.apache.org/20/javadoc/org/apache/kafka/streams/kstream/KTable.html), but it has no 'sort' method. I also found this article (https://dzone.com/articles/how-to-order-streamed-dataframes), which suggests implementing sorting via the Processor API. Is there any other way to achieve this?

fhussonnois
Sanjay Das

2 Answers

3

Kafka Streams lets you materialize queryable state stores. You can then get read-only access to a store by calling KafkaStreams#store().

If you define a persistent store, Kafka Streams uses RocksDB to store your data. The returned KeyValueIterator is then backed by a RocksDB iterator, which lets you iterate over the key-value pairs in sorted order (see the RocksDB wiki page "Iterator-Implementation"). Note that the order is that of the serialized key bytes, not of the deserialized key objects.

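Example:

    KafkaStreams streams = new KafkaStreams(topology, props);
    streams.start(); // the store can only be queried once the instance is running
    // "storeName" must match the name the store was materialized with.
    ReadOnlyKeyValueStore<Object, Object> store = streams.store("storeName", QueryableStoreTypes.keyValueStore());
    // Close the iterator when you are done with it to release the underlying resources.
    KeyValueIterator<Object, Object> iterator = store.all();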
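
What "sorted" means in practice depends on how the keys are serialized, which is the point raised in the comments on this answer. The following is a rough sketch using only the JDK (no Kafka, no RocksDB). It assumes the store compares keys byte by byte as unsigned values, which is RocksDB's default comparator; the class `ByteOrderSketch` and its helper methods are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

public class ByteOrderSketch {

    // Assumption: the store orders keys by unsigned byte-wise comparison.
    private static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;

    // Big-endian 8-byte encoding, the same layout a long serializer typically produces.
    static byte[] longKey(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    static byte[] stringKey(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Returns the given longs in the order a byte-wise sorted store would iterate them.
    static List<String> sortedLongs(long... values) {
        TreeMap<byte[], String> store = new TreeMap<>(BYTEWISE);
        for (long v : values) {
            store.put(longKey(v), Long.toString(v));
        }
        return new ArrayList<>(store.values());
    }

    // Returns the given strings in the order a byte-wise sorted store would iterate them.
    static List<String> sortedStrings(String... values) {
        TreeMap<byte[], String> store = new TreeMap<>(BYTEWISE);
        for (String v : values) {
            store.put(stringKey(v), v);
        }
        return new ArrayList<>(store.values());
    }

    public static void main(String[] args) {
        // Negative longs start with 0xFF bytes, so they come after positive ones.
        System.out.println(sortedLongs(10L, -1L, 2L));      // prints [2, 10, -1]
        // Numeric strings sort as text, so "10" and "100" come before "9".
        System.out.println(sortedStrings("9", "10", "100")); // prints [10, 100, 9]
    }
}
```

So if you rely on iteration order, pick a key serialization whose byte order matches the order you want, for example fixed-width or zero-padded keys.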
fhussonnois
  • This returns all data **unsorted**, right? – OneCricketeer Mar 14 '23 at 19:33
  • I think this actually depends on the StateStore implementation. For RocksDB, the official documentation says: "RocksDB Iterator allows users to iterate over the DB forward and backward in a sorted manner." (https://github.com/facebook/rocksdb/wiki/Iterator-Implementation#rocksdb-iterator). But the Kafka Streams API does not provide any guarantees about this. Also, the Kafka Streams internals may have changed since I wrote this answer. – fhussonnois Mar 15 '23 at 09:09
  • Interesting, thanks. Though, the values in the table are sorted by what? In Kafka, the keys and values are just bytes, and they stay bytes when written to RocksDB, so the order cannot be numeric or textually lexicographic... I think that piece has been documented somewhere in the Kafka Streams docs, in relation to the prefixScan function - https://kafka.apache.org/34/javadoc/org/apache/kafka/streams/state/ReadOnlyKeyValueStore.html – OneCricketeer Mar 15 '23 at 20:49
2

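Add the events to a state store under the key you want to sort by. The KeyValueIterator returned by the state store then walks the key-value pairs in key order. The processor below buffers events this way and forwards them in order at a fixed interval.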

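    import java.time.Duration;

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.processor.AbstractProcessor;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class SortProcessor extends AbstractProcessor<String, Event> {

        private static final Logger LOG = LoggerFactory.getLogger(SortProcessor.class);
        private final String stateStore;
        private final Long bufferIntervalInSeconds;

        // Why not use a simple Java NavigableMap? Check out my answer at : https://stackoverflow.com/a/62677079/2256618
        private KeyValueStore<String, Event> keyValueStore;

        public SortProcessor(String stateStore, Long bufferIntervalInSeconds) {
            this.stateStore = stateStore;
            this.bufferIntervalInSeconds = bufferIntervalInSeconds;
        }

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext processorContext) {
            super.init(processorContext);
            keyValueStore = (KeyValueStore<String, Event>) context().getStateStore(stateStore);
            // Flush the buffered events every bufferIntervalInSeconds of wall-clock time.
            context().schedule(Duration.ofSeconds(bufferIntervalInSeconds), PunctuationType.WALL_CLOCK_TIME, this::punctuate);
        }

        // Forwards all buffered events in store (key) order and removes them from the store.
        void punctuate(long timestamp) {
            LOG.info("Punctuator invoked...");
            try (KeyValueIterator<String, Event> iterator = keyValueStore.all()) {
                while (iterator.hasNext()) {
                    KeyValue<String, Event> next = iterator.next();
                    if (next.value == null) {
                        continue;
                    }
                    LOG.info("Sending {}", next.key);
                    // The sort key is only used inside the store; events go downstream with a null key.
                    context().forward(null, next.value);
                    keyValueStore.delete(next.key);
                }
            }
        }

        @Override
        public void process(String key, Event value) {
            Event event = Event.builder(value).payload(value.getPayload().toUpperCase()).build();
            // Buffer the event under a composite sort key: event type, then id.
            keyValueStore.put(event.getEventType().name() + " " + event.getId(), event);
        }

        public static String getName() {
            return "sort-processor";
        }
    }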

Executable code is here. I have used a simple in-memory state store. If you anticipate a huge number of events in a short burst, you can use a persistent state store, as suggested in the other answer.
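
To see the buffer-then-flush ordering in isolation, here is a minimal JDK-only sketch of the same idea. A `TreeMap` stands in for the state store purely to illustrate ordering (it has none of a state store's fault tolerance), and the class `SortBufferSketch`, its methods, and the numeric non-negative `id` are assumptions for illustration, not part of the processor above. It also zero-pads the id, since a plain string id would otherwise sort "10" before "9".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class SortBufferSketch {

    // Stand-in for the sorted state store; iterates in key order.
    private final TreeMap<String, String> buffer = new TreeMap<>();

    // Composite sort key: event type, then a zero-padded id.
    // Assumes ids are non-negative, so padding keeps numeric order.
    static String compositeKey(String eventType, long id) {
        return eventType + " " + String.format("%019d", id);
    }

    // Plays the role of process(): buffer the payload under its sort key.
    void add(String eventType, long id, String payload) {
        buffer.put(compositeKey(eventType, id), payload);
    }

    // Plays the role of punctuate(): emit everything in key order, then clear.
    List<String> flush() {
        List<String> ordered = new ArrayList<>(buffer.values());
        buffer.clear();
        return ordered;
    }

    public static void main(String[] args) {
        SortBufferSketch sketch = new SortBufferSketch();
        sketch.add("ORDER", 10, "c");
        sketch.add("ORDER", 9, "b");
        sketch.add("ALERT", 100, "a");
        System.out.println(sketch.flush()); // prints [a, b, c]
        System.out.println(sketch.flush()); // prints []
    }
}
```

Events arriving in any order within one interval come out sorted by type and then id; ordering across intervals is not guaranteed by this approach.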

OneCricketeer