
I have a stream of documents that go through multiple processing steps. These steps run in parallel. After each step completes, a message is sent to a stage-completion topic. After all the steps are done, the tracker sends a message to a processing-complete topic with the document ID.

I am using Kafka Streams (with Spring Cloud Stream on top) in the tracker to implement the above functionality.

Following is the sample code:

    @StreamListener
    @SendTo("processingComplete")
    public KStream<String, String> onCompletion(
            @Input("stageCompletion") KStream<String, String> stageCompletionStream) {
        return stageCompletionStream
                .filter(this::checkValidity)
                .groupByKey(Serialized.with(Serdes.String(), Serdes.String()))
                .reduce(this::aggregateStageCompletion,
                        Materialized.as("stage_completion_store"))
                .toStream()
                .filter((ignored, message) -> checkCompletion(message))
                .map(this::publishCompletion);
    }
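For illustration, checkCompletion might simply test whether every expected stage has been recorded in the aggregated value. Here is a minimal sketch, assuming stages are accumulated as a comma-separated string; the stage names and class name are assumptions, not part of the actual implementation:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the completion check. Assumes the reduce step
// accumulates completed stage names as a comma-separated string.
public class CompletionCheckSketch {

    // Assumed set of processing stages; replace with the real stage names.
    static final Set<String> EXPECTED_STAGES =
            new HashSet<>(Arrays.asList("ocr", "index", "extract"));

    static boolean checkCompletion(String aggregatedStages) {
        if (aggregatedStages == null || aggregatedStages.isEmpty()) {
            return false;
        }
        // A document is complete once every expected stage has reported in.
        Set<String> seen =
                new HashSet<>(Arrays.asList(aggregatedStages.split(",")));
        return seen.containsAll(EXPECTED_STAGES);
    }

    public static void main(String[] args) {
        System.out.println(checkCompletion("ocr,index"));          // false
        System.out.println(checkCompletion("ocr,index,extract"));  // true
    }
}
```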

After I publish the completion message, I need to clean up the state store, stage_completion_store (which happens to be RocksDB by default), for that document ID.

The suggested approach is to insert a tombstone message. To do so, I have implemented an additional stream that reads the processing-complete topic and merges it with the stage-completion stream.

Following is the code using this approach:

    @StreamListener
    @SendTo("processingComplete")
    public KStream<String, String> onCompletion(
            @Input("stageCompletion") KStream<String, String> stageCompletionStream,
            @Input("processingCompleteFeed") KStream<String, String> processingCompletionStream) {
        return processingCompletionStream.merge(stageCompletionStream)
                .filter(this::checkValidity)
                .groupByKey(Serialized.with(Serdes.String(), Serdes.String()))
                .reduce(this::aggregateStageCompletion,
                        Materialized.as("stage_completion_store"))
                .toStream()
                .filter((ignored, message) -> checkCompletion(message))
                .map(this::publishCompletion);
    }

The aggregateStageCompletion method inserts the tombstone (returns null) when the message is a processing-completion message.
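For reference, here is a minimal sketch of what such a reducer might look like. The "COMPLETE" marker, the comma-separated accumulation format, and the class name are assumptions for illustration; only the "return null to tombstone" behavior is the point:

```java
// Hypothetical sketch of the reducer. Assumes the processing-complete feed
// re-publishes a message whose payload is the literal marker "COMPLETE".
// Returning null from the reducer removes the key from the result KTable,
// which emits a tombstone into stage_completion_store's changelog.
public class AggregatorSketch {

    static final String COMPLETE_MARKER = "COMPLETE"; // assumed payload

    // Mirrors the shape of Reducer<String>: (aggregate, newValue) -> result
    static String aggregateStageCompletion(String aggregate, String value) {
        if (COMPLETE_MARKER.equals(value)) {
            return null; // tombstone: deletes the record for this document ID
        }
        // Otherwise accumulate stage names, e.g. "ocr" + "index" -> "ocr,index"
        return (aggregate == null || aggregate.isEmpty())
                ? value
                : aggregate + "," + value;
    }

    public static void main(String[] args) {
        String agg = aggregateStageCompletion("ocr", "index");
        System.out.println(agg);                                          // ocr,index
        System.out.println(aggregateStageCompletion(agg, COMPLETE_MARKER)); // null
    }
}
```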

Is this a good way to do it, i.e. reading a stream just to insert tombstones? Or is there a better approach to achieve the same result?

Srikanth
    The approach looks ok to me, although not sure if there is a more idiomatic Kafka streams way to address this use-case. If this approach doesn't work from a spring angle, please let us know and will be happy to help you further. Here is a similar question. https://stackoverflow.com/questions/50708252/tombstone-messages-not-removing-record-from-ktable-state-store – sobychacko Aug 20 '18 at 22:02
