
The use case is to flush records from partitions that did not receive new data in Kafka Streams, because we are using suppress, which advances only on stream time.

We have a window store with a tumbling window of 1 minute, a reduce operation, and suppress attached. The design of suppress depends on stream time, so if a partition does not receive any new consumer records, suppress will not emit the pending records for that partition.
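For context, here is a minimal sketch of the topology described above, assuming a String-keyed input; the topic names, store name, serdes, and the 5-second grace period are placeholders, not our actual code:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.state.WindowStore;

public class TopologySketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.<String, String>stream("input-topic")
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            // Tumbling window of 1 minute; short grace so suppress can emit
            // soon after stream time passes the window end.
            .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(1), Duration.ofSeconds(5)))
            .reduce((agg, v) -> v,
                    // Window store retention of 65 seconds, as in the question
                    // (must be >= window size + grace: 60s + 5s here).
                    Materialized.<String, String, WindowStore<Bytes, byte[]>>as("my-window-store")
                        .withRetention(Duration.ofSeconds(65)))
            // Emit only the final result per window; this is the step that
            // stalls when a partition stops receiving records, because
            // stream time for that partition no longer advances.
            .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
            .toStream()
            .to("output-topic");
    }
}
```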

It is worth noting that the retention period of the state store is set to 65 seconds.

So, to do an explicit flush from the window state store, we decided to use the transform API within the DSL topology.

In the transform node we use context.schedule to register a punctuator, which gets access to the state store and runs a windowed query, i.e. fetchAll(startTimeInstant, endTimeInstant), to find old keys that have still not been flushed out.
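A sketch of that transform node, assuming String key/value serdes; the store name, punctuation interval, and the pass-through/forwarding logic are placeholders for illustration:

```java
import java.time.Duration;
import java.time.Instant;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.WindowStore;

public class FlushTransformer implements Transformer<String, String, KeyValue<String, String>> {
    private WindowStore<String, String> store;
    private ProcessorContext context;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        this.store = (WindowStore<String, String>) context.getStateStore("my-window-store");
        // Wall-clock punctuation, so we make progress even when a
        // partition receives no new records and stream time stands still.
        context.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            Instant now = Instant.ofEpochMilli(timestamp);
            // Windowed range query over everything from (now - 3 min) to now.
            try (KeyValueIterator<Windowed<String>, String> iter =
                     store.fetchAll(now.minus(Duration.ofMinutes(3)), now)) {
                while (iter.hasNext()) {
                    KeyValue<Windowed<String>, String> entry = iter.next();
                    // Decide here whether this window still needs flushing,
                    // and forward it downstream if so.
                    context.forward(entry.key.key(), entry.value);
                }
            }
        });
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        return KeyValue.pair(key, value); // pass live records through unchanged
    }

    @Override
    public void close() {}
}
```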

It is worth noting from the documentation that the retention period is the minimum amount of time data will stay in the window store; a window is dropped only once all of its records are old enough.

Now, the expectation was that successfully flushed records should no longer be in the state store when we run fetchAll (the start time is UTC minus 3 minutes). But data up to 6 minutes old, which was already flushed out, is still present in the window store.

The PROBLEM is that I do not want to see old records in the window store, because then the payload has to be parsed to decide whether to flush the data or not, which is performance-intensive.
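One detail worth noting: fetchAll returns entries keyed by Windowed, so the window start timestamp is available from the key itself, and staleness can in principle be judged without parsing the payload. A minimal sketch of that comparison (the helper name, timestamps, and 3-minute cutoff are illustrative assumptions):

```java
import java.time.Duration;
import java.time.Instant;

public class WindowCutoff {
    // Hypothetical helper: decide from the window start alone (taken from the
    // Windowed<K> key returned by fetchAll) whether an entry is stale,
    // without looking at the payload at all.
    static boolean isStale(Instant windowStart, Instant now, Duration maxAge) {
        return windowStart.isBefore(now.minus(maxAge));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2023-01-01T00:10:00Z");
        // A 6-minute-old window is before the (now - 3 min) cutoff: stale.
        System.out.println(isStale(Instant.parse("2023-01-01T00:04:00Z"), now, Duration.ofMinutes(3))); // true
        // A 1-minute-old window is within the cutoff: not stale.
        System.out.println(isStale(Instant.parse("2023-01-01T00:09:00Z"), now, Duration.ofMinutes(3))); // false
    }
}
```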

I also checked the changelog topic's compact/delete policy; its retention is also 65 seconds.

I know the classic approach is to send keep-alive records to all partitions of the input topic, but that is not feasible in our case because the input topic is used by multiple clients, and they would all have to change.
