This question comes to mind because we are running Kafka Streams applications without EOS enabled, due to infra constraints, and we are unsure of their behavior when running custom logic via the transformer/processor API with changelogged state stores.
Say we are using the following topology to de-duplicate records before sending them downstream:
[topic] -> [flatTransformValues + state store] -> [...(downstream)]
The transformer here compares each incoming record against the state store and only forwards + updates the store when the value has changed. So for messages [A:1], [A:1], [A:2], we expect downstream to receive only [A:1], [A:2].
The question is: when failures happen, is it possible for [A:2] to be written to the state store's changelog while downstream never receives the message, so that on retry the restored store causes [A:2] to be discarded as a duplicate and it is lost forever?
If not, please tell me what mechanism prevents this from happening. One way I could see it working is if Kafka Streams produces to the changelog topics and commits offsets only after the produce to downstream succeeds?
Much appreciated!