KStreams - KTable startup phase

Question

The KStreams - KTable join works in a very simple way: every time a new sample is emitted on the stream, a lookup by key is performed on the table.

Can this lead to unexpected behaviour in transient phases? We have a topology like so:

One KStream A where we perform a selectKey turning it into a Stream A1
One KStream B which we groupBy and then reduce, turning it into a KTable B1

At startup, we publish two records on A and two records on B, so that after the selectKey on A and the groupBy + reduce on B the keys match. However, we notice that the inner join between A1 and B1 sometimes fails to produce a result, and we lose some of the output we expect.

What is the right topology to ensure no updates get lost?

score 2 · Accepted Answer · edited Mar 20 '18 at 07:44

KStream-KTable join synchronization is best effort. We are working on improvements to give better guarantees for the 1.2 release. At the moment, there is not much you can do.

If you need strict guarantees, you would need to implement your own stream-table join operator with a transform() instead of a join(). You can connect the KTable store to the Transformer and put custom logic in place for the join lookup.
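The "custom logic" can be illustrated with a plain-Java model. This is only a sketch, not Kafka Streams code: the class `BufferingJoinSketch` and all of its methods are hypothetical names, the `table` map stands in for the KTable store that a Transformer would query with `get()`, and `pending` stands in for a second store holding stream records that found no match. It assumes that unmatched records should be parked and retried later, which is one possible policy among several.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model (hypothetical, no Kafka dependency) of a stream-table inner join with retry. */
class BufferingJoinSketch {
    // Stands in for the KTable's state store.
    private final Map<String, String> table = new HashMap<>();
    // Stream records whose key had no table entry yet, grouped by key in arrival order.
    private final Map<String, Deque<String>> pending = new LinkedHashMap<>();
    // Join results emitted so far.
    private final List<String> output = new ArrayList<>();

    /** Called per stream record: join at once if the key is present, otherwise park the record. */
    void onStreamRecord(String key, String streamValue) {
        String tableValue = table.get(key);
        if (tableValue != null) {
            output.add(key + ":" + streamValue + "+" + tableValue);
        } else {
            pending.computeIfAbsent(key, k -> new ArrayDeque<>()).add(streamValue);
        }
    }

    /** Called per table update: only stores the value, like a KTable materializing a record. */
    void onTableUpdate(String key, String tableValue) {
        table.put(key, tableValue);
    }

    /** Retries all parked records; meant to be invoked periodically. */
    void retryPending() {
        Iterator<Map.Entry<String, Deque<String>>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Deque<String>> entry = it.next();
            String tableValue = table.get(entry.getKey());
            if (tableValue == null) {
                continue; // still no match, keep the records parked
            }
            for (String streamValue : entry.getValue()) {
                output.add(entry.getKey() + ":" + streamValue + "+" + tableValue);
            }
            it.remove();
        }
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        BufferingJoinSketch join = new BufferingJoinSketch();
        join.onStreamRecord("k1", "a1"); // arrives before the table is populated
        join.onTableUpdate("k1", "b1");  // delayed table update
        join.retryPending();             // parked record is joined now
        join.onStreamRecord("k1", "a2"); // table already populated, joins at once
        System.out.println(join.output()); // prints [k1:a1+b1, k1:a2+b1]
    }
}
```

Note the trade-offs of this policy: a retried record is joined against whatever table value is present at retry time, results may be emitted in a different order than the stream records arrived, and records whose key never shows up in the table stay parked until some expiry rule (not modelled here) removes them.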

edited Mar 20 '18 at 07:44

miguno


answered Mar 19 '18 at 02:55

Matthias J. Sax


And what will I do inside the transformer? – Edmondo Mar 19 '18 at 04:07
A stream-table join is a key-value lookup. Thus, for each input record of `transform()` you do a `get()` on the KTable store. Of course, you need to put some logic in place for the case that the KTable was not updated yet, to maybe retry later. – Matthias J. Sax Mar 19 '18 at 04:36
How would you restructure this retry logic in the context of stream processing? – Edmondo Mar 19 '18 at 08:13
That's completely up to you... It depends on the use case, I would say. (That's why a KStream-KTable join is only best effort in the first place atm: the "kstream processor" does a key-value lookup but does not retry if there is no match, i.e., if the KTable update is delayed, you miss the update.) – Matthias J. Sax Mar 19 '18 at 16:14
Hello @MatthiasJ.Sax, thanks for the reply! Is this still the same in 2.3? More specifically, for KTable-KTable inner joins? I'm seeing the same behavior, and before I go for the transform approach I'd like to know if I'm heading in the wrong direction, but the observed effect is the same! – Renato Mefi Aug 26 '19 at 11:15
2.3 behaves the same for KTable-KTable joins as older versions. The only thing that changed is discussed here: https://stackoverflow.com/questions/57498201/time-semantics-between-kstream-and-ktable/57600470 -- If that's still not good enough for you, using a custom `transform` is the way to go. – Matthias J. Sax Aug 26 '19 at 18:32
