0

In the example here: https://github.com/confluentinc/kafka-streams-examples/blob/5.1.0-post/src/main/java/io/confluent/examples/streams/PageViewRegionExample.java there's a KStream and KTable join.

And in the driver https://github.com/confluentinc/kafka-streams-examples/blob/5.1.0-post/src/main/java/io/confluent/examples/streams/PageViewRegionExampleDriver.java it sends users to a users topic, and page views to a page views topic (with the user inside the view).

However, in the example, we're first creating a KStream for the page views topic, then a KTable for the user profiles topic, and then joining them. Assuming the application doesn't just load every piece of data from both streams before executing the join, what happens if a view comes and the user profile hasn't been saved to the table yet?

J. Doe

2 Answers

1

If the KTable has not been loaded before events appear in the stream, leftJoin will see null on the KTable side, and join (the inner join) will not emit a joined value.

My suggestion is to start the Kafka Streams application, load data into the topic that backs the KTable (using some producer), and then start emitting events to the stream topic.

An interesting presentation about joins in Kafka Streams was given at Kafka Summit San Francisco 2018. Video and slides: Zen and the Art of Streaming Joins—The What, When and Why

Steephen
Bartosz Wardziński
  • 3
    That is not completely correct: KTable and KStream data is synchronized based on record timestamps. Thus, the order in which data is loaded into the table, relative to when stream-side records are processed to do table lookups, depends on those timestamps. – Matthias J. Sax Jan 15 '19 at 03:40
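
To illustrate the point about timestamps, here is a minimal plain-Java sketch (not the Kafka Streams API; the class `TimestampOrderedJoinSketch` and its `Rec` type are made up for illustration). It assumes each input is already ordered by timestamp and always processes whichever pending record has the smaller timestamp. Processing the table update first on equal timestamps is a choice made for this sketch only. In a real application the synchronization is best-effort and depends on data being available on both inputs (the `max.task.idle.ms` setting is related to this).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of timestamp-synchronized stream-table processing.
final class TimestampOrderedJoinSketch {

    // One record from either input: a timestamp, a key, and a value.
    static final class Rec {
        final long timestamp;
        final String key;
        final String value;

        Rec(long timestamp, String key, String value) {
            this.timestamp = timestamp;
            this.key = key;
            this.value = value;
        }
    }

    // Both lists are assumed to be sorted by timestamp.
    // Emits "streamValue@tableValue"; tableValue is "null" if no entry exists yet.
    static List<String> process(List<Rec> tableUpdates, List<Rec> streamRecords) {
        Map<String, String> table = new HashMap<>();
        List<String> out = new ArrayList<>();
        int t = 0;
        int s = 0;
        while (s < streamRecords.size()) {
            // Sketch assumption: on equal timestamps the table update goes first.
            boolean tableNext = t < tableUpdates.size()
                    && tableUpdates.get(t).timestamp <= streamRecords.get(s).timestamp;
            if (tableNext) {
                Rec update = tableUpdates.get(t++);
                table.put(update.key, update.value);
            } else {
                Rec view = streamRecords.get(s++);
                out.add(view.value + "@" + table.get(view.key));
            }
        }
        return out;
    }
}
```

With a profile update at timestamp 10 and views at timestamps 5 and 15, the first view finds no profile and the second one does, regardless of which topic was written to first.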
0

If the KTable doesn't have the key and we use a KStream leftJoin with the KTable, the record is emitted with the KTable value set to null. You have to handle that in your joiner, otherwise you will get a NullPointerException.

If you use an inner join (join()) instead, nothing is emitted for that record, so you effectively lose the KStream record.
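
The difference between the two join types can be sketched in plain Java, without any Kafka dependency. Everything here is hypothetical and only models the lookup: the class `MissingKeyJoinSketch`, the `"UNKNOWN"` placeholder, and the sample data are not taken from the linked example. The table is a plain `Map`, and the joiner has the same two-argument shape as a value joiner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a stream-table lookup; not the Kafka Streams API.
final class MissingKeyJoinSketch {

    // Null-safe joiner: regionOrNull is null when the table has no entry for the key.
    static String joinViewWithRegion(String page, String regionOrNull) {
        String region = (regionOrNull == null) ? "UNKNOWN" : regionOrNull;
        return page + "@" + region;
    }

    // Left join: every stream record produces output, even without a table match.
    static List<String> leftJoin(List<String[]> views, Map<String, String> regionByUser) {
        List<String> out = new ArrayList<>();
        for (String[] view : views) { // view[0] = user key, view[1] = page
            out.add(joinViewWithRegion(view[1], regionByUser.get(view[0])));
        }
        return out;
    }

    // Inner join: stream records without a table match are dropped.
    static List<String> innerJoin(List<String[]> views, Map<String, String> regionByUser) {
        List<String> out = new ArrayList<>();
        for (String[] view : views) {
            String region = regionByUser.get(view[0]);
            if (region != null) {
                out.add(joinViewWithRegion(view[1], region));
            }
        }
        return out;
    }

    // Sample stream: alice has a profile, bob does not.
    static List<String[]> sampleViews() {
        return Arrays.asList(
                new String[] {"alice", "index.html"},
                new String[] {"bob", "news.html"});
    }

    static Map<String, String> sampleTable() {
        Map<String, String> table = new HashMap<>();
        table.put("alice", "europe");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(leftJoin(sampleViews(), sampleTable()));
        System.out.println(innerJoin(sampleViews(), sampleTable()));
    }
}
```

In this model the left join keeps bob's view with a placeholder region, while the inner join silently drops it.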

I have found a reference in CustomStreamTableJoinIntegrationTest.java https://github.com/confluentinc/kafka-streams-examples/blob/5.2.1-post/src/test/java/io/confluent/examples/streams/CustomStreamTableJoinIntegrationTest.java

Here we can use a custom stream-table join and temporarily store the KStream record in a buffer, so that the join can be processed later, once the matching data arrives in the KTable.
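
The buffering idea can be sketched roughly as follows. This is not the code from the linked test, just an in-memory plain-Java model with made-up names (`BufferedJoinSketch`, `onStreamRecord`, `onTableUpdate`). A real implementation would presumably keep the buffer in fault-tolerant storage and limit how long records may wait, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model: stream records without a table match wait until one arrives.
final class BufferedJoinSketch {

    private final Map<String, String> table = new HashMap<>();
    private final Map<String, List<String>> pending = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    // Join immediately if the key is known, otherwise park the record.
    void onStreamRecord(String key, String value) {
        String tableValue = table.get(key);
        if (tableValue == null) {
            pending.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        } else {
            output.add(value + "@" + tableValue);
        }
    }

    // Update the table, then flush any records that were waiting for this key.
    void onTableUpdate(String key, String tableValue) {
        table.put(key, tableValue);
        List<String> waiting = pending.remove(key);
        if (waiting != null) {
            for (String value : waiting) {
                output.add(value + "@" + tableValue);
            }
        }
    }

    // Joined results emitted so far, in emission order.
    List<String> output() {
        return output;
    }
}
```

A view that arrives before its profile produces no output at first and is joined as soon as the profile shows up. The buffer in this sketch grows without bound for keys that never appear in the table.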

But this doesn't seem like a pretty approach. I have posted a question on Stack Overflow and am looking for someone to answer it: KStream join with KTable record drops if key not exist in KTable

Sharry India