We currently have two Kafka topics with records arriving continuously. We want to join the two streams on their key, waiting up to 5 minutes for a matching record. With my current code, however, I see records being emitted immediately, without waiting to see whether a matching record arrives on the other stream. My current implementation:
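```java
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.StreamJoined;
import org.apache.kafka.streams.kstream.ValueJoiner;

StreamsBuilder builder = new StreamsBuilder();

// Read both input topics as String/String and log every incoming record.
KStream<String, String> streamA =
        builder.stream(topicA, Consumed.with(Serdes.String(), Serdes.String()))
               .peek((key, value) -> System.out.println("Stream A incoming record key " + key + " value " + value));
KStream<String, String> streamB =
        builder.stream(topicB, Consumed.with(Serdes.String(), Serdes.String()))
               .peek((key, value) -> System.out.println("Stream B incoming record key " + key + " value " + value));

// Keep whichever side is present; a side is null when no match was found.
ValueJoiner<String, String, String> recordJoiner =
        (recordA, recordB) -> recordA != null ? recordA : recordB;

// Outer join on the record key within a 5-minute window
// (outerJoin assumed: the joiner handles a missing record on either side).
KStream<String, String> combinedStream =
        streamA.outerJoin(
                        streamB,
                        recordJoiner,
                        JoinWindows.of(Duration.ofMinutes(5)),
                        StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()))
               .peek((key, value) -> System.out.println("Stream-Stream Join record key " + key + " value " + value));

combinedStream.to("test-topic", Produced.with(Serdes.String(), Serdes.String()));

// streamsConfiguration (application.id, bootstrap.servers, ...) is defined elsewhere.
KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), streamsConfiguration);
kafkaStreams.start();
```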
Although I set `JoinWindows.of(Duration.ofMinutes(5))`, I still see some records emitted immediately. How do I make sure they are not emitted before the window has had a chance to find a match?
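A likely cause, assuming the join is an `outerJoin` (or `leftJoin`): as far as I understand, Kafka Streams versions before 3.1 emit a null-padded result such as `(a, null)` as soon as a record arrives without a match, instead of waiting for the window to close. From 3.1 on, those results are reportedly held back until the window plus its grace period has closed, but only when the window is built with the newer `JoinWindows.ofTimeDifferenceAndGrace(...)` or `JoinWindows.ofTimeDifferenceWithNoGrace(...)` factories (added in 3.0, which deprecated `JoinWindows.of`). The stdlib-only sketch below is a hypothetical, simplified model of that delayed behavior, not Kafka's implementation: `DelayedOuterJoinModel`, `process` and the output string format are made up for illustration, and it assumes records arrive roughly in timestamp order (late-record dropping is omitted).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model (not Kafka Streams code) of a windowed outer
// join whose null-padded results are held back until the window plus grace
// period has closed, as measured by the largest timestamp seen ("stream time").
public class DelayedOuterJoinModel {

    // One buffered input record; 'left' tells which input stream it came from.
    static final class Rec {
        final boolean left;
        final String key;
        final String value;
        final long ts;
        boolean matched = false;

        Rec(boolean left, String key, String value, long ts) {
            this.left = left;
            this.key = key;
            this.value = value;
            this.ts = ts;
        }
    }

    private final long windowMs;
    private final long graceMs;
    private final List<Rec> buffer = new ArrayList<>();
    private final List<String> output = new ArrayList<>();
    private long streamTime = Long.MIN_VALUE;

    public DelayedOuterJoinModel(long windowMs, long graceMs) {
        this.windowMs = windowMs;
        this.graceMs = graceMs;
    }

    // Process one record from stream A (left = true) or stream B (left = false).
    public void process(boolean left, String key, String value, long ts) {
        streamTime = Math.max(streamTime, ts);
        Rec incoming = new Rec(left, key, value, ts);
        for (Rec other : buffer) {
            // A match: opposite side, same key, timestamps within the window.
            if (other.left != left && other.key.equals(key)
                    && Math.abs(other.ts - ts) <= windowMs) {
                String a = left ? value : other.value;
                String b = left ? other.value : value;
                output.add(key + "=(" + a + "," + b + ")"); // matches are emitted at once
                other.matched = true;
                incoming.matched = true;
            }
        }
        buffer.add(incoming);
        expire();
    }

    // Drop records whose window (plus grace) has closed; only the ones that
    // never matched produce a null-padded result.
    private void expire() {
        Iterator<Rec> it = buffer.iterator();
        while (it.hasNext()) {
            Rec r = it.next();
            if (r.ts + windowMs + graceMs < streamTime) {
                if (!r.matched) {
                    output.add(r.left ? r.key + "=(" + r.value + ",null)"
                                      : r.key + "=(null," + r.value + ")");
                }
                it.remove();
            }
        }
    }

    public List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        long min = 60_000L;
        DelayedOuterJoinModel join = new DelayedOuterJoinModel(5 * min, 0L);
        join.process(true, "k1", "a1", 0L);          // buffered, nothing emitted yet
        join.process(false, "k1", "b1", 2 * min);    // match within 5 min: emitted now
        join.process(true, "k2", "a2", 3 * min);     // no match yet: buffered
        join.process(false, "k3", "b3", 7 * min);    // k2's window still open
        join.process(false, "k4", "b4", 9 * min);    // k2's window closed: (a2,null)
        System.out.println(join.output());           // prints [k1=(a1,b1), k2=(a2,null)]
    }
}
```

In this model the delayed `k2=(a2,null)` result only appears once a newer record pushes stream time past the window end plus grace; as far as I understand, Kafka Streams behaves similarly, so if both topics go quiet the unmatched records are not flushed.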
Additionally, is this the most efficient way to join two Kafka streams, or would it be better to write our own consumer that reads from both topics and does the matching itself?