We currently have two Kafka topics that receive records continuously. We want to join the two streams on a key, waiting up to 5 minutes for a match, but with my current code I see records being emitted immediately, without "waiting" to see whether a matching record arrives in the other stream. My current implementation:

KStream<String, String> streamA =
    builder.stream(topicA, Consumed.with(Serdes.String(), Serdes.String()))
        .peek((key, value) -> System.out.println("Stream A incoming record key " + key + " value " + value));

KStream<String, String> streamB =
    builder.stream(topicB, Consumed.with(Serdes.String(), Serdes.String()))
        .peek((key, value) -> System.out.println("Stream B incoming record key " + key + " value " + value));


ValueJoiner<String, String, String> recordJoiner =
    (recordA, recordB) -> {
      if(recordA != null) {
        return recordA;
      } else {
        return recordB;
      }
    };

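KStream<String, String> combinedStream =
    // The join method name was missing from the posted snippet; outerJoin is
    // assumed here because recordJoiner handles a null value on either side.
    streamA.outerJoin(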
        streamB,
        recordJoiner,
        JoinWindows
            .of(Duration.ofMinutes(5)),
        StreamJoined.with(
            Serdes.String(),
            Serdes.String(),
            Serdes.String()))
        .peek((key, value) -> System.out.println("Stream-Stream Join record key " + key + " value " + value));

combinedStream.to("test-topic",
    Produced.with(
        Serdes.String(),
        Serdes.String()));

KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), streamsConfiguration);
kafkaStreams.start();

Although I specify JoinWindows.of(Duration.ofMinutes(5)), I see some records being emitted immediately. How do I ensure they are held back until the window has had a chance to produce a match?
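To make the behaviour concrete, here is a minimal plain-Java sketch (no Kafka dependency) of how I understand a windowed inner stream-stream join to work: the window only bounds how far apart the two timestamps may be, and a result is emitted as soon as the second matching record arrives, not after the window has elapsed. The class and method names (WindowedJoinSketch, Event, process) are hypothetical and only for illustration. Left/outer joins, grace periods and state-store retention are deliberately not modelled.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of windowed inner-join semantics; not Kafka Streams code.
public class WindowedJoinSketch {

    /** One record on a simulated stream: side ('A' or 'B'), key, value, event time. */
    public static final class Event {
        final char side;
        final String key;
        final String value;
        final long timestampMs;

        public Event(char side, String key, String value, long timestampMs) {
            this.side = side;
            this.key = key;
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    private final long windowMs;
    private final List<Event> bufferA = new ArrayList<>();
    private final List<Event> bufferB = new ArrayList<>();

    public WindowedJoinSketch(Duration window) {
        this.windowMs = window.toMillis();
    }

    /** Processes one record and returns the join results it triggers right away. */
    public List<String> process(Event e) {
        List<Event> own = e.side == 'A' ? bufferA : bufferB;
        List<Event> other = e.side == 'A' ? bufferB : bufferA;
        List<String> emitted = new ArrayList<>();
        for (Event o : other) {
            boolean sameKey = o.key.equals(e.key);
            boolean inWindow = Math.abs(o.timestampMs - e.timestampMs) <= windowMs;
            if (sameKey && inWindow) {
                // Always format as "key:valueFromA+valueFromB", whichever side arrived last.
                Event a = e.side == 'A' ? e : o;
                Event b = e.side == 'A' ? o : e;
                emitted.add(e.key + ":" + a.value + "+" + b.value);
            }
        }
        own.add(e); // keep the record so later records on the other side can match it
        return emitted;
    }

    public static void main(String[] args) {
        WindowedJoinSketch join = new WindowedJoinSketch(Duration.ofMinutes(5));
        // No partner buffered yet, so nothing is emitted.
        System.out.println(join.process(new Event('A', "k1", "a1", 0L)));
        // Partner is 1 second away: the result is emitted immediately.
        System.out.println(join.process(new Event('B', "k1", "b1", 1_000L)));
        // 400 seconds from a1, outside the 5-minute window: nothing is emitted.
        System.out.println(join.process(new Event('B', "k1", "b2", 400_000L)));
    }
}
```

In this sketch the second call is the one that emits, which is the "immediate" output for matched pairs. The 5-minute setting only decides which pairs are eligible to match.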

Additionally, is this the most efficient way to join two Kafka streams, or would it be better to write our own consumer implementation that reads from both topics?

user10751899