We have these POJOs:

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
class MyPost {
    private String content;
    private SeenInfo seenInfo;
}

@Data
@NoArgsConstructor
@AllArgsConstructor
class SeenInfo {
    private Integer seenCount;
    //other fields...
}

and this left-join process in our application:

@Bean
public Function<KStream<String, MyPost>, Function<KStream<String, SeenInfo>, KStream<String, MyPost>>> joinProcess(Map<String, String> schemaConfig) {
    return postStream ->
            seenInfoStream -> {
                SpecificAvroSerde<MyPost> postSerde = new SpecificAvroSerde<>();
                SpecificAvroSerde<SeenInfo> seenInfoSerde = new SpecificAvroSerde<>();
                // The second argument is isKey; both serdes are used for record values here.
                postSerde.configure(schemaConfig, false);
                seenInfoSerde.configure(schemaConfig, false);
                return postStream.leftJoin(seenInfoStream,
                        (p, s) -> {
                            p.setSeenInfo(s);
                            return p;
                        },
                        JoinWindows.of(Duration.ofMinutes(5)),
                        StreamJoined.with(Serdes.String(),
                                postSerde,
                                seenInfoSerde));
            };
}

Problem One:

When matching MyPost and SeenInfo values arrive within 5 minutes of each other, the join process produces two messages:

Message1: MyPost={ "content": "some text", "seenInfo": null}
Message2: MyPost={ "content": "some text", "seenInfo": { "seenCount": 1, ...}}

Problem Two:

If MyPost is present and SeenInfo is not, the join process does not return any data.
We expect: Message: MyPost={ "content": "some text", "seenInfo": null}

What should we do to solve this problem?

1 Answer

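It seems you are using an older version of Kafka Streams that is affected by the first issue, known as "spurious left join results". It was fixed in the Kafka 3.1 release (cf. https://issues.apache.org/jira/browse/KAFKA-10847).

Compare https://cwiki.apache.org/confluence/display/KAFKA/KIP-633%3A+Deprecate+24-hour+Default+Grace+Period+for+Windowed+Operations+in+Streams, which deprecated JoinWindows.of(...) in favor of JoinWindows.ofTimeDifferenceWithNoGrace(...) and JoinWindows.ofTimeDifferenceAndGrace(...). Note that the fixed left-join behavior is only enabled when the join uses these newer methods.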

For the second issue: yes, you should get an output. If you use version 3.1, the only explanation for a missing output would be that your input streams stopped or paused and no new data arrives. Without new data the join window won't close, and the left-join result is "stuck" until time advances (stream time is based on record timestamps and thus requires new input data).
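
To illustrate why the result stays "stuck", here is a minimal plain-Java sketch. It is not Kafka Streams code and makes simplifying assumptions: a single partition, a "stream time" that is just the highest record timestamp seen so far, and no special handling of late records. The class LeftJoinWindowSketch, its Event type, and the onLeft/onRight methods are hypothetical names used only for this illustration. Matched pairs are emitted right away, while an unmatched left record is emitted with null only once a later record pushes stream time past the end of its window plus grace.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of a windowed left join that holds back unmatched left records
// until their window (plus grace) has closed. Not the Kafka Streams implementation.
class LeftJoinWindowSketch {
    // Hypothetical record shape for this sketch: key, payload, event timestamp (ms).
    static final class Event {
        final String key;
        final String value;
        final long ts;
        boolean matched = false;

        Event(String key, String value, long ts) {
            this.key = key;
            this.value = value;
            this.ts = ts;
        }
    }

    private final long windowMs;
    private final long graceMs;
    private long streamTime = Long.MIN_VALUE; // highest timestamp observed so far
    private final List<Event> left = new ArrayList<>();
    private final List<Event> right = new ArrayList<>(); // never purged in this toy model
    final List<String> output = new ArrayList<>();

    LeftJoinWindowSketch(long windowMs, long graceMs) {
        this.windowMs = windowMs;
        this.graceMs = graceMs;
    }

    void onLeft(String key, String value, long ts) {
        Event l = new Event(key, value, ts);
        for (Event r : right) {
            if (r.key.equals(key) && Math.abs(r.ts - ts) <= windowMs) {
                l.matched = true;
                output.add(value + "+" + r.value); // inner-join result, emitted immediately
            }
        }
        left.add(l);
        advance(ts);
    }

    void onRight(String key, String value, long ts) {
        right.add(new Event(key, value, ts));
        for (Event l : left) {
            if (l.key.equals(key) && Math.abs(l.ts - ts) <= windowMs) {
                l.matched = true;
                output.add(l.value + "+" + value); // inner-join result, emitted immediately
            }
        }
        advance(ts);
    }

    // Stream time only moves when a new record arrives; closed windows are flushed here.
    private void advance(long ts) {
        streamTime = Math.max(streamTime, ts);
        Iterator<Event> it = left.iterator();
        while (it.hasNext()) {
            Event l = it.next();
            if (l.ts + windowMs + graceMs < streamTime) { // window has closed
                if (!l.matched) {
                    output.add(l.value + "+null"); // left-join result for an unmatched record
                }
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        LeftJoinWindowSketch join = new LeftJoinWindowSketch(5 * 60_000L, 0L);
        join.onLeft("a", "post-a", 0L);
        System.out.println(join.output); // prints [] because the window is still open
        join.onLeft("b", "post-b", 6 * 60_000L);
        System.out.println(join.output); // prints [post-a+null] once stream time advanced
    }
}
```

In this model, if no record follows "post-a", its null result is never emitted, which matches the "stuck" behavior described above: some later input on either stream is needed to advance time.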

Matthias J. Sax