That is absolutely possible. When you define your stream join operator, you specify the join window size explicitly.
KStream stream1 = ...;
KStream stream2 = ...;
long joinWindowSizeMs = 5L * 60L * 1000L; // 5 minutes
long windowRetentionTimeMs = 30L * 24L * 60L * 60L * 1000L; // 30 days
stream1.leftJoin(stream2,
... // add ValueJoiner
JoinWindows.of(joinWindowSizeMs)
);
// or if you want to use retention time
stream1.leftJoin(stream2,
... // add ValueJoiner
(JoinWindows)JoinWindows.of(joinWindowSizeMs)
.until(windowRetentionTimeMs)
);
See http://docs.confluent.io/current/streams/developer-guide.html#joining-streams for more details.
The sliding window basically defines an additional join predicate. In SQL-like syntax this would be something like:
SELECT * FROM stream1, stream2
WHERE
stream1.key = stream2.key
AND
stream1.ts - before <= stream2.ts
AND
stream2.ts <= stream1.ts + after
where before == after == joinWindowSizeMs in this example. before and after can also have different values if you use JoinWindows#before() and JoinWindows#after() to set those values explicitly.
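To make the window predicate concrete, here is a minimal plain-Java sketch of the same check. It does not use the Kafka Streams API; the class name JoinWindowPredicate and the method withinWindow are hypothetical names chosen for illustration, and the inclusive bounds simply mirror the SQL-like condition above.

```java
// Illustration only: a plain-Java version of the join-window condition.
// JoinWindowPredicate and withinWindow are hypothetical names, not Kafka Streams API.
final class JoinWindowPredicate {
    private JoinWindowPredicate() {}

    /**
     * Returns true if a stream2 record with timestamp ts2 falls inside the
     * join window of a stream1 record with timestamp ts1, i.e.
     * ts1 - beforeMs <= ts2 <= ts1 + afterMs (both bounds inclusive).
     */
    static boolean withinWindow(long ts1, long ts2, long beforeMs, long afterMs) {
        return ts1 - beforeMs <= ts2 && ts2 <= ts1 + afterMs;
    }
}
```

With before == after == 5 minutes (300000 ms), a stream1 record at timestamp 1000000 would match stream2 records with timestamps from 700000 through 1300000. Passing different values for beforeMs and afterMs models an asymmetric window.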
The retention time of the source topics is completely independent of the specified windowRetentionTimeMs, which is applied to a changelog topic created by Kafka Streams itself. Window retention allows out-of-order records to be joined with each other, i.e., records that arrive late (keep in mind that Kafka guarantees offset-based ordering, but with regard to timestamps, records can be out of order).