Need some opinion/help around one use case of KStream/KTable usage.
Scenario:
I have 2 topics with common key--requestId.
- input_time(requestId,StartTime)
- completion_time(requestId,EndTime)
The data in input_time is populated at time t1 and the data in completion_time is populated at t+n.(n being the time taken for a process to complete).
Objective To compare the time taken for a request by joining data from the topics and raised alert in case of breach of a threshold time.
It may happen that the process may fail and the data may not arrive on the completion_time topic at all for the request. In that case we intend to use a check that if the currentTime is well past a specific(lets say 5s) threshold since the start time.
- input_time(req1,100) completion_time(req1,104) --> no alert to be raised as 104-100 < 5(configured value)
- input_time(req2,100) completion_time(req2,108) --> alert to be raised with req2,108 as 108-100 >5
- input_time(req3,100) completion_time no record--> if current Time is beyond 105 raise an alert with req3,currentSysTime as currentSysTime - 100 > 5
Options Tried. 1) Tried both KTable-KTable and KStream-Kstream outer joins but the third case always fails.
final KTable<String,Long> startTimeTable = builder.table("input_time",Consumed.with(Serdes.String(),Serdes.Long()));
final KTable<String,Long> completionTimeTable = builder.table("completion_time",Consumed.with(Serdes.String(),Serdes.Long()));
KTable<String,Long> thresholdBreached =startTimeTable .outerJoin(completionTimeTable,
new MyValueJoiner());
thresholdBreached.toStream().filter((k,v)->v!=null)
.to("finalTopic",Produced.with(Serdes.String(),Serdes.Long()));
Joiner
public Long apply(Long startTime,Long endTime){
// if input record itself is not available then we cant use any alerting.
if (null==startTime){
log.info("AlertValueJoiner check: the start time itself is null so returning null");
return null;
}
// current processing time is the time used.
long currentTime= System.currentTimeMillis();
log.info("Checking startTime {} end time {} sysTime {}",startTime,endTime,currentTime);
if(null==endTime && currentTime-startTime>5000){
log.info("Alert:No corresponding record from file completion yet currentTime {} startTime {}"
,currentTime,startTime);
return currentTime-startTime;
}else if(null !=endTime && endTime-startTime>5000){
log.info("Alert: threshold breach for file completion startTime {} endTime {}"
,startTime,endTime);
return endTime-startTime;
}
return null;
}
2) Tried the custom logic approach recommended as per the thread How to manage Kafka KStream to Kstream windowed join? -- This approach stopped working for scenarios 2 and 3.
Is there any case of handling all three scenarios using DSL or Processors?
Not sure of we can use some kind of punctuator to listen to when the window changes and check for the stream records in current window and if there is no matching records found,produce a result with systime.?