
I need some opinions/help on a KStream/KTable use case.

Scenario:

I have two topics with a common key, requestId.

  1. input_time(requestId,StartTime)
  2. completion_time(requestId,EndTime)

The data in input_time is populated at time t and the data in completion_time is populated at t+n (n being the time the process takes to complete).

Objective: To compute the time taken for a request by joining data from the two topics, and to raise an alert if a threshold time is breached.

The process may fail, in which case no data arrives on the completion_time topic for the request at all. In that case we intend to check whether the current time is past a specific threshold (say 5s) since the start time, and raise an alert if so.

  1. input_time(req1,100), completion_time(req1,104) --> no alert, as 104-100 < 5 (the configured value)
  2. input_time(req2,100), completion_time(req2,108) --> alert raised with (req2,108), as 108-100 > 5
  3. input_time(req3,100), no completion_time record --> if the current time is beyond 105, raise an alert with (req3,currentSysTime), as currentSysTime-100 > 5

Options tried:

1) Tried both KTable-KTable and KStream-KStream outer joins, but the third case always fails.

    final KTable<String, Long> startTimeTable =
            builder.table("input_time", Consumed.with(Serdes.String(), Serdes.Long()));
    final KTable<String, Long> completionTimeTable =
            builder.table("completion_time", Consumed.with(Serdes.String(), Serdes.Long()));
    KTable<String, Long> thresholdBreached =
            startTimeTable.outerJoin(completionTimeTable, new MyValueJoiner());
    thresholdBreached.toStream()
            .filter((k, v) -> v != null)
            .to("finalTopic", Produced.with(Serdes.String(), Serdes.Long()));

Joiner

    public Long apply(Long startTime, Long endTime) {
        // If the input record itself is not available, we cannot do any alerting.
        if (null == startTime) {
            log.info("AlertValueJoiner check: the start time itself is null so returning null");
            return null;
        }
        // The current processing (wall-clock) time is used for the timeout check.
        long currentTime = System.currentTimeMillis();
        log.info("Checking startTime {} end time {} sysTime {}", startTime, endTime, currentTime);
        if (null == endTime && currentTime - startTime > 5000) {
            // No completion record yet and the threshold has already passed.
            log.info("Alert:No corresponding record from file completion yet currentTime {} startTime {}",
                    currentTime, startTime);
            return currentTime - startTime;
        } else if (null != endTime && endTime - startTime > 5000) {
            // Completion record arrived, but later than the threshold allows.
            log.info("Alert: threshold breach for file completion startTime {} endTime {}",
                    startTime, endTime);
            return endTime - startTime;
        }
        // No breach: the null result is filtered out downstream.
        return null;
    }

2) Tried the custom logic approach recommended in the thread "How to manage Kafka KStream to Kstream windowed join?". This approach stopped working for scenarios 2 and 3.

Is there any way to handle all three scenarios using the DSL or the Processor API?

I am not sure whether we can use some kind of punctuator to detect when the window changes, check the stream records in the current window, and, if no matching record is found, produce a result with the system time.

mandev
  • Your logic is quite specific to your application. I would recommend using the Processor API and building a custom operator. – Matthias J. Sax Jun 03 '20 at 01:24
  • Thanks, Matthias. I have implemented bespoke logic using a punctuator and some key-value state stores. – mandev Jun 11 '20 at 06:46

1 Answer


Due to the nature of the logic involved, it had to be done with a combination of the DSL and the Processor API.

  1. Used a custom transformer and a state store to compare the elapsed time with the configured threshold (cases 1 and 2).
  2. Added a punctuator based on wall-clock time to handle the third case.
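
The two steps above can be sketched in plain Java, without the Kafka Streams dependency, to show the shape of the logic. This is a rough illustration and not the actual implementation: the class and method names (RequestTimeoutTracker, onStart, onCompletion, punctuate) are hypothetical, the HashMap stands in for a key-value state store, and the sketch assumes that the start record for a request is seen before its completion record. Like the joiner in the question, it returns the elapsed time as the alert value.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Plain-Java stand-in for the transformer + punctuator; all names are hypothetical.
class RequestTimeoutTracker {
    private final long threshold;
    // Stands in for a key-value state store holding start times of pending requests.
    private final Map<String, Long> pendingStarts = new HashMap<>();

    RequestTimeoutTracker(long threshold) {
        this.threshold = threshold;
    }

    // Called for each input_time record: remember the start time.
    void onStart(String requestId, long startTime) {
        pendingStarts.put(requestId, startTime);
    }

    // Called for each completion_time record (cases 1 and 2).
    // Returns the elapsed time if the threshold was breached, otherwise null.
    Long onCompletion(String requestId, long endTime) {
        Long startTime = pendingStarts.remove(requestId);
        if (startTime == null) {
            return null; // no start seen, or already reported by punctuate()
        }
        long elapsed = endTime - startTime;
        return elapsed > threshold ? elapsed : null;
    }

    // Called periodically on wall-clock time (case 3).
    // Reports every request still pending past the threshold and removes it,
    // so that each request is reported at most once.
    Map<String, Long> punctuate(long now) {
        Map<String, Long> alerts = new HashMap<>();
        Iterator<Map.Entry<String, Long>> it = pendingStarts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            long elapsed = now - entry.getValue();
            if (elapsed > threshold) {
                alerts.put(entry.getKey(), elapsed);
                it.remove();
            }
        }
        return alerts;
    }
}
```

In Kafka Streams, the body of punctuate would presumably run inside a callback registered through ProcessorContext.schedule(...) with PunctuationType.WALL_CLOCK_TIME, so that it fires even when no new records arrive. That is the property the outer joins lack: a join result is only computed when a record arrives on one of the inputs.
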
mandev