Fraud Detection DataStream API tutorial questions

Question

I am following the tutorial here.

Q1: Why, in the final application, do we clear all state and delete the timer whenever flagState is set, regardless of the current transaction amount? I am referring to this part of the code:

// Check if the flag is set
if (lastTransactionWasSmall != null) {
    if (transaction.getAmount() > LARGE_AMOUNT) {
        //Output an alert downstream
        Alert alert = new Alert();
        alert.setId(transaction.getAccountId());

        collector.collect(alert);
    }
    // Clean up our state [WHY HERE?]
    cleanUp(context);
}

If the stream of transaction amounts for an account were 0.5, 10, 600, then flagState would be set for 0.5 and then cleared for 10. So for 600, we skip the code block above and don't check for a large amount. But if the 0.5 and 600 transactions occurred within a minute, we should have sent an alert, yet we didn't.

Q2: Why do we use processing time to determine whether two transactions are 1 minute apart? The Transaction class has a timeStamp field, so isn't it better to use event time? Processing time is affected by the speed of the application, so two transactions with event times within 1 minute of each other could be processed more than 1 minute apart due to lag.

score 0 · Accepted Answer · answered Jan 06 '21 at 09:33

A1: The fraud model being used in this example is explained by this figure:

In your example, the transaction for 600 must immediately follow the transaction for 0.5 to be considered fraud. Because of the intervening transaction for 10, it is not fraud, even if all three transactions occur within a minute. It's just a matter of how the use case was framed.
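To make the "immediately follows" rule concrete, here is a minimal plain-Java sketch of the flag logic for a single account. It is not the tutorial's code: the class name FraudModelSketch and the method alertIndices are made up for illustration, the thresholds 1.00 and 500.00 are assumed values, a boolean stands in for the keyed flagState, and the one-minute timer is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class FraudModelSketch {
    // Assumed thresholds, chosen for illustration only.
    static final double SMALL_AMOUNT = 1.00;
    static final double LARGE_AMOUNT = 500.00;

    /** Returns the positions of the transactions that would raise an alert. */
    static List<Integer> alertIndices(double[] amounts) {
        List<Integer> alerts = new ArrayList<>();
        // Stands in for the per-account flagState; the timer is not modeled.
        boolean lastTransactionWasSmall = false;

        for (int i = 0; i < amounts.length; i++) {
            double amount = amounts[i];

            if (lastTransactionWasSmall) {
                if (amount > LARGE_AMOUNT) {
                    alerts.add(i);
                }
                // The flag is cleared whether or not an alert fired:
                // only the transaction directly after a small one is checked.
                lastTransactionWasSmall = false;
            }

            if (amount < SMALL_AMOUNT) {
                lastTransactionWasSmall = true;
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        // 10 sits between 0.5 and 600, so the pattern is broken: prints []
        System.out.println(alertIndices(new double[] {0.5, 10, 600}));
        // 600 directly follows 0.5: prints [1]
        System.out.println(alertIndices(new double[] {0.5, 600}));
    }
}
```

Under these assumptions the sequence 0.5, 10, 600 produces no alert, while 0.5, 600 does. Clearing the flag unconditionally is what encodes "immediately after".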

A2: Doing this with event time would be a very valid choice, but would make the example much more complex. Not only would watermarks be required, but we would also have to sort the stream by event time, since a realistic example would have to consider that the events might be out-of-order.
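As a rough illustration of the extra work involved, the sketch below sorts out-of-order arrivals by event time in plain Java, with no Flink dependency. Everything in it is hypothetical: the class EventTimeSortSketch, the method sortByEventTime, and the choice to drop late events are assumptions made for the example, and the watermark is simply the largest timestamp seen so far minus a fixed out-of-orderness bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class EventTimeSortSketch {

    /**
     * Takes event timestamps in arrival order and returns them in event-time order.
     * Events are buffered until the watermark passes them.
     */
    static List<Long> sortByEventTime(long[] arrivals, long maxOutOfOrderness) {
        PriorityQueue<Long> buffer = new PriorityQueue<>(); // smallest timestamp first
        List<Long> ordered = new ArrayList<>();
        long maxSeen = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;

        for (long timestamp : arrivals) {
            if (timestamp <= watermark) {
                // Late event: the watermark has already passed it.
                // Dropped here for simplicity (an assumption of this sketch).
                continue;
            }
            buffer.add(timestamp);
            maxSeen = Math.max(maxSeen, timestamp);
            watermark = maxSeen - maxOutOfOrderness;

            // Everything at or below the watermark is safe to release in order.
            while (!buffer.isEmpty() && buffer.peek() <= watermark) {
                ordered.add(buffer.poll());
            }
        }
        // End of input: flush whatever is still buffered.
        while (!buffer.isEmpty()) {
            ordered.add(buffer.poll());
        }
        return ordered;
    }

    public static void main(String[] args) {
        long[] arrivals = {1000, 3000, 2000, 7000, 6000};
        // prints [1000, 2000, 3000, 6000, 7000]
        System.out.println(sortByEventTime(arrivals, 2000));
    }
}
```

Only after this kind of buffering could "the next transaction" be defined in terms of event time, which is why the processing-time version keeps the example so much shorter.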

At that point, implementing this with a process function would no longer be the best choice. Using the temporal pattern matching capabilities of either Flink's CEP library or Flink SQL with MATCH_RECOGNIZE would be the way to go.

A1: Fraud Detector v2 says: "suppose you wanted to set a 1 minute timeout to your fraud detector". The "immediately after" requirement is for v1. — Tessa, Jan 06 '21 at 23:58
OH that makes sense! I thought V2 was replacing the V1 requirement. Thanks for your help. — Tessa, Jan 08 '21 at 06:44
