0

I'm trying to do an exponentially decaying moving average over a hopping window in Flink SQL. I need the have access to one of the borders of the window, the HOP_START in the following:

    SELECT                                                                              
      lb_index one_key,
    -- I have access to this one:
      HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND) start_time,
    -- Aggregation primitive:
      SUM(
        Y * EXP(TIMESTAMPDIFF(
          SECOND, 
          proctime, 
    -- This one throws:
          HOP_START(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND)
      )))
    FROM write_position                                                                
    GROUP BY lb_index, HOP(proctime, INTERVAL '0.05' SECOND, INTERVAL '5' SECOND)

I'm getting the following stack trace:

11:55:37.011 [main] DEBUG o.a.c.p.RelOptPlanner - For final plan, using Aggregate(groupBy: (lb_index), window: (SlidingGroupWindow('w$, 'proctime, 5000.millis, 50.millis)), select: (lb_index, SUM($f2) AS Y, start('w$) AS w$start, end('w$) AS w$end, proctime('w$) AS w$proctime))
11:55:37.011 [main] DEBUG o.a.c.p.RelOptPlanner - For final plan, using Calc(select: (lb_index, proctime, *(payload.Y, EXP(/(CAST(/INT(Reinterpret(-(HOP_START(PROCTIME(proctime), 50, 5000), PROCTIME(proctime))), 1000)), 1000))) AS $f2))
11:55:37.011 [main] DEBUG o.a.c.p.RelOptPlanner - For final plan, using rel#459:DataStreamScan.DATASTREAM.true.Acc(table=[_DataStreamTable_0])
Exception in thread "main" org.apache.flink.table.codegen.CodeGenException: Unsupported call: HOP_START 
If you think this function should be supported, you can create an issue and start a discussion for it.
    at org.apache.flink.table.codegen.CodeGenerator$$anonfun$visitCall$3.apply(CodeGenerator.scala:1027)
    at org.apache.flink.table.codegen.CodeGenerator$$anonfun$visitCall$3.apply(CodeGenerator.scala:1027)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.flink.table.codegen.CodeGenerator.visitCall(CodeGenerator.scala:1027)
    at org.apache.flink.table.codegen.CodeGenerator.visitCall(CodeGenerator.scala:66)

It does say is it unimplemented while it works outside the aggregating SUM. So that's what makes me think this is a scoping issue.

Now, the thing is: I could transform this expression and do a final processing outside the aggregation, as exp(x+y) = exp(x)*exp(y); But I'm stuck with using TIMESTAMPDIFF (which did wonders in my previous issue). I have not found a way to cast TIME ATTRIBUTEs to NUMERIC types; also, I'm not comfortable exponentiating UNIX timestamps, even if I scale them down.

Anyway, this work-around would be sort of clunky and there might me another way. I don't know how I could massage scopes in this SQL piece to still 'be' in the window scope and have the start time without throwing.

Oliv
  • 10,221
  • 3
  • 55
  • 76
BenoitParis
  • 3,166
  • 4
  • 29
  • 56

1 Answers1

0

I suggest you experiment with HOP_PROCTIME() rather than HOP_START(). The differences are explained here, but the effect will be that you'll have a proctime attribute rather than a timestamp, which I'm hoping will make TIMESTAMPDIFF happy.

David Anderson
  • 39,434
  • 4
  • 33
  • 60
  • TIMESTAMPDIFF is happy, but HOP_PROCTIME(proctime, ...) seems to provide me with exactly what is in proctime: MAX(TIMESTAMPDIFF(SECOND, HOP_PROCTIME(proctime, ...), proctime)) is always 0 on that 5 seconds window. Same for AVG and MIN. I really need to have access to one border of the window inside it to calculate the difference, before I can aggregate the thing. Tried HOP_END as well but it did not work. – BenoitParis Feb 16 '19 at 12:44
  • The problem is that proctime always returns the time at the moment you ask for it. You need to switch to either event time or ingestion time, so that there will be some difference between each event's timestamp and the time at the window boundary. – David Anderson Feb 16 '19 at 12:50