
I'm writing a simple example to test the new Scala API for CEP in Flink, using the latest GitHub version of 1.1-SNAPSHOT.

The pattern only checks a single value and outputs one String for each match. The code is as follows:

val pattern : Pattern[(String, Long, Int), _] = Pattern.begin("start").where(_._3 < 4)

val cepEventAlert = CEP.pattern(streamingAlert, pattern)

def selectFn(pattern : mutable.Map[String, (String, Long, Int)]): String = {
    val startEvent = pattern.get("start").get
    "Alerta:"+startEvent._1+": Pattern"
}

val patternStreamSelected = cepEventAlert.select(selectFn(_))

patternStreamSelected.print()

It compiles and runs under 1.1-SNAPSHOT without issue, but the JobManager output shows no sign of that print(). Even relaxing the pattern to a bare "start" stage with no condition (accepting all events) returns absolutely nothing.

Also, when I try to add stages, the code fails to compile. If I change the pattern to the following (finding two consecutive events whose third field is less than 4):

Pattern.begin("start").where(_._3 < 4).next("end").where(_._3 < 4).within(Time.seconds(30))

The compiler then throws:

error: missing parameter type for expanded function ((x$4) => x$4._3.$less(4))

This shows that the error is on the first where() after the "start" stage. I tried to set the parameter type explicitly with:

(x: (String, Long, Int)) => x._3 < 4

That way it compiles again, but when it runs on Flink, still no output is shown. streamingAlert is a Scala DataStream[(String, Long, Int)], and in other parts of the code I can filter it with _._3 < 4 without problems and the output seems correct.

midnight1247

1 Answer


The print() API call in the streaming API does not trigger eager execution. You still have to call env.execute() at the end of your program.

When you define your pattern you should provide the event type somewhere. Either you do it as you've done it or you do it via a type parameter for begin:

Pattern.begin[(String, Long, Int)]("start").where(_._3 < 4).next("end").where(_._3 < 4).within(Time.seconds(30))
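
To illustrate why the type parameter on begin matters, here is a minimal plain-Scala sketch. MiniPattern, Stage, findMatches and TypeParameterDemo are hypothetical stand-ins invented for this example and are not Flink's classes; they only mimic the shape of a builder whose event type is fixed by begin. Once begin[Event] pins the type, every later where(_._3 < 4) can infer its lambda parameter.

```scala
// Hypothetical stand-in for a CEP pattern builder; this is NOT Flink's Pattern class.
final case class Stage[T](name: String, condition: T => Boolean)

final case class MiniPattern[T](stages: Vector[Stage[T]]) {
  // Replaces the condition of the most recently added stage.
  def where(condition: T => Boolean): MiniPattern[T] =
    MiniPattern(stages.init :+ stages.last.copy(condition = condition))

  // Appends a stage that must match the event right after the previous one.
  def next(name: String): MiniPattern[T] =
    MiniPattern(stages :+ Stage[T](name, _ => true))

  // Returns every run of consecutive events that satisfies all stages in order.
  def findMatches(events: Seq[T]): Seq[Seq[T]] =
    events.sliding(stages.size).filter { window =>
      window.size == stages.size &&
      window.zip(stages).forall { case (event, stage) => stage.condition(event) }
    }.toList
}

object MiniPattern {
  // The event type T appears only in the result type, so the caller has to supply it.
  def begin[T](name: String): MiniPattern[T] =
    MiniPattern(Vector(Stage[T](name, _ => true)))
}

object TypeParameterDemo {
  type Event = (String, Long, Int)

  // begin[Event] fixes T, so both lambdas know that _ is an Event.
  val pattern: MiniPattern[Event] =
    MiniPattern.begin[Event]("start").where(_._3 < 4).next("end").where(_._3 < 4)

  val events: Seq[Event] = Seq(("a", 1L, 1), ("b", 2L, 2), ("c", 3L, 9))

  // Only the first two events form a run where both third fields are below 4.
  val matches: Seq[Seq[Event]] = pattern.findMatches(events)
}
```

In this stand-in, dropping the [Event] on begin leaves the compiler with no event type for the lambdas, so the chain does not compile. Flink's real Pattern class may differ in its details, so treat this only as a model of the inference issue.
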
Till Rohrmann
  • `env.execute()` is called, and I can read the output of other DataStreams. The job is also listed as running in the web interface, and the CEP task shows that it is receiving data, but its output is 0B. Later in the code, `streamingAlert` is processed with `DataStream.filter()` and everything seems correct (both in the web interface and in the output log). The only thing that fails to output any element is the PatternStream. If I were missing the `execute()` call, I suppose there would be no output at all. – midnight1247 May 25 '16 at 14:45
  • Have you checked the `.out` files of the task managers? The print statement is executed on the TaskManagers and, thus, the output is written in their stdout file and not in the stdout file of the JobManager (unless you've started a local cluster). – Till Rohrmann May 25 '16 at 16:45