
While studying the Flink CEP library over the last few days, I've had the impression that it doesn't add any fundamentally new functionality to Flink's standard capabilities. Flink CEP's only purpose seems to be to make event processing easier, with clear semantics and an intuitive code structure. As an example, Flink CEP offers only 5 match-skipping strategies. Although these may be enough for a wide range of cases, they may not solve specific problems, which sends us back to plain Flink.

A test case is the following pattern:

Emit an alert (represented by 'a') for each non-overlapping pair of numbers in a stream.

Represented by the pattern:

Pattern.begin[EventType]("pair",skipStrategy).where(new AlwaysTrueFunction()).times(2)

So, for an input like 1 1 1 1 1 (numbers entering the stream from left to right), the expected output would be a a, but none of the 5 match-skipping strategies gives the right result:

No-skip: a a a a
Skip-to-next: a a a a
Skip-past-last-event: a a a a
Skip-to-first[1]: a a a a
Skip-to-last[1]: a a a a

Although these strategies can't generate the desired output, it could easily be produced using a RichFunction with a ValueState counter that determines when a new alert should be emitted, transforming the input stream into a stream of alert events.
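The counter idea can be sketched in plain Java, without any Flink dependency, just to show the logic. The class name `PairAlerter` and its methods are hypothetical and chosen only for illustration. In a real job, the `count` field would presumably live in a `ValueState<Integer>` inside a rich function on a keyed stream; that mapping is an assumption of this sketch, not tested Flink code.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the counter idea; no Flink dependency.
// Assumption: in a real job, `count` would be kept in a ValueState<Integer>
// inside a rich function on a keyed stream instead of a plain field.
class PairAlerter {
    private int count = 0; // events seen since the last alert

    // Consumes one event; returns true when it completes a non-overlapping pair.
    boolean onEvent(int value) {
        count++;
        if (count == 2) {
            count = 0; // reset so the next pair starts from scratch
            return true;
        }
        return false;
    }

    // Runs a whole input through a fresh counter and collects one "a" per alert.
    static List<String> alerts(int... input) {
        PairAlerter alerter = new PairAlerter();
        List<String> out = new ArrayList<>();
        for (int v : input) {
            if (alerter.onEvent(v)) {
                out.add("a");
            }
        }
        return out;
    }
}
```

Calling `PairAlerter.alerts(1, 1, 1, 1, 1)` yields two alerts: the first four events form two pairs, and the fifth is left waiting for a partner.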

Thus, I would appreciate some light on these questions:

  • Why was the CEP library created if plain Flink seems to be more complete?

  • Is a pattern built with CEP more efficient (higher throughput or some other metric) than one built with Flink's standard DataStream operators? (If possible, please provide links to articles, papers, or documentation about this.)

João Luca

1 Answer


Thanks for playing with Flink CEP.

Flink CEP is a library on top of Flink. As such, it does not add any functionality that cannot be implemented using vanilla Flink (ProcessFunctions, etc.). In fact, under the hood it is implemented as a special operator that checks elements against a specific pattern, and much of its functionality could probably even be implemented as a ProcessFunction (with a lot of tooling around it).

That said, Flink CEP may not add functionality that cannot be implemented with vanilla Flink, but it adds expressivity, which makes some use cases easier to implement. The same holds for other APIs as well, for example the Windowing API in Flink, which you could implement yourself using ProcessFunctions (with a lot of tooling around them).

Now, when it comes to efficiency, the answer is "it depends". A handcrafted process function tailored to your use case, with every optimization possible for your workload, can be more efficient than Flink CEP, as the latter is a general-purpose library. If you have the expertise and the time, the best approach is to implement PoCs using both (CEP and vanilla Flink) and choose the more efficient one for your case.

  • Thanks for the answer, that's exactly what I thought. I ended up finding [this](https://flink.apache.org/news/2016/04/06/cep-monitoring.html) today, which points in the same direction as my impressions of Flink CEP. – João Luca Mar 05 '20 at 18:02