I am trying to use Flink's CEP library on log files (as a batch job) rather than on streams (in real time). Is that possible? If so, do you know of any Scala code examples for this?
1 Answer
Flink's DataStream API and associated libraries, including the CEP library, can be used on bounded, historic (batch) datasets or with unbounded, live streams; it makes no difference. Just set up a file (or directory) as the data source and use CEP normally. For correct, reproducible results, you should work in event time (assuming time plays a role in your processing). This is important because CEP sorts your input stream(s) by event time: notions of before and after should be relative to when the events occurred, not when they were processed.
A bit of googling will lead you to some CEP examples. There's a simple example (in Java and Scala) in the Flink training (github).
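
To make the approach above concrete, here is a minimal sketch in Scala. It is not taken from the Flink training; it assumes a Flink 1.x release with the `flink-streaming-scala` and `flink-cep-scala` dependencies on the classpath. The `LogEvent` case class, the `epochMillis,host,level,message` line format, the `/path/to/logs` location, the 5-second out-of-orderness bound, and the WARN-then-ERROR pattern are all hypothetical and only there for illustration.

```scala
import java.time.Duration

import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Hypothetical log record; the field layout is an assumption for this sketch.
case class LogEvent(timestamp: Long, host: String, level: String, message: String)

object LogFileCepSketch {

  // Assumes each line looks like "epochMillis,host,level,message".
  def parse(line: String): LogEvent = {
    val parts = line.split(",", 4)
    LogEvent(parts(0).toLong, parts(1), parts(2), parts(3))
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // A file (or directory) is a bounded source: the job ends when it has been read.
    val events: DataStream[LogEvent] = env
      .readTextFile("/path/to/logs") // hypothetical path
      .map(parse _)
      // Work in event time: timestamps come from the log lines themselves.
      .assignTimestampsAndWatermarks(
        WatermarkStrategy
          .forBoundedOutOfOrderness[LogEvent](Duration.ofSeconds(5)) // assumed bound
          .withTimestampAssigner(new SerializableTimestampAssigner[LogEvent] {
            override def extractTimestamp(event: LogEvent, recordTimestamp: Long): Long =
              event.timestamp
          }))

    // Illustrative pattern: a WARN immediately followed by an ERROR within one minute.
    val pattern = Pattern
      .begin[LogEvent]("warn").where(_.level == "WARN")
      .next("error").where(_.level == "ERROR")
      .within(Time.minutes(1))

    // Key by host so the pattern is matched per host.
    CEP.pattern(events.keyBy(_.host), pattern)
      .select((m: scala.collection.Map[String, Iterable[LogEvent]]) => {
        val warn = m("warn").head
        val error = m("error").head
        s"${warn.host}: WARN at ${warn.timestamp} followed by ERROR at ${error.timestamp}"
      })
      .print()

    env.execute("CEP over log files (sketch)")
  }
}
```

The out-of-orderness bound deserves some thought for your own data: when files are read in parallel, events can arrive further out of order than within a single file, and events that arrive behind the watermark are treated as late by CEP.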

David Anderson