
I have the following use case.

A machine sends event streams to Kafka, which are consumed by a Flink CEP engine that generates warnings when conditions on the stream data are satisfied.

FlinkKafkaConsumer011<Event> kafkaSource = new FlinkKafkaConsumer011<Event>(kafkaInputTopic, new EventDeserializationSchema(), properties);
DataStream<Event> eventStream = env.addSource(kafkaSource);

The Event POJO contains id, name, time, and ip.

The machine sends a large volume of data to Kafka, and there are 35 unique event names (name1, name2, ..., name35). I want to detect patterns for each combination of event names (name1 co-occurring with name2, name1 co-occurring with name3, etc.). In total I have 1225 combinations.

The Rule POJO contains e1Name and e2Name.

List<Rule> ruleList -> it contains all 1225 rules.
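As a sanity check on the rule count: 1225 is 35 × 35, i.e. the number of ordered name pairs including a name paired with itself (35 choose 2 would give only 595 unordered pairs). A minimal sketch of how such a list could be enumerated, using the hypothetical name1..name35 labels:

```java
import java.util.ArrayList;
import java.util.List;

public class RuleCount {
    public static void main(String[] args) {
        List<String[]> rules = new ArrayList<>();
        // Ordered pairs over 35 names, including a name paired with itself:
        // 35 * 35 = 1225 combinations.
        for (int i = 1; i <= 35; i++) {
            for (int j = 1; j <= 35; j++) {
                rules.add(new String[] {"name" + i, "name" + j});
            }
        }
        System.out.println(rules.size()); // 1225
    }
}
```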

for (Rule rule : ruleList) {
    Pattern<Event, ?> warningPattern = Pattern.<Event>begin("start")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event value) throws Exception {
                return value.getName().equals(rule.getE1Name());
            }
        })
        .followedBy("next")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event value) throws Exception {
                return value.getName().equals(rule.getE2Name());
            }
        })
        .within(Time.seconds(30));
    PatternStream<Event> patternStream = CEP.pattern(eventStream, warningPattern);
}

Is this the correct way to execute multiple patterns on one stream, or is there a more optimized way to achieve this? With the above approach we are getting PartitionNotFoundException, UnknownTaskExecutorException, and memory issues.

  • You have defined patterns for all the possible combinations. What are you trying to achieve? Is it consecutive events that occurred within a 30-second time window? – alili May 17 '19 at 13:18
  • My goal is to identify which combination co-occurred the most times within 30 seconds. – vikram raju May 20 '19 at 05:54

1 Answer


IMO you don't need CEP patterns to achieve your goal. You can apply a stateful map function to the source that maps event names into pairs (the latest two names). After that, window the stream into 30-second windows and apply the classic WordCount approach to it.

The stateful map function can be something like this (it accepts only the event name; you need to adapt it to your input, e.g. extract the event name from the Event POJO):

public class TupleMap implements MapFunction<String, Tuple2<String, Integer>> {
    // Holds the last two event names seen by this operator instance.
    // Note: this is a plain instance field, not Flink-managed state, so it is
    // neither fault tolerant nor shared across parallel subtasks.
    Tuple2<String, String> latestTuple = new Tuple2<String, String>("", "");

    public Tuple2<String, Integer> map(String value) throws Exception {
        this.latestTuple.f0 = this.latestTuple.f1;
        this.latestTuple.f1 = value;
        // Emit the previous and current name as one key (separated by "+"
        // so e.g. "name1"+"name12" and "name11"+"name2" cannot collide),
        // with an initial count of 1.
        return new Tuple2<String, Integer>(this.latestTuple.f0 + "+" + this.latestTuple.f1, 1);
    }
}
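To see what this map emits, here is a plain-Java simulation of the same sliding-pair logic (no Flink dependency; the sample event names are made up, and a "+" separator is used between the two names for readability):

```java
import java.util.ArrayList;
import java.util.List;

public class PairingDemo {
    public static void main(String[] args) {
        String[] events = {"name1", "name2", "name1", "name3"};
        String prev = "";
        List<String> pairs = new ArrayList<>();
        // Same idea as TupleMap: pair the previous name with the current one.
        for (String name : events) {
            pairs.add(prev + "+" + name);
            prev = name;
        }
        System.out.println(pairs);
        // [+name1, name1+name2, name2+name1, name1+name3]
    }
}
```

Note that the very first emitted pair has an empty left-hand side, since no previous event exists yet; depending on your needs you may want to skip that first element.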

The result, event-name pairs and their occurrence counts as tuples, can then be obtained like this (and written to a Kafka sink, perhaps?):

DataStream<Tuple2<String, Integer>> source = stream.map(new TupleMap());
SingleOutputStreamOperator<Tuple2<String, Integer>> sum = source.keyBy(0).timeWindow(Time.seconds(30)).sum(1);
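Since the stated goal is to find which combination co-occurred the most within 30 seconds, the summed stream could be reduced further, e.g. with something like `sum.timeWindowAll(Time.seconds(30)).maxBy(1)` (untested here). In plain Java, picking the maximum from one window's counts amounts to this (the counts map is a made-up example, not real output):

```java
import java.util.HashMap;
import java.util.Map;

public class MaxPairDemo {
    public static void main(String[] args) {
        // Hypothetical per-window counts, as produced by the windowed sum.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("name1+name2", 4);
        counts.put("name2+name3", 7);
        counts.put("name1+name3", 2);

        // Equivalent of maxBy(1): pick the pair with the highest count.
        Map.Entry<String, Integer> max = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (max == null || e.getValue() > max.getValue()) {
                max = e;
            }
        }
        System.out.println(max.getKey() + " -> " + max.getValue());
        // name2+name3 -> 7
    }
}
```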
alili