
Context:

There is a ds: DataStream[Event], and there are 3 types of Event: A, B and C.

Question: how can I monitor the occurrences of the different events over different intervals of time?

That is, suppose exactly one event arrives per second, in order:

ds = A, B, C, A, C, B, A, C, A, B, C....

Then,

occurrences of A in each 3-second window are: 1, 1, 1, 1, 1, 1, 2, 1, 1...

occurrences of B in each 2-second window are: 1, 1, 0, 0, 1, 1, 0, 0, 1, 1...

occurrences of C in each 4-second window are: 1, 2, 2, 1, 2, 1, 1, 2...
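Read literally, those sequences are windows of length N sliding forward one second at a time. A plain-Java sketch (no Flink involved; slidingCounts is a hypothetical helper, not part of any API) reproduces all three sequences from the eleven events shown:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlidingCount {
    // Occurrences of `target` in each window of `size` events, sliding by one.
    // One event arrives per second, so event index == second.
    static List<Integer> slidingCounts(List<String> events, String target, int size) {
        List<Integer> counts = new ArrayList<>();
        for (int start = 0; start + size <= events.size(); start++) {
            int c = 0;
            for (int i = start; i < start + size; i++) {
                if (events.get(i).equals(target)) c++;
            }
            counts.add(c);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> ds = Arrays.asList("A","B","C","A","C","B","A","C","A","B","C");
        System.out.println(slidingCounts(ds, "A", 3)); // [1, 1, 1, 1, 1, 1, 2, 1, 1]
        System.out.println(slidingCounts(ds, "B", 2)); // [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
        System.out.println(slidingCounts(ds, "C", 4)); // [1, 2, 2, 1, 2, 1, 1, 2]
    }
}
```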

Leyla Lee

2 Answers


You didn't explicitly say it, but I'm assuming the number of unique event types to be tracked, each with its own window length, is arbitrarily large. I'm further assuming that you have, or could easily create, a unique String ID for each event type. In that case:

Use a MapFunction to convert your events into Tuple2<String, Event>, where the String is the event ID.

Use DataStream.split to split the stream by event ID.

The tricky bit: create multiple DataStreams by calling SplitStream.select in a for loop, iterating over the event IDs.

Also in the for loop, apply your windowing function to each stream.

Finally, still in the for loop, union each DataStream with the previous one (you can re-use the same variable for this).

The Flink documentation almost never defines operators in a loop, but it's perfectly legal to do so.

Here's what the guts of that for loop should look like:

DataStream<String> finalText = null; // avoids "variable might not have been initialized" errors
for (int i = 0; i < 3; i++) {
    DataStream<String> tempStream =
            splitStream.select(Integer.toString(i))
                    // PassthroughMapFunction is your own identity MapFunction;
                    // the windowing function for this event type can go here
                    .map(new PassthroughMapFunction<String>())
                    .name("Map" + i);
    if (finalText == null) {
        finalText = tempStream;
    } else {
        finalText = finalText.union(tempStream);
    }
}
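The null-then-union accumulation in that loop is just a fold. A plain-Java sketch over ordinary lists (union approximated by concatenation; all names hypothetical) shows the same shape, including why re-using one variable is safe:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionLoop {
    // Concatenate two "streams" (lists), standing in for DataStream.union.
    static List<String> union(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    static List<String> unionAll(List<List<String>> perType) {
        // Same shape as the Flink loop: start null, fold each substream in.
        List<String> finalStream = null;
        for (List<String> sub : perType) {
            finalStream = (finalStream == null) ? new ArrayList<>(sub)
                                                : union(finalStream, sub);
        }
        return finalStream;
    }

    public static void main(String[] args) {
        // Three per-type "streams", as produced by the split/select step.
        List<List<String>> perType = Arrays.asList(
                Arrays.asList("A", "A"),
                Arrays.asList("B"),
                Arrays.asList("C", "C", "C"));
        System.out.println(unionAll(perType)); // [A, A, B, C, C, C]
    }
}
```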
// `type` is a reserved word in Scala, hence the backticks
val aDs = all.filter(_.`type` == "A")
val bDs = all.filter(_.`type` == "B")
val cDs = all.filter(_.`type` == "C")

and then apply whatever you want to each of the resulting DataStreams.

If the filtering predicate is computationally heavy, you should precompute it beforehand in a map to make the type-specific filters as lightweight as possible.
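That precompute advice looks the same outside Flink. A plain-Java sketch (hypothetical names; classify stands in for the heavy predicate): run the expensive classification once per event in a map step, then each type-specific filter is only a cheap comparison on the cached tag.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PrecomputeFilter {
    // Stand-in for an expensive classification of an event into "A"/"B"/"C".
    static String classify(String event) {
        return event.substring(0, 1); // imagine heavy work here
    }

    // The "map" step: run the heavy classify() exactly once per event.
    static List<Map.Entry<String, String>> tagAll(List<String> all) {
        return all.stream()
                .<Map.Entry<String, String>>map(
                        e -> new AbstractMap.SimpleEntry<>(classify(e), e))
                .collect(Collectors.toList());
    }

    // Each type-specific "filter" is then a cheap string comparison.
    static List<String> selectType(List<Map.Entry<String, String>> tagged, String type) {
        return tagged.stream()
                .filter(t -> t.getKey().equals(type))
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> tagged =
                tagAll(Arrays.asList("A1", "B1", "C1", "A2"));
        System.out.println(selectType(tagged, "A")); // [A1, A2]
        System.out.println(selectType(tagged, "B")); // [B1]
    }
}
```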

Tommassino
  • Hey, thank you for your reply. I had the same idea, but I think it is narrow. First, in terms of code, we need to define many variables and much similar transformation code, which is a bit redundant. Second, in terms of performance, the many sub-DataStreams and filter functions take up more resources and more time. – Leyla Lee Jan 05 '18 at 09:17
  • Well, I suppose you could have a window function that returns a tuple (or a map) with the counts for each of those types and then split them after. Oh wait, never mind, you want different-sized time windows for each type. – Tommassino Jan 06 '18 at 23:09
  • Now that I've thought about it, since you want to apply completely different functions to the specific streams, I don't see any way to do this other than what I suggested originally. If, for example, you wanted to window with the same window span then you wouldn't have to do this, but I don't think there is any nice demultiplexer in Flink. See for example http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Dataset-split-demultiplex-td11647.html. A good point there was that if the filtering predicate is computationally heavy, evaluate it before the filters to make them lightweight. – Tommassino Jan 07 '18 at 10:10