I am learning Flink and trying to understand a few concepts. Here are a few questions:
- What is the difference between a keyBy operation on a stream and getting a source from RichParallelSourceFunction children like FlinkKinesisConsumer? Both operations divide the stream.
- I also tried to implement a very simple keyBy operator to understand it, as follows:
    DataStream input = env.fromElements("1", "2", "3", "4", "5", "6")
            .keyBy((KeySelector<String, Integer>) value -> Integer.parseInt(value) % 2);
    DataStream parsed = input.map(new MyMapper());
    DataStream parsedStr = input.map(new MyStrMapper());
    parsed.print();
    parsedStr.print();
    env.execute("myParser");
But the output I get is baffling:
3> 1
3> 2
3> 3
3> 4
3> 5
3> 6
3> I am 1
3> I am 2
3> I am 3
3> I am 4
3> I am 5
3> I am 6
That means everything was executed on subtask 3. Can someone help explain why?
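For context, here is the mental model I had of keyBy, which may well be wrong: each key is hashed and the hash is mapped to one of the parallel subtasks. The plain-Java sketch below illustrates only that simplified model. It does not use Flink, it assumes a parallelism of 4, and `subtaskFor` is a hypothetical helper using a plain hash-modulo rule, which is not necessarily how Flink assigns keys.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class KeyBySketch {

    // Hypothetical helper: a simplified stand-in for key-to-subtask assignment.
    // Assumption: plain hash modulo parallelism (not taken from Flink's code).
    static int subtaskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Groups the elements by the subtask their key (value % 2) maps to.
    static Map<Integer, Set<String>> route(String[] elements, int parallelism) {
        Map<Integer, Set<String>> bySubtask = new TreeMap<>();
        for (String element : elements) {
            int key = Integer.parseInt(element) % 2; // same key selector as in my job
            int subtask = subtaskFor(key, parallelism);
            bySubtask.computeIfAbsent(subtask, k -> new TreeSet<>()).add(element);
        }
        return bySubtask;
    }

    public static void main(String[] args) {
        String[] elements = {"1", "2", "3", "4", "5", "6"};
        // Assumed parallelism of 4; prints {0=[2, 4, 6], 1=[1, 3, 5]}
        System.out.println(route(elements, 4));
    }
}
```

Under this simplified model, the two distinct keys (0 and 1) end up on two different subtasks, so I expected the records to be split between two subtasks rather than all landing on subtask 3.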