I know that Flink comes with custom partitioning APIs. However, the problem is that, after invoking partitionCustom on a DataStream, you get a DataStream back and not a KeyedStream.
On the other hand, you cannot override the partitioning strategy for a KeyedStream.
I do want to use KeyedStream, because the API for DataStream does not have the reduce and sum operators, and because a KeyedStream gives me internal state that is partitioned automatically by key.
For example, if the word count is written as:
words.map(s -> Tuple2.of(s, 1)).keyBy(0).sum(1)
I wish I could write:
words.map(s -> Tuple2.of(s, 1)).partitionCustom(myPartitioner, 0).sum(1)
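For concreteness, myPartitioner could look something like the sketch below. This is only an illustration: FirstCharPartitioner and PartitionerDemo are hypothetical names, routing by first character is an arbitrary choice, and the Partitioner interface is declared locally as a stand-in that mirrors the shape of Flink's org.apache.flink.api.common.functions.Partitioner so that the snippet compiles without Flink on the classpath.

```java
// Local stand-in mirroring the shape of Flink's
// org.apache.flink.api.common.functions.Partitioner<K>; declared here only
// so this sketch is self-contained (assumption: no Flink dependency).
interface Partitioner<K> {
    int partition(K key, int numPartitions);
}

// Hypothetical partitioner: routes each word by its first character.
class FirstCharPartitioner implements Partitioner<String> {
    @Override
    public int partition(String key, int numPartitions) {
        if (key == null || key.isEmpty()) {
            return 0; // assumption: empty keys go to channel 0
        }
        // char values are non-negative, so the result is in [0, numPartitions)
        return key.charAt(0) % numPartitions;
    }
}

class PartitionerDemo {
    public static void main(String[] args) {
        Partitioner<String> myPartitioner = new FirstCharPartitioner();
        // Words sharing a first letter land on the same channel.
        for (String w : new String[] {"apple", "avocado", "banana"}) {
            System.out.println(w + " -> " + myPartitioner.partition(w, 4));
        }
    }
}
```

The point is that the partitioner alone decides which parallel instance receives each record, which is exactly the decision that keyBy does not let me make.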
Is there any way to accomplish this?
Thank you!