Is it possible to create a KeyedStream from a pre-sharded/pre-partitioned Kinesis Data Stream without the need for a network shuffle (e.g. using reinterpretAsKeyedStream or something similar)?
- If that is not possible (i.e. the only reliable approach is to consume from Kinesis and then use keyBy), is network shuffling at least minimized by doing a keyBy on the field that the source is sharded by (e.g. env.addSource(source).keyBy(pojo -> pojo.getTransactionId()), where the source is a Kinesis Data Stream that is sharded by transactionId)?
- If the above is possible, what are the limitations?
What I've Learned so Far
- The functionality I am describing is already implemented by reinterpretAsKeyedStream, but this feature is experimental and seems to have significant drawbacks (as per discussions in the Stack Overflow posts below)
- In addition to the above, all the discussions related to reinterpretAsKeyedStream that I've found are in the context of Kafka, so I'm not sure how the outcomes differ for a Kinesis Data Stream
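To make the concern concrete, here is a rough sketch of my current understanding of how the two partitioning schemes differ; it is not authoritative. As far as I know, Kinesis routes a record by taking the MD5 hash of its partition key as a 128-bit integer and picking the shard whose hash key range contains it, while Flink's keyBy hashes key.hashCode() into a key group (modulo the max parallelism) and then maps that key group to a subtask. The class and method names below are hypothetical, the shards are assumed to split the hash space evenly, and the bit-mixing step is my own transcription of Flink's murmur hash, so it may not match the real implementation exactly.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: these names are hypothetical, not Flink or AWS APIs.
class PartitioningSketch {

    // Kinesis side: MD5(partitionKey) as an unsigned 128-bit integer, mapped to
    // the shard owning that hash key range. ASSUMPTION: numShards equal ranges.
    static int kinesisShardIndex(String partitionKey, int numShards) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            BigInteger hashKey = new BigInteger(1, md5);
            BigInteger rangeSize = BigInteger.ONE.shiftLeft(128)
                    .divide(BigInteger.valueOf(numShards));
            // Clamp so the remainder of the hash space falls into the last shard.
            return Math.min(hashKey.divide(rangeSize).intValue(), numShards - 1);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // ASSUMPTION: my transcription of Flink's murmur-style scrambling of hashCode().
    // Always returns a non-negative value.
    static int murmurMix(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;
        code ^= 4;
        code ^= code >>> 16;
        code *= 0x85ebca6b;
        code ^= code >>> 13;
        code *= 0xc2b2ae35;
        code ^= code >>> 16;
        if (code >= 0) {
            return code;
        }
        return code != Integer.MIN_VALUE ? -code : 0;
    }

    // Flink side, step 1: key -> key group in [0, maxParallelism).
    static int flinkKeyGroup(Object key, int maxParallelism) {
        return murmurMix(key.hashCode()) % maxParallelism;
    }

    // Flink side, step 2: key group -> subtask index in [0, parallelism).
    static int flinkSubtaskIndex(Object key, int maxParallelism, int parallelism) {
        return flinkKeyGroup(key, maxParallelism) * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int numShards = 4, parallelism = 4, maxParallelism = 128;
        for (String txId : new String[] {"tx-1", "tx-2", "tx-3", "tx-4"}) {
            System.out.printf("%s -> shard %d, subtask %d%n", txId,
                    kinesisShardIndex(txId, numShards),
                    flinkSubtaskIndex(txId, maxParallelism, parallelism));
        }
    }
}
```

If this sketch is roughly right, two records from the same shard can still be assigned to different subtasks, and either mapping shifts whenever the shard count or the parallelism changes.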
Context of my Application
- Re. configurations: both the Kinesis Data Stream and Flink will be hosted serverlessly and will automatically scale up/down depending on load (which, as I understand it, means that reinterpretAsKeyedStream cannot be used)
Any help/insight is much appreciated, thanks!