With Flink Stateful Functions (StateFun) remote functions, the StateFun cluster hands off execution of compute tasks to remote workers deployed through some FaaS mechanism, for instance AWS Lambda.
AWS Lambda functions are subject to scaling limits (how quickly they can scale up and the maximum concurrency they can reach), as described in the docs. Note that the Lambda concurrency quota applies to the account as a whole, not to each individual Lambda function.
In a large-scale streaming system, particularly if an individual Lambda invocation runs for a long time relative to the number of keys encountered in the data stream during that time, it is conceivable that the Flink StateFun cluster could encounter Lambda throttling events. In other words, when the StateFun cluster tries to invoke a Lambda function through an API Gateway, it would receive a 429 error from API Gateway because the number of simultaneous Lambda invocations is already at the limit.
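To put rough numbers on that scenario, here is a back-of-the-envelope sketch. It relies on the usual estimate that steady-state concurrency is about the invocation rate multiplied by the average invocation duration. The rate, duration, and the 1000 limit are illustrative assumptions, not measurements from a real system; the actual quota depends on the account.

```python
def required_concurrency(invocations_per_second: float,
                         avg_duration_seconds: float) -> float:
    """Estimate steady-state concurrent executions as rate x duration."""
    return invocations_per_second * avg_duration_seconds


# Assumed values for illustration only; check the real quota for the account.
ACCOUNT_CONCURRENCY_LIMIT = 1000
INVOCATIONS_PER_SECOND = 500   # hypothetical rate of function invocations
AVG_DURATION_SECONDS = 3.0     # hypothetical time spent per invocation

demand = required_concurrency(INVOCATIONS_PER_SECOND, AVG_DURATION_SECONDS)
would_throttle = demand > ACCOUNT_CONCURRENCY_LIMIT

print(f"estimated concurrency: {demand}, throttling expected: {would_throttle}")
```

With these assumed numbers the estimated demand exceeds the limit, which is the situation in which I would expect the 429 responses to start appearing.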
How does Flink StateFun handle this? Does it implement backoff/retry, and if so, how does that interact with the ordering of events in the data stream?
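To make the question concrete, this is the kind of behavior I have in mind: retry on 429 with exponential backoff, and do not move on to the next event for a key until the current one has succeeded. This is only a sketch of the idea, not StateFun's actual implementation; `FakeThrottledEndpoint`, `invoke_with_backoff`, and `drain_in_key_order` are hypothetical names, and the endpoint is a local stand-in rather than a real API Gateway call.

```python
import random
import time
from collections import defaultdict, deque


class FakeThrottledEndpoint:
    """Local stand-in for API Gateway + Lambda (hypothetical).

    Returns 429 for the first `failures` calls, then 200.
    """

    def __init__(self, failures):
        self.failures = failures
        self.received = []  # (key, payload) pairs accepted, in arrival order

    def invoke(self, key, payload):
        if self.failures > 0:
            self.failures -= 1
            return 429
        self.received.append((key, payload))
        return 200


def invoke_with_backoff(endpoint, key, payload, max_attempts=8,
                        base_delay=0.01, sleep=time.sleep):
    """Retry on 429 with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        status = endpoint.invoke(key, payload)
        if status != 429:
            return status
        # Wait a random time up to base_delay * 2^attempt before retrying.
        sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError(f"still throttled after {max_attempts} attempts")


def drain_in_key_order(endpoint, events, sleep=time.sleep):
    """Deliver events so that, per key, event N+1 is never sent before
    event N has succeeded."""
    queues = defaultdict(deque)
    for key, payload in events:
        queues[key].append(payload)
    for key, queue in queues.items():
        while queue:
            # Only remove the head of the queue after a successful call.
            invoke_with_backoff(endpoint, key, queue[0], sleep=sleep)
            queue.popleft()


endpoint = FakeThrottledEndpoint(failures=3)
events = [("a", 1), ("b", 1), ("a", 2), ("a", 3)]
drain_in_key_order(endpoint, events, sleep=lambda _: None)  # skip real sleeps
print(endpoint.received)
```

In this sketch a throttled event blocks later events for the same key, so per-key order is preserved at the cost of latency. What I would like to know is whether StateFun does something equivalent, and what happens if the retries are exhausted.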