How does Flink scale for hot partitions?

Question

If I have a use case where I need to join two streams or aggregate some kind of metrics from a single stream, and I use keyed streams to partition the events, how does Flink handle hot partitions, where the data might not fit in memory and would need to be split across partitions?

https://stackoverflow.com/a/64205415/2000823 explains part of the answer, and https://stackoverflow.com/a/58645723/2000823 should help as well. — David Anderson, Feb 19 '21 at 18:32
Thanks for the useful links. But I still don't understand how hot partitions are handled. Let's say one keyGroup gets a lot more events for a key in that group. Will Flink split that partition to create new groups that can fit the events in memory? — Sairam Sankaran, Feb 19 '21 at 20:31

score 2 · Accepted Answer · answered Feb 20 '21 at 07:46

Flink doesn't do anything automatic regarding hot partitions.
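
To see why a single hot key is not split automatically, it helps to look at how keyed state is distributed. The following is a simplified plain-Java model, loosely based on the logic in Flink's `KeyGroupRangeAssignment`: each key is hashed into one of `maxParallelism` key groups, and each key group is owned by exactly one parallel subtask. The class and method names are hypothetical, the bit-mixing step is a stand-in for the murmur hash Flink actually applies, and the values 128 and 4 are only illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyGroupSketch {

    // Step 1: key -> key group. Every record with the same key lands in the same key group.
    static int keyGroupFor(Object key, int maxParallelism) {
        int h = key.hashCode();
        // Hypothetical bit-mixing stand-in for the murmur hash Flink applies to hashCode().
        h ^= h >>> 16;
        h *= 0x45d9f3b;
        h ^= h >>> 16;
        return Math.floorMod(h, maxParallelism);
    }

    // Step 2: key group -> subtask index. Each subtask owns a contiguous range of key groups.
    static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Sums the number of events each subtask receives, given an event count per key.
    static int[] loadPerSubtask(Map<String, Integer> eventsPerKey, int maxParallelism, int parallelism) {
        int[] load = new int[parallelism];
        for (Map.Entry<String, Integer> e : eventsPerKey.entrySet()) {
            int keyGroup = keyGroupFor(e.getKey(), maxParallelism);
            load[subtaskFor(keyGroup, maxParallelism, parallelism)] += e.getValue();
        }
        return load;
    }

    public static void main(String[] args) {
        Map<String, Integer> eventsPerKey = new LinkedHashMap<>();
        eventsPerKey.put("hot-key", 1_000);          // one hot key
        for (int i = 0; i < 100; i++) {
            eventsPerKey.put("cold-key-" + i, 10);   // many cold keys
        }
        // Illustrative values: 128 key groups spread over 4 subtasks.
        int[] load = loadPerSubtask(eventsPerKey, 128, 4);
        // Whichever subtask owns hot-key's key group receives all 1,000 of its events.
        System.out.println(Arrays.toString(load));
    }
}
```

Changing the parallelism only moves whole key groups between subtasks; all events for `hot-key` still map to one key group, and therefore to one subtask.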

If you have a consistently hot partition, you can manually split it and pre-aggregate the splits.
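
A minimal sketch of that manual split, assuming the aggregate is a sum: the first phase appends a split index to each key (often called key salting) and pre-aggregates per salted key, and the second phase strips the index and combines the partial results per original key. In a Flink job the two phases would be two keyed stages (a `keyBy` on the salted key followed by an aggregation, then a `keyBy` on the original key); this plain-Java version only models the data flow, and the `Event` type, the `#` separator, and the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SaltedAggregation {

    // Hypothetical event type: a key plus a value to be summed.
    record Event(String key, long value) {}

    // Hypothetical helpers: "hot-key" with split 2 becomes "hot-key#2", and back again.
    static String salt(String key, int split) {
        return key + "#" + split;
    }

    static String unsalt(String saltedKey) {
        return saltedKey.substring(0, saltedKey.lastIndexOf('#'));
    }

    // Phase 1: spread every key over numSplits salted keys and pre-aggregate per salted key.
    static Map<String, Long> preAggregate(List<Event> events, int numSplits) {
        Map<String, Long> partial = new HashMap<>();
        int next = 0;
        for (Event e : events) {
            // Round-robin split index; a random index would spread the load similarly.
            String saltedKey = salt(e.key(), next);
            next = (next + 1) % numSplits;
            partial.merge(saltedKey, e.value(), Long::sum);
        }
        return partial;
    }

    // Phase 2: strip the salt and combine the partial results per original key.
    static Map<String, Long> finalAggregate(Map<String, Long> partial) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            totals.merge(unsalt(e.getKey()), e.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) events.add(new Event("hot-key", 1));
        for (int i = 0; i < 10; i++) events.add(new Event("cold-key", 5));

        Map<String, Long> partial = preAggregate(events, 4);
        Map<String, Long> totals = finalAggregate(partial);
        // prints: hot-key total = 1000, cold-key total = 50
        System.out.println("hot-key total = " + totals.get("hot-key")
                + ", cold-key total = " + totals.get("cold-key"));
    }
}
```

Here the hot key's 1,000 events are spread over four salted keys of 250 events each, so no single partial aggregate has to hold all of them. This only works for aggregations that can be combined from partial results, such as sums, counts, minimums, and maximums.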

If your concern is about avoiding out-of-memory errors due to unexpected load spikes for one partition, you can use a state backend that spills to disk (the RocksDB state backend keeps working state on local disk rather than on the JVM heap).

If you want more dynamic data routing / partitioning, look at the Stateful Functions API or the Dynamic Data Routing section of this blog post.

If you want auto-scaling, see Autoscaling Apache Flink with Ververica Platform Autopilot.
