Flink AT_LEAST_ONCE checkpoint uses 100% managed memory

Question

We have a Flink v1.14 streaming job running in native Kubernetes deployment mode. When we use the AT_LEAST_ONCE checkpoint mode, managed memory usage hits 100% no matter how much memory we assign to it. Any ideas what might be the cause, or is this actually the expected behavior of how Flink manages memory?

When streaming, managed memory is normally only used for RocksDB. What configuration settings have you put in place, beyond the defaults? Are you running in batch execution mode? Is managed memory consumption different if you use EXACTLY_ONCE checkpointing? — David Anderson, Nov 23 '21 at 18:51
Here are some configs we set up for the RocksDB backend. The managed memory consumption is the same when we use EXACTLY_ONCE checkpointing. Sorry about the poor formatting. state.backend: rocksdb; state.backend.incremental: true; state.checkpoints.dir: s3://xxx; state.checkpoints.num-retained: 3 — 周天钜, Nov 23 '21 at 19:25
Are you experiencing problems, such as out-of-memory errors? — David Anderson, Nov 24 '21 at 10:07
No, everything works well. Most likely this is simply how RocksDB uses Flink managed memory, according to the official Flink v1.14 documentation: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#state-backend-rocksdb-memory-managed But it's quite a black box, and we don't know exactly how much managed memory is actually used in this case, which makes it difficult to tune Flink memory usage. If we turn off this setting, we basically need to manage the memory for write buffers, indexes, and block caches ourselves. That is also something we don't want to do in a production environment. — 周天钜, Nov 24 '21 at 20:53
It is generally not recommended to attempt detailed tuning of Flink memory usage. And yes, RocksDB will try to take full advantage of the memory made available to it. — David Anderson, Nov 24 '21 at 21:15
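
For anyone who wants to look inside the "black box" discussed in the comments above: with the RocksDB state backend, Flink hands its managed memory budget to RocksDB, so a managed memory gauge at 100% most likely reflects the reservation rather than what RocksDB has actually allocated. A minimal sketch of a flink-conf.yaml fragment follows. The memory values are the documented defaults, shown only for illustration and not taken from the job in the question. The metrics options are off by default, and enabling them may add some overhead.

```yaml
# flink-conf.yaml fragment (illustrative; values are defaults, not the asker's settings)
state.backend: rocksdb

# RocksDB memory is bounded by Flink managed memory (default: true)
state.backend.rocksdb.memory.managed: true
# Share of total Flink memory set aside as managed memory (default: 0.4)
taskmanager.memory.managed.fraction: 0.4
# How the managed budget is split inside RocksDB (defaults shown)
state.backend.rocksdb.memory.write-buffer-ratio: 0.5
state.backend.rocksdb.memory.high-prio-pool-ratio: 0.1

# RocksDB native metrics, disabled by default; they report actual usage
state.backend.rocksdb.metrics.block-cache-capacity: true
state.backend.rocksdb.metrics.block-cache-usage: true
state.backend.rocksdb.metrics.cur-size-all-mem-tables: true
state.backend.rocksdb.metrics.estimate-table-readers-mem: true
```

Comparing block-cache-usage against block-cache-capacity should give a rough picture of how much of the reserved budget RocksDB is really using, without switching off managed memory for RocksDB.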

0 Answers