
I have a Flink cluster. I have enabled the compaction filter and am using state TTL, but the RocksDB compaction filter does not free state from memory.

My Flink pipeline processes about 300 records/s.

My state TTL config:

private transient ListState<ObjectNode> myState;

@Override
public void open(Configuration parameters) throws Exception {
    ListStateDescriptor<ObjectNode> descriptor = new ListStateDescriptor<>(
            "my-state",
            TypeInformation.of(new TypeHint<ObjectNode>() {})
    );

    StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.seconds(600))
            // query the current timestamp from Flink after every 2 entries
            // processed by the RocksDB compaction filter
            .cleanupInRocksdbCompactFilter(2)
            .build();

    descriptor.enableTimeToLive(ttlConfig);

    myState = getRuntimeContext().getListState(descriptor);
}

flink-conf.yaml:

state.backend: rocksdb
state.backend.rocksdb.ttl.compaction.filter.enabled: true
state.backend.rocksdb.block.blocksize: 16kb
state.backend.rocksdb.compaction.level.use-dynamic-size: true
state.backend.rocksdb.thread.num: 4
state.checkpoints.dir: file:///opt/flink/checkpoint
state.backend.rocksdb.timer-service.factory: rocksdb
state.backend.rocksdb.checkpoint.transfer.thread.num: 2
state.backend.local-recovery: true
state.backend.rocksdb.localdir: /opt/flink/rocksdb
jobmanager.execution.failover-strategy: region
rest.port: 8081
state.backend.rocksdb.memory.managed: true
# state.backend.rocksdb.memory.fixed-per-slot: 20mb
state.backend.rocksdb.memory.write-buffer-ratio: 0.9
state.backend.rocksdb.memory.high-prio-pool-ratio: 0.1
taskmanager.memory.managed.fraction: 0.6
taskmanager.memory.network.fraction: 0.1
taskmanager.memory.network.min: 500mb
taskmanager.memory.network.max: 700mb
taskmanager.memory.process.size: 5500mb
taskmanager.memory.task.off-heap.size: 800mb

metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
metrics.reporter.influxdb.host: ####
metrics.reporter.influxdb.port: 8086
metrics.reporter.influxdb.db: ####
metrics.reporter.influxdb.username: ####
metrics.reporter.influxdb.password: ####
metrics.reporter.influxdb.consistency: ANY
metrics.reporter.influxdb.connectTimeout: 60000
metrics.reporter.influxdb.writeTimeout: 60000

state.backend.rocksdb.metrics.estimate-num-keys: true
state.backend.rocksdb.metrics.num-running-compactions: true
state.backend.rocksdb.metrics.background-errors: true
state.backend.rocksdb.metrics.block-cache-capacity: true
state.backend.rocksdb.metrics.block-cache-pinned-usage: true
state.backend.rocksdb.metrics.block-cache-usage: true
state.backend.rocksdb.metrics.compaction-pending: true

Monitoring with InfluxDB and Grafana:

[Screenshot: Grafana dashboard showing the RocksDB metrics listed above]

Mohammad Hossein Gerami

1 Answer


As the name of this TTL cleanup strategy implies (cleanupInRocksdbCompactFilter), it relies on a custom RocksDB compaction filter, which runs only during compactions. See the docs for more details.

The metrics in the screenshot show that no compactions have been running at any point. I suppose the amount of data is simply not big enough yet to trigger a compaction.

The compaction filter does not free state from memory.

I assume that 'from memory' refers to main RAM. If so, compaction does not run there at all. The amount of data RocksDB keeps in main memory is always bounded; it is essentially a cache, and expired, untouched state should eventually be evicted from it. The rest is periodically spilled to disk and compacted over time, and that is when this TTL cleanup removes the expired state from the system.
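For reference, here is a minimal sketch of a TTL config that spells out the read-visibility semantics: even before a compaction physically drops an entry, expired state is not returned to user code. The update-type and visibility values below are the Flink defaults and are shown only for illustration; the 1000-entries threshold for the timestamp query is likewise an arbitrary example.

StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.seconds(600))
        // refresh the TTL timestamp on every write (default behaviour)
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
        // never hand expired entries back to user code, even if they are still
        // physically present because no compaction has run yet (default behaviour)
        .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
        // ask Flink for the current timestamp after every 1000 entries
        // processed by the compaction filter
        .cleanupInRocksdbCompactFilter(1000)
        .build();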

azagrebin
  • Thanks for your response. Can I force the compaction filter to run via the state.backend.rocksdb.compaction.level.max-size-level-base and state.backend.rocksdb.compaction.level.target-file-size-base configuration options? – Mohammad Hossein Gerami Mar 11 '20 at 18:23
  • Memory usage keeps increasing and I don't know why. How can I get the state removed from memory automatically? – Mohammad Hossein Gerami Mar 11 '20 at 20:41
  • Yes, you can tune when compactions are triggered by tuning the target sizes of the levels; see the sketch after these comments. – azagrebin Mar 12 '20 at 17:53
  • In general, the amount of main memory used by RocksDB should not grow indefinitely. There are limits for memtables (https://github.com/facebook/rocksdb/wiki/MemTable) and caches, which are configurable directly via custom RocksDB options. It is indeed not simple. If you use Flink 1.10, Flink tries to configure RocksDB's memory usage (https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/state_backends.html#memory-management) so that it stays within Flink managed memory (https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html#managed-memory). – azagrebin Mar 12 '20 at 18:00
  • At the moment there is no automatic way in Flink to clean up expired state directly in the RocksDB memtables. The idea is that memory usage grows to its limits, and cleanup then happens during compactions on disk so that the occupied space stays bounded by the size of the live data. – azagrebin Mar 12 '20 at 18:08
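To make the compaction-tuning comment concrete: one way to make compactions (and therefore the TTL compaction filter) kick in with less accumulated data is to shrink the per-level size targets. This is only an illustrative sketch assuming Flink 1.10's RocksDBOptionsFactory interface; the class name SmallLevelsOptionsFactory and the 8 MB / 32 MB sizes are made up for the example and would need tuning for a real job. The same two settings can also be set in flink-conf.yaml via state.backend.rocksdb.compaction.level.target-file-size-base and state.backend.rocksdb.compaction.level.max-size-level-base.

import java.util.Collection;

import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Hypothetical example: shrink the level size targets so that compactions
// start earlier, giving the TTL compaction filter a chance to run sooner.
public class SmallLevelsOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions,
                                     Collection<AutoCloseable> handlesToClose) {
        // keep Flink's defaults for the DB-level options
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions,
                                                   Collection<AutoCloseable> handlesToClose) {
        // smaller SST files and a smaller L1 budget mean compactions are
        // triggered with less data (example values, not a recommendation)
        return currentOptions
                .setTargetFileSizeBase(8 * 1024 * 1024)      // target-file-size-base
                .setMaxBytesForLevelBase(32 * 1024 * 1024);  // max-size-level-base
    }
}

The factory would then be registered on the state backend before passing it to the execution environment, e.g. rocksDbBackend.setRocksDBOptions(new SmallLevelsOptionsFactory()). Note that smaller levels trade earlier compactions for more write amplification, so the effect should be measured rather than assumed.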