
I have a Flink job (Scala) that basically reads from a Kafka topic (Kafka 1.0), aggregates data in a 1-minute event-time tumbling window (using a fold function, which I know is deprecated but is easier to implement than an aggregate function), and writes the result to 2 different Kafka topics.
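For context, a minimal sketch of such a pipeline (assuming Flink 1.6-era APIs and the Kafka 0.11+ connector; the topic names, broker address, key extractor, and string-concatenation fold are all placeholders, not from my actual job):

```scala
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaConsumer011, FlinkKafkaProducer011}

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val props = new Properties()
props.setProperty("bootstrap.servers", "broker:9092") // placeholder

val source = env.addSource(
  new FlinkKafkaConsumer011[String]("input-topic", new SimpleStringSchema(), props))

val aggregated = source
  .keyBy(record => record)                              // placeholder key extractor
  .window(TumblingEventTimeWindows.of(Time.minutes(1)))
  .fold("")((acc, value) => acc + value)                // deprecated fold, as described

// the same aggregated stream is written to two topics via two sinks
aggregated.addSink(new FlinkKafkaProducer011[String]("out-a", new SimpleStringSchema(), props))
aggregated.addSink(new FlinkKafkaProducer011[String]("out-b", new SimpleStringSchema(), props))

env.execute("window-aggregation")
```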

The question is: when I'm using the FS state backend, everything runs smoothly - checkpoints take 1-2 seconds with an average state size of 200 MB - that is, until the state size increases (while closing a gap, for example).

I figured I would try RocksDB (over HDFS) for checkpoints - but the throughput is SIGNIFICANTLY lower than with the FS state backend. As I understand it, Flink does not need to ser/deserialize on every state access when using the FS state backend, because the state is kept in memory (on the heap), while RocksDB DOES - and I guess that is what accounts for the slowdown (and the backpressure, and checkpoints taking MUCH longer, sometimes timing out after 10 minutes).

Still, there are times when the state cannot fit in memory, so I am basically trying to figure out how to make the RocksDB state backend perform better.

Is it because of the deprecated fold function? Do I need to fine-tune some parameters that are not easily found in the documentation? Any tips?
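On the fold question - as far as I understand, fold was deprecated mainly because it is less general than an AggregateFunction (for instance, it cannot merge partial accumulators for session windows), and migrating is mostly mechanical. A plain-Scala sketch (no Flink dependencies; the count/sum accumulator is a made-up example) showing that the two styles compute the same thing:

```scala
final case class Acc(count: Long, sum: Long)

// fold style: a single (accumulator, element) => accumulator function
def foldStyle(events: Seq[Long]): Acc =
  events.foldLeft(Acc(0, 0))((acc, v) => Acc(acc.count + 1, acc.sum + v))

// AggregateFunction style: createAccumulator / add / merge / getResult
def createAccumulator: Acc = Acc(0, 0)
def add(v: Long, acc: Acc): Acc = Acc(acc.count + 1, acc.sum + v)
def merge(a: Acc, b: Acc): Acc = Acc(a.count + b.count, a.sum + b.sum)
def getResult(acc: Acc): Acc = acc

def aggregateStyle(events: Seq[Long]): Acc =
  getResult(events.foldLeft(createAccumulator)((acc, v) => add(v, acc)))
```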

OmriManor
  • When you say "rocksdb (over hdfs)", do you mean that hdfs is being used for the working state, or for the checkpoints? Have you tried incremental checkpointing? Can you give more memory to the RocksDB cache? – David Anderson Nov 11 '18 at 16:18
  • I did not know there was a difference; basically I am instantiating a RocksDB state backend with a `namenode:port///checkpoints` URI. If there is a difference between checkpoints and working state, how do I define it? I am already working with incremental checkpointing. I can try to give more memory - can you give me a pointer on how to go about doing that? – OmriManor Nov 11 '18 at 16:58

1 Answer


Each state backend holds its working state somewhere, and then durably persists its checkpoints in a distributed filesystem. The RocksDB state backend holds its working state on disk, and this can be a local disk - hopefully faster than hdfs.

Try setting `state.backend.rocksdb.localdir` (see https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/state/state_backends.html#rocksdb-state-backend-config-options) to somewhere on the fastest local filesystem on each taskmanager.
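For example, in `flink-conf.yaml` (the paths here are hypothetical - point them at fast local disks, ideally SSDs, on each taskmanager; multiple comma-separated directories are allowed):

```yaml
state.backend: rocksdb
state.backend.rocksdb.localdir: /mnt/ssd1/flink/rocksdb,/mnt/ssd2/flink/rocksdb
```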

Turning on incremental checkpointing could also make a large difference.
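If you configure the backend in code rather than in `flink-conf.yaml`, both settings can be applied there as well - a sketch, with placeholder hdfs URI and local path:

```scala
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment

// second constructor argument enables incremental checkpointing
val backend = new RocksDBStateBackend("hdfs://namenode:8020/flink/checkpoints", true)

// working-state location on a fast local disk
// (overrides state.backend.rocksdb.localdir for this job)
backend.setDbStoragePath("/mnt/ssd/flink/rocksdb")

env.setStateBackend(backend)
```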

Also see Tuning RocksDB.

David Anderson
  • I'll try setting `state.backend.rocksdb.localdir` to a local directory tomorrow and share the results; I am already using incremental checkpointing. – OmriManor Nov 11 '18 at 17:17