We have a forever-running Flink job that reads from Kafka and creates sliding time windows, with window sizes (stream intervals) of 1 hr, 2 hr, up to 24 hr, and slide intervals of 1 min, 10 min, up to 1 hour. Basically it is: KafkaSource.keyBy(keyId).SlidingWindow(stream, slide).reduce.sink
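Roughly, the topology looks like the sketch below (a minimal, illustrative version only: the topic name, broker address, key extraction, and reduce logic are placeholders, not our actual job):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SlidingWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder Kafka source; broker and topic names are illustrative
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("events")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.forMonotonousTimestamps(), "kafka")
           // key extraction is a placeholder; the real job keys by its keyId field
           .keyBy(record -> record.split(",")[0])
           // one of the many (size, slide) combinations, e.g. 1 hr window sliding every 1 min
           .window(SlidingEventTimeWindows.of(Time.hours(1), Time.minutes(1)))
           // reduce keeps a single aggregated element per key per window
           .reduce((a, b) -> a + "|" + b)
           // the real job writes to a proper sink instead of printing
           .print();

        env.execute("sliding-window-job");
    }
}
```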
I recently enabled checkpointing with the RocksDB state backend, incremental checkpoints (incremental=true), and HDFS as persistent storage.
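For reference, the checkpointing setup is essentially the following (a minimal sketch that sits in the job's setup code; the interval and HDFS path are illustrative, not our exact values):

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpoint every 5 minutes (illustrative interval) with exactly-once semantics
env.enableCheckpointing(5 * 60 * 1000, CheckpointingMode.EXACTLY_ONCE);

// RocksDB state backend with incremental checkpoints enabled (incremental = true)
env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

// Persist checkpoints to HDFS (path is a placeholder)
env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");
```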
For the last 4-5 days I have been monitoring the job and it is running fine, but I am concerned about the checkpoint size. Since RocksDB performs compaction and merging, the size does not grow forever, but it still grows and has now reached 100 GB.
So, what is the best way to checkpoint forever-running jobs?
The job will have millions of unique keyIds. So, will there be one state entry per key for each operator when checkpointing?