We have a forever-running Flink job that reads from Kafka and creates sliding time windows, with window sizes (stream intervals) of 1 hr, 2 hr, up to 24 hr, and slide intervals of 1 min, 10 min, up to 1 hour. Basically it is: KafkaSource.keyBy(keyId).SlidingWindow(stream, slide).reduce.sink
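Roughly, the topology looks like the sketch below (a minimal, illustrative version only: the topic name, broker address, key extraction, and reduce logic are placeholders, not our actual job):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SlidingWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder Kafka source; broker and topic names are illustrative
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("events")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.forMonotonousTimestamps(), "kafka")
           // key extraction is a placeholder; the real job keys by its keyId field
           .keyBy(record -> record.split(",")[0])
           // one of the many (size, slide) combinations, e.g. 1 hr window sliding every 1 min
           .window(SlidingEventTimeWindows.of(Time.hours(1), Time.minutes(1)))
           // reduce keeps a single aggregated element per key per window
           .reduce((a, b) -> a + "|" + b)
           // the real job writes to a proper sink instead of printing
           .print();

        env.execute("sliding-window-job");
    }
}
```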
I recently enabled checkpointing with the RocksDB state backend, incremental checkpoints (incremental=true), and HDFS as persistent storage.
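For reference, the checkpointing setup is essentially the following (a minimal sketch that sits in the job's setup code; the interval and HDFS path are illustrative, not our exact values):

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpoint every 5 minutes (illustrative interval) with exactly-once semantics
env.enableCheckpointing(5 * 60 * 1000, CheckpointingMode.EXACTLY_ONCE);

// RocksDB state backend with incremental checkpoints enabled (incremental = true)
env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

// Persist checkpoints to HDFS (path is a placeholder)
env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");
```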
For the last 4-5 days I have been monitoring the job and it is running fine, but I am concerned about the checkpoint size. Since RocksDB performs compaction and merging, the size does not grow forever, but it still grows and has now reached 100 GB.
So, what is the best way to checkpoint forever-running jobs?
The job will have millions of unique keyIds. So, will there be one state entry per key for each operator when checkpointing?