I am struggling to find a way to clean up the checkpoint state files of a stateful Spark Structured Streaming query. Their number keeps growing after the query starts, and they end up taking a lot of disk space. By checkpoint state files I mean the delta and snapshot files in the state directory. I checked the config spark.cleaner.referenceTracking.cleanCheckpoints, but it seems to deal only with RDD checkpointing and has nothing to do with the state.
I wonder whether there is a similar configuration for the state files in Structured Streaming, because as far as I know, older delta and snapshot files are useless once a newer snapshot file holds the complete current state. Please correct me if I'm wrong.
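To make the reasoning concrete, here is a minimal sketch that lists which files in a single state directory would be superseded if the newest snapshot really held the complete state. It assumes files are named "<version>.delta" and "<version>.snapshot" (the layout I see on disk); the function name superseded_state_files is hypothetical and this is not a Spark API.

```python
import re
from typing import List

# Assumed naming convention: "<version>.delta" / "<version>.snapshot".
# Anything else in the directory (e.g. ".crc" side files) is ignored.
_STATE_FILE = re.compile(r"^(\d+)\.(delta|snapshot)$")


def superseded_state_files(file_names: List[str]) -> List[str]:
    """Return files whose content is covered by the newest snapshot.

    Hypothetical helper: a snapshot at version N is assumed to contain the
    full state as of N, so every delta at version <= N and every snapshot
    older than N is not needed to rebuild the latest state.
    """
    parsed = []
    for name in file_names:
        match = _STATE_FILE.match(name)
        if match:
            parsed.append((int(match.group(1)), match.group(2), name))

    snapshot_versions = [v for v, kind, _ in parsed if kind == "snapshot"]
    if not snapshot_versions:
        # Without any snapshot, every delta is still needed for replay.
        return []

    latest = max(snapshot_versions)
    # Sort by (version, kind) so the result is in version order.
    return [
        name
        for version, kind, name in sorted(parsed)
        if version < latest or (version == latest and kind == "delta")
    ]


if __name__ == "__main__":
    files = ["1.delta", "2.delta", "3.delta", "3.snapshot", "4.delta", "5.delta"]
    print(superseded_state_files(files))
```

This only captures the "latest snapshot is enough" assumption; it is not a safe cleanup tool, since Spark may keep older versions on purpose so a query can recover from an earlier batch.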