
I'm configuring a StateTtlConfig for a MapState. My goal is that objects in the state live for, say, 3 hours and then disappear from the state and are handed to the GC, so that memory is released and, I think, the checkpoints shrink as well. I had this configuration before, and it did not seem to work because the checkpoints were always growing:

private final StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(org.apache.flink.api.common.time.Time.hours(3)).cleanupFullSnapshot().build();

Then I realized that this configuration only works when reading state from a savepoint, which is not my scenario. I changed my TTL configuration to this one:

private final StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(org.apache.flink.api.common.time.Time.hours(3))
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired).build();

This is based on the idea that I want to clean up the state for all keys after a defined time.

My questions are:

  1. Am I using the right configuration now?
  2. What is the best way to do it?

Thanks one more time. Kind regards!!!

Alter
  • I'm using FsStateBackend – Alter Aug 24 '20 at 15:34
  • What version of Flink are you using? The state expiry mechanism has evolved/matured over the past several releases. The behavior is also different between the two state backends, so it would help to know whether you are using the RocksDBStateBackend or the FsStateBackend. – David Anderson Aug 24 '20 at 15:36
  • I'm using Flink 1.10 and upgrading to 1.11 right now, so I will be using Flink 1.11 starting from today with FsStateBackend. I just need Flink to release the state of all keys that have expired after a defined time. What kind of configuration do you think I should apply in my case? Thanks David. – Alter Aug 24 '20 at 15:43
  • Should I use this approach: `StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(Time.days(7)).cleanupIncrementally(10, false).build();`? – Alter Aug 24 '20 at 15:49

1 Answer

I don't know enough about your use case to recommend a specific expiration/cleanup policy, but I can offer a few notes.

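My understanding is that cleanupFullSnapshot() specifies that, in addition to whatever other cleanup is being done, expired entries are filtered out whenever a full snapshot is taken. That reduces the size of the snapshot, but it does not by itself remove anything from the local, in-memory state.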

The FsStateBackend uses the incremental cleanup strategy. By default it checks 5 entries during each state access, and does no additional cleanup during record processing. If your workload is such that there are many more writes than reads, that might not be enough. If no access happens to the state, expired state will persist. Choosing cleanupIncrementally(10, false) will make the cleanup more aggressive, assuming you do have some level of state access going on.
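To make the dependence on state access concrete, here is a toy model in plain Java. It is not Flink's implementation: the class `IncrementalTtlModel` and all of its methods are hypothetical names invented for this sketch, and it assumes a simple round-robin scan that checks a fixed number of entries on every read or write.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Flink code): expired entries are only removed as a side effect of access. */
public class IncrementalTtlModel {
    private final long ttlMillis;
    private final int cleanupSize;
    // key -> timestamp of the last write
    private final Map<String, Long> lastWrite = new HashMap<>();
    // keys waiting to be checked by the round-robin cleanup scan
    private final ArrayDeque<String> scanQueue = new ArrayDeque<>();

    public IncrementalTtlModel(long ttlMillis, int cleanupSize) {
        this.ttlMillis = ttlMillis;
        this.cleanupSize = cleanupSize;
    }

    /** Write access: stores the key and then runs one cleanup step. */
    public void put(String key, long now) {
        if (lastWrite.put(key, now) == null) {
            scanQueue.add(key); // new key: register it for future scans
        }
        cleanupStep(now);
    }

    /** Read access: expired entries are never reported as visible, even if still stored. */
    public boolean isVisible(String key, long now) {
        Long written = lastWrite.get(key);
        boolean visible = written != null && now - written < ttlMillis;
        cleanupStep(now);
        return visible;
    }

    /** Number of entries physically held, expired or not. */
    public int physicalSize() {
        return lastWrite.size();
    }

    // Checks at most cleanupSize entries and drops the expired ones.
    private void cleanupStep(long now) {
        int toCheck = Math.min(cleanupSize, scanQueue.size());
        for (int i = 0; i < toCheck; i++) {
            String key = scanQueue.poll();
            if (now - lastWrite.get(key) >= ttlMillis) {
                lastWrite.remove(key);
            } else {
                scanQueue.add(key); // still live: check again on a later pass
            }
        }
    }
}
```

In this model, 20 entries that have all expired stay in memory until something touches the state, and with a cleanup size of 5 it takes four accesses to remove them all. A larger cleanup size shortens that, which is the effect the second paragraph above attributes to `cleanupIncrementally(10, false)`.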

It's not unusual for checkpoint sizes to grow, or to take longer than you'd expect to reach a plateau. Could it simply be that the keyspace is growing?

https://flink.apache.org/2019/05/19/state-ttl.html is a good resource for learning more about Flink's State TTL mechanism.

David Anderson
  • Let's assume that I have a MapState that holds the previous state of each object. Every incoming event is checked against it: if the event has a previous state, that state is read, compared with the new event, updated, and written back. But if an object in the state has expired, I need to drop the reference from memory and hand it to the GC, as the Flink page says, to reduce memory use and, I think, checkpoint (state) size. To do this I was using `cleanupFullSnapshot()` – Alter Aug 25 '20 at 14:40
  • but it looks like it wasn't working, because the checkpoints kept growing within a short time. Now I've changed it to `StateTtlConfig ttlConfigRefProfiles = StateTtlConfig.newBuilder(Time.minutes(1)) .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite) .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired).build();` and it somehow seems to work: the checkpoints are lighter and memory/CPU consumption is lower than before. Hoping this information is enough, am I understanding it correctly now? Thanks – Alter Aug 25 '20 at 14:41
  • If an object in state has expired, there's nothing you need to do. If things are configured correctly, it will be removed and GC'ed for you. – David Anderson Aug 25 '20 at 16:01
  • If you were setting Time.days(7) as the TTL, then you'd have to wait a week before the checkpoints would be affected. With Time.minutes(1) that will, of course, happen much sooner. – David Anderson Aug 25 '20 at 16:03
  • So, you're saying that I don't need any configuration other than `StateTtlConfig ttlConfigRefProfiles = StateTtlConfig.newBuilder(Time.minutes(1)) .build();`, and that this will be enough to remove the old state, with the checkpoints being affected once the defined time has passed? Thanks a lot. – Alter Aug 25 '20 at 16:16
  • Not exactly, no. `OnCreateAndWrite` and `NeverReturnExpired` are the defaults, so setting those isn't necessary, but it does help make your choices explicit. However, there's no guarantee that the checkpoint will be affected in a timely fashion with the default settings. You should set `cleanupFullSnapshot()` if you want the checkpoints not to include state that should have been expired. And you may need to configure `cleanupIncrementally` if the default settings don't expire old state quickly enough. – David Anderson Aug 25 '20 at 17:17
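
For reference, the settings discussed in this thread could be combined roughly as follows. This is a sketch under assumptions, not a recommended policy: the class name, the helper method, the state name `previousObjects` and the `String` key/value types are placeholders, the 3-hour TTL comes from the question, and the cleanup size of 10 is just the value floated in the comments.

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

public class TtlConfigSketch {

    // Hypothetical helper; state name and key/value types are placeholders.
    public static MapStateDescriptor<String, String> previousObjectsDescriptor() {
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.hours(3))
            // Both of these are the defaults; set here only to be explicit.
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
            // Leave expired entries out of full snapshots.
            .cleanupFullSnapshot()
            // Check 10 entries per state access, with no extra cleanup per record.
            .cleanupIncrementally(10, false)
            .build();

        MapStateDescriptor<String, String> descriptor =
            new MapStateDescriptor<>("previousObjects", String.class, String.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```

The descriptor would then be passed to `getRuntimeContext().getMapState(...)` in the `open()` method of a rich function. Whether 10 entries per access is aggressive enough depends on how often the state is actually touched.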