We are using Kinesis Data Analytics for Apache Flink to analyze visitor events from multiple sources. In one of the operators, we use a MapState to calculate cumulative metrics. The Flink application auto-scaled 4 times during one week of execution. The problem is that each time it auto-scaled, the operator state was completely dropped. There are no error messages in the logs, except "RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested." from TaskManagerRunner.
The job uses the following configuration: checkpointing is enabled and uses the DEFAULT configuration mode; application auto-scaling is enabled; the application restore configuration is "Update without snapshot"; the state does not use TTL.
Is my understanding correct that, to preserve state across auto-scaling, we should start the job with the RESTORE_FROM_LATEST_SNAPSHOT restore configuration? I thought this value was needed only for full application restarts. Is there anything else that could cause a similar problem?
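
For reference, this is roughly how I understand the restore setting would be changed through the API. It is only a sketch of the request shape, assuming the boto3 `kinesisanalyticsv2` client and its `update_application` call; the helper name `build_restore_update`, the application name, and the version ID are hypothetical placeholders, and I have not confirmed that this setting is what governs auto-scaling.

```python
# Sketch only: builds the keyword arguments for an UpdateApplication request
# that switches the restore behaviour. The helper name and the sample values
# below are hypothetical; RESTORE_FROM_CUSTOM_SNAPSHOT is left out because it
# additionally needs a snapshot name.

ALLOWED_RESTORE_TYPES = {
    "SKIP_RESTORE_FROM_SNAPSHOT",
    "RESTORE_FROM_LATEST_SNAPSHOT",
}


def build_restore_update(app_name, current_version_id,
                         restore_type="RESTORE_FROM_LATEST_SNAPSHOT"):
    """Return kwargs for kinesisanalyticsv2 update_application."""
    if restore_type not in ALLOWED_RESTORE_TYPES:
        raise ValueError(f"unsupported restore type: {restore_type}")
    return {
        "ApplicationName": app_name,
        "CurrentApplicationVersionId": current_version_id,
        "RunConfigurationUpdate": {
            "ApplicationRestoreConfiguration": {
                "ApplicationRestoreType": restore_type,
            },
        },
    }


# Hypothetical application name and version ID, for illustration only.
kwargs = build_restore_update("visitor-events-app", 7)

# With AWS credentials configured, the request would be sent like this:
#   import boto3
#   boto3.client("kinesisanalyticsv2").update_application(**kwargs)
```

The request is built separately from the call so the payload can be inspected without touching the running application.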