
I have multiple Kafka topics (multi-tenancy) and I run the same job multiple times, once per topic, with each job consuming messages from one topic. I have configured the filesystem state backend.

Assume there are 3 jobs running. How do checkpoints work here? Do all 3 jobs store their checkpoint information under the same path? If any of the jobs fails, how does it know where to recover its checkpoint information from? We give a job name while submitting a job to the Flink cluster. Does that have anything to do with it? In general, how does Flink differentiate jobs and their checkpoint information when restoring after a failure or a manual restart (irrespective of whether the jobs are the same or different)?

Case 1: What happens in case of a job failure?

Case 2: What happens if we manually restart the job?

Thank you

Raghavendar

2 Answers


To follow on from what @ShemTov was saying:

Each job will write its checkpoints in a sub-dir named with its jobId.
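
For example, with the filesystem state backend, something like the following (a minimal sketch; the checkpoint directory and interval are placeholders, using the FsStateBackend API that was current at the time) produces checkpoints under <checkpoint dir>/<jobId>/chk-<n>:

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// e.g. hdfs:///flink/checkpoints/<jobId>/chk-42/_metadata for this job's checkpoint 42
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
env.enableCheckpointing(60_000); // take a checkpoint every 60 seconds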

If you manually cancel a job, its checkpoints are deleted (since they are no longer needed for recovery), unless they have been configured to be retained:

CheckpointConfig config = env.getCheckpointConfig();
// keep checkpoints after cancellation so they can be used for a manual restart or rescaling
config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

Retained checkpoints can be used for manually restarting a job (by passing the checkpoint path to flink run -s, just as with a savepoint), and for rescaling.

Docs on retained checkpoints.

If you have high availability configured, the job manager's metadata about checkpoints will be stored in the HA store, so that recovery does not depend on the job manager's survival.

David Anderson
  • I cancelled the job using flink cancel {{job_id}} and noticed that the checkpoint directory gets deleted, which is fine. I have enabled checkpointing and have not configured the externalized checkpoint cleanup option. While the job was down I fired a few messages to the Kafka topic the Flink job reads from. When I submit the job again, it is able to read the messages that were sent while the job was down. As per my understanding, the job is not supposed to use a checkpoint when it is submitted as a new job. Is this behavior valid? Does it have anything to do with the offsets stored in Kafka/ZooKeeper? – Raghavendar Aug 27 '20 at 10:22
  • Checkpoints can be used when submitting a new job, but only explicitly -- this can not happen by surprise. But yes, Flink commits the offsets back to Kafka/ZooKeeper whenever a checkpoint is completed, and when a new job starts, the default start position is `setStartFromGroupOffsets`. Read the [docs](https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html) for more details. – David Anderson Aug 27 '20 at 11:03
  • For testing, I added flinkKafkaConsumer.setCommitOffsetsOnCheckpoints(false); so that offsets are not committed to Kafka/ZooKeeper and the consumer group always keeps the old offset. I also did not add config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);. When I cancel (no args) and start (no args) the job, it is able to pick up the messages from where it left off. But Flink is supposed to get the last committed offset (the old offset) from Kafka/ZooKeeper. As per the documentation it is not supposed to work this way; it works as if the job used checkpoints. – Raghavendar Aug 28 '20 at 11:17
  • You mentioned a point about HA. Still, the job would not use a checkpoint unless explicitly specified, am I right? I am just submitting a new job each time and the job picks up from where it left off, which is not the expected behaviour. – Raghavendar Aug 28 '20 at 11:27
  • Checkpoints are used automatically to recover from failures, but when manually starting a job they are only used if explicitly asked for. Furthermore, checkpoints are deleted when a job is canceled, unless you set RETAIN_ON_CANCELLATION. – David Anderson Aug 28 '20 at 12:40
  • Do you perhaps have enable.auto.commit set? – David Anderson Aug 28 '20 at 12:42
  • I think I got the answer. I am using FlinkKafkaConsumer, which by default reads the offsets from Kafka/ZooKeeper when the job is started. So checkpoints are only used when the job crashes in the cluster and is restarted by Flink; if we stop the job and start it again, the offset is read from Kafka/ZooKeeper. – Raghavendar Aug 28 '20 at 12:49 (see the sketch below)
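
To make the behaviour discussed in these comments concrete, here is a minimal sketch of the relevant FlinkKafkaConsumer settings (the topic name, group id, and bootstrap servers are placeholders):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "tenant-a-consumer");

FlinkKafkaConsumer<String> kafkaConsumer =
        new FlinkKafkaConsumer<>("tenant-a-topic", new SimpleStringSchema(), props);

// Start position used only when the job is submitted fresh, without a
// checkpoint or savepoint: the default is the committed group offsets.
kafkaConsumer.setStartFromGroupOffsets();

// Commit offsets back to Kafka on completed checkpoints (the default when
// checkpointing is enabled). Flink does not rely on these offsets for its
// own recovery; it restores the offsets stored in the checkpoint.
kafkaConsumer.setCommitOffsetsOnCheckpoints(true);

env.addSource(kafkaConsumer).print();
env.execute("kafka-offsets-sketch");

With these defaults, a freshly submitted job continues from the last committed group offsets, which is why it can look as if the checkpoint had been used.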

The JobManager is aware of each job's checkpoints and keeps that metadata. Checkpoints are saved to the checkpoint directory (configured via flink-conf.yaml); under this directory a randomly-named (hashed) sub-directory is created per job to hold its checkpoints.

Case 1: The job will restart (depending on your restart strategy...), and if checkpointing is enabled it'll read the last checkpoint.
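
A minimal sketch of configuring a restart strategy in code (the attempt count and delay are arbitrary placeholders; a restart strategy can also be set cluster-wide in flink-conf.yaml):

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Restart up to 3 times, waiting 10 seconds between attempts; on each restart
// the job is restored from its latest completed checkpoint.
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));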

Case 2: I'm not 100% sure, but I think if you cancel the job manually and then submit it again, it won't read the checkpoint. You'll need to use a savepoint (you can stop your job with a savepoint, and then submit your job again from that same savepoint). Just be sure that every operator has a UID. You can read more about savepoints here: https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html
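
A minimal sketch of what assigning UIDs might look like (the operator chain and UID strings are placeholders, and kafkaConsumer is assumed to be a FlinkKafkaConsumer built as in the sketch further up):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);

env.addSource(kafkaConsumer)           // kafkaConsumer: assumed built as sketched above
   .uid("kafka-source")                // stable UID: savepoint state is matched by UID, not by operator order
   .filter(value -> !value.isEmpty())
   .uid("non-empty-filter")
   .print();

env.execute("tenant-a-job");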

ShemTov