
I am a newbie to Apache Flink, and I was going through Apache Flink's examples. I found that, in case of a failure, Flink has the ability to restore stream processing from a checkpoint.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10000L); // take a checkpoint every 10 seconds

Now, my question is where does Flink keep the checkpoint(s) by default?

Any help is appreciated!

himanshuIIITian

2 Answers

5

Flink features the abstraction of StateBackends. A StateBackend is responsible for managing the state locally on the worker node, but also for checkpointing (and restoring) the state to a remote location.

The default StateBackend is the MemoryStateBackend. It maintains the state on the workers' (TaskManagers') JVM heap and checkpoints it to the JVM heap of the master (JobManager). Hence, the MemoryStateBackend does not require any additional configuration or external system and is good for local development. However, it is obviously neither scalable nor suited for any serious workload.
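For local development you typically don't need to configure anything, but as an illustration (not from the original example), the default backend can also be set explicitly; the 10 MB value below is just a placeholder for the per-state size limit:

import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10000L);
// Keep state on the TaskManager heap and checkpoint it to the JobManager heap.
env.setStateBackend(new MemoryStateBackend(10 * 1024 * 1024)); // max size per state: 10 MB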

Flink also provides the FsStateBackend, which also holds local state on the workers' JVM heap but checkpoints it to a remote file system (HDFS, NFS, ...). Finally, there's the RocksDBStateBackend, which stores state in an embedded disk-based key-value store (RocksDB) and also checkpoints to a remote file system (HDFS, NFS, ...).
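As a rough sketch (the HDFS paths are placeholders, and the RocksDB backend additionally requires the flink-statebackend-rocksdb dependency), either backend can be configured like this:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10000L);

// Keep state on the TaskManager heap, checkpoint to a remote file system.
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

// Or: keep state in embedded RocksDB on local disk, checkpoint to a remote file system.
// env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));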

Fabian Hueske
  • Thanks for the response. But what I meant was: when I kill the Flink app (any one of the examples) and restart it, it is able to recover from the last message processed. So, when only one JVM is up and it goes down, how does the in-memory checkpoint work? It has to save something on disk, I think. – himanshuIIITian May 14 '18 at 10:44
  • What source are you referring to? If you use Kafka, it is probably due to offsets committed to the Kafka brokers, rather than to checkpoints being written. – Dawid Wysakowicz May 14 '18 at 12:36
  • @DawidWysakowicz Yes, I am referring to Kafka as a source. – himanshuIIITian May 14 '18 at 16:53
  • The checkpoints also include the Kafka read offsets (Flink is not relying on Kafka's own offset commit mechanism). As I said, checkpoints are stored on the JobManager's (master's) heap. If that process goes down, the checkpoint and all state is lost. Hence, the default configuration is not recommended for production use cases and you should configure one of the other two state backends. – Fabian Hueske May 15 '18 at 07:09
2

The default state backend is the MemoryStateBackend. That means it stores the in-flight data in the TaskManager's JVM heap and checkpoints it to the heap of the master (JobManager). It's good for local debugging, but you will lose your checkpoints if the job goes down.

For production, you usually use the FsStateBackend with a path to an external file system (HDFS, S3, etc.). It stores the in-flight data in the TaskManager's JVM heap and checkpoints it to the external file system.

For example:

env.setStateBackend(new FsStateBackend("file:///apps/flink/checkpoint"));

Optionally, a small metadata file pointing to the state store can also be configured for high availability.
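As an illustration only (the values below are placeholders for your own setup, so check the Flink documentation for your version), this is typically configured in flink-conf.yaml:

# flink-conf.yaml (illustrative values)
state.backend: filesystem
state.checkpoints.dir: hdfs:///flink/checkpoints    # where checkpoint data and metadata are written
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/     # JobManager metadata used for recovery
high-availability.zookeeper.quorum: zk-host:2181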

Mudit bhaintwal