
I am setting up an analytics pipeline using Apache Flink to process a stream of IoT data. While configuring the system, I can't find any guidance on how often checkpoints should be triggered. Are there any recommendations or rules of thumb, e.g. 1 second, 10 seconds, 1 minute, etc.?

EDIT: Also, is there a way to configure the checkpoint interval programmatically at runtime?

Hegemon

2 Answers


This depends on two things:

  • How much data are you willing to reprocess in case of failure? (After a failure, the job restarts from the last completed checkpoint.)
  • How often can you afford to checkpoint, given data transfer limits and how long each checkpoint takes to complete?
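The two questions above can be turned into rough numbers. The sketch below is a hypothetical back-of-envelope helper (not a Flink API). It assumes a steady event rate, a constant checkpoint duration, and at most one checkpoint in flight at a time; the 5,000 events/s and 2 s checkpoint figures are made up for illustration.

```java
/**
 * Back-of-envelope sizing for a checkpoint interval. Hypothetical helper, not part of
 * Flink; assumes a steady input rate, a constant checkpoint duration, and at most one
 * checkpoint in flight at a time.
 */
public class CheckpointSizing {

    // With one checkpoint in flight, a new one cannot start before the previous one
    // finishes, so the effective spacing is at least the checkpoint duration.
    static double effectiveSpacingSec(double intervalSec, double checkpointSec) {
        return Math.max(intervalSec, checkpointSec);
    }

    // Worst case: the job fails just before the next checkpoint completes, so it
    // rewinds to the previous completed checkpoint and replays roughly
    // (spacing + checkpoint duration) worth of input.
    static long worstCaseReplayRecords(double recordsPerSec, double intervalSec, double checkpointSec) {
        double windowSec = effectiveSpacingSec(intervalSec, checkpointSec) + checkpointSec;
        return Math.round(recordsPerSec * windowSec);
    }

    // Fraction of wall-clock time during which a checkpoint is in progress.
    static double checkpointBusyFraction(double intervalSec, double checkpointSec) {
        return checkpointSec / effectiveSpacingSec(intervalSec, checkpointSec);
    }

    public static void main(String[] args) {
        double rate = 5_000;     // assumed events per second from the IoT sources
        double checkpoint = 2.0; // assumed seconds to complete one checkpoint
        for (double interval : new double[] {1, 10, 60}) {
            System.out.printf("interval=%2.0fs  worst-case replay ~%d records  checkpointing %.0f%% of the time%n",
                    interval,
                    worstCaseReplayRecords(rate, interval, checkpoint),
                    100 * checkpointBusyFraction(interval, checkpoint));
        }
    }
}
```

With these assumed numbers, a 1 s interval leaves the job checkpointing all the time (about 20000 records replayed at worst), 10 s replays about 60000 records with checkpoints running about 20% of the time, and 60 s drops that to about 3% at the cost of replaying about 310000 records.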

In my experience, most users choose checkpoint intervals on the order of 10 seconds, but also configure a "min-pause-between-checkpoints" [1] so that a slow checkpoint is not immediately followed by the next one.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing
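A min pause matters most when checkpoints are slow relative to the interval. In the Java DataStream API these two settings correspond to `env.enableCheckpointing(intervalMillis)` on a `StreamExecutionEnvironment` and `env.getCheckpointConfig().setMinPauseBetweenCheckpoints(millis)`. The sketch below does not call Flink; it is a simplified model (an assumption, not Flink's actual scheduler) of when checkpoints start if each one takes a fixed time and only one may run at once.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of checkpoint trigger times; an illustration, not Flink's scheduler code. */
public class MinPauseModel {

    // Returns the start times (seconds) of the first n checkpoints, assuming at most one
    // checkpoint in flight and each taking durationSec to complete.
    static List<Double> triggerTimes(double intervalSec, double minPauseSec, double durationSec, int n) {
        List<Double> times = new ArrayList<>();
        double trigger = 0.0;
        for (int i = 0; i < n; i++) {
            times.add(trigger);
            double completion = trigger + durationSec;
            // The next checkpoint waits for the regular interval AND for the min pause,
            // which is measured from the end of the previous checkpoint.
            trigger = Math.max(trigger + intervalSec, completion + minPauseSec);
        }
        return times;
    }

    public static void main(String[] args) {
        // Assumed: a slow 8 s checkpoint with a 10 s interval.
        System.out.println("no min pause:  " + triggerTimes(10, 0, 8, 4));
        System.out.println("5 s min pause: " + triggerTimes(10, 5, 8, 4));
    }
}
```

Without a min pause, checkpoints start at 0, 10, 20 and 30 s, so the job spends 80% of its time checkpointing; with a 5 s min pause they start at 0, 13, 26 and 39 s, leaving at least 5 s of checkpoint-free processing between them.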

snntrable

One other thing to consider beyond what was already mentioned: if you depend on a transactional sink for exactly-once semantics, then those transactions are committed as part of completing each checkpoint. This means that any downstream consumers of those transactions will see latency that is more or less determined by your job's checkpointing interval.
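As a rough illustration (a simplified bound, assuming the checkpoint duration d is shorter than the interval I, one checkpoint in flight at a time, and ignoring barrier alignment and commit overhead): a record written to the sink just after a checkpoint barrier passes it lands in the next transaction, which is committed only when the following checkpoint completes. Its worst-case visibility delay is therefore roughly:

```latex
L_{\text{max}} \lesssim I + d
\qquad \text{e.g. } I = 10\,\text{s},\ d = 2\,\text{s} \;\Rightarrow\; L_{\text{max}} \lesssim 12\,\text{s}
```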

David Anderson