I have an Apache Beam application deployed on Amazon KDA.
It has checkpointing enabled with the default settings.
"FlinkApplicationConfigurationDescription": {
    "CheckpointConfigurationDescription": {
        "ConfigurationType": "DEFAULT",
        "CheckpointingEnabled": true,
        "CheckpointInterval": 60000,
        "MinPauseBetweenCheckpoints": 5000
    },
But in the application logs I see this message:
"UnboundedSources present which rely on checkpointing, but checkpointing is disabled."
The application only checkpoints if I pass CheckpointInterval as a runtime property to my application. Is it necessary to pass these values explicitly, even though checkpointing is already enabled in the KDA configuration?
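For context, here is a minimal sketch of what "passing it as a runtime property" could look like on the application side. It assumes the runtime properties have already been loaded into a java.util.Properties object, and that each property name is the UpperCamel form of a Beam pipeline option (for example a hypothetical property CheckpointingInterval for the checkpointingInterval option of FlinkPipelineOptions). The class and method names are made up for illustration and are not part of Beam or KDA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical helper: turns runtime properties into Beam-style "--name=value" arguments. */
final class RuntimePropertyArgs {

    private RuntimePropertyArgs() {}

    /** Builds one "--optionName=value" argument per property, sorted by property name. */
    static String[] toArgs(Properties props) {
        List<String> args = new ArrayList<>();
        // Sort the names so the argument order is deterministic.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            args.add("--" + lowerFirst(name) + "=" + props.getProperty(name));
        }
        return args.toArray(new String[0]);
    }

    /**
     * Assumption: properties are written in UpperCamel (CheckpointingInterval),
     * while Beam option names start with a lowercase letter (checkpointingInterval).
     */
    static String lowerFirst(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }
}
```

The resulting array could then be handed to PipelineOptionsFactory.fromArgs(...) and read as FlinkPipelineOptions, so that the interval ends up in the Beam options and not only in the KDA application configuration.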
The application reads from Kinesis, windows the data into fixed windows of about 30 s, then publishes the data to Pub/Sub.
pipeline
    .apply("Read from Kinesis", new KinesisIORead())
    .apply("Windowing", Window.into(FixedWindows.of(Duration.standardSeconds(30))))
    .apply(WithKeys.of(DUMMY_KEY))
    .apply(GroupIntoBatches.ofSize(5))
    .apply(Values.create())
    .apply("Map values to single object", ParDo.of(new GroupedMessage()))
    .apply("Write to Pub/Sub", new PubSubWrite());
The application jar includes:
- beam-sdks-java-core:2.31.0
- beam-runners-flink-1.11:2.31.0
- beam-sdks-java-io-kafka:2.31.0