I have an Apache Beam application deployed on Amazon KDA.

It has checkpointing enabled with the default settings.

"FlinkApplicationConfigurationDescription": {
"CheckpointConfigurationDescription": {
"ConfigurationType": "DEFAULT",
"CheckpointingEnabled": true,
"CheckpointInterval": 60000,
"MinPauseBetweenCheckpoints": 5000
},

But in the application logs I could see:

"UnboundedSources present which rely on checkpointing, but checkpointing is disabled."

It only checkpoints if I pass CheckpointInterval as a runtime property to my application. So is it necessary to pass these values explicitly?
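For context, this is how the runtime property is wired in. A minimal sketch of reading a checkpoint interval from KDA runtime properties and applying it to Beam's Flink runner options; the property group name `BeamApplicationProperties` is an assumption here, and the exact group/key names must match whatever is configured on the KDA application:

```java
import java.io.IOException;
import java.util.Properties;

import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime;
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CheckpointConfig {

    // Hypothetical property group name; use the group configured
    // on your KDA application.
    private static final String PROPERTY_GROUP = "BeamApplicationProperties";

    static FlinkPipelineOptions buildOptions(String[] args) throws IOException {
        FlinkPipelineOptions options =
                PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);

        // KDA exposes runtime properties via KinesisAnalyticsRuntime.
        Properties props =
                KinesisAnalyticsRuntime.getApplicationProperties().get(PROPERTY_GROUP);
        if (props != null && props.containsKey("CheckpointInterval")) {
            // Setting a checkpointing interval on the Beam Flink runner
            // options is what actually enables checkpointing in Beam.
            options.setCheckpointingInterval(
                    Long.parseLong(props.getProperty("CheckpointInterval")));
        }
        return options;
    }
}
```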

The application basically reads from Kinesis, windows the data into fixed windows of ~30 seconds, then publishes the data back to Pub/Sub.

    pipeline
        .apply("Read from Kinesis", new KinesisIORead())
        .apply("Windowing", Window.into(FixedWindows.of(Duration.standardSeconds(30))))
        .apply(WithKeys.of(DUMMY_KEY))
        .apply(GroupIntoBatches.ofSize(5))
        .apply(Values.create())
        .apply("Map values to single object", ParDo.of(new GroupedMessage()))
        .apply("Write to Pub/Sub", new PubSubWrite());

The application jar includes:

  • beam-sdks-java-core:2.31.0
  • beam-runners-flink-1.11:2.31.0
  • beam-sdks-java-io-kafka:2.31.0
Gayan Weerakutti

1 Answer

It seems that org.apache.flink.streaming.api.environment.StreamExecutionEnvironment is not picking up the checkpoint settings configured in the AWS console, so when Beam calls getCheckpointConfig().isCheckpointingEnabled() it reports that checkpointing is disabled. I would pass these values explicitly to make sure checkpointing actually takes effect.
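One way to pass them explicitly is through Beam's FlinkPipelineOptions when creating the pipeline, rather than relying on the KDA console settings. This is a sketch that mirrors the KDA defaults (60 s interval, 5 s minimum pause); it assumes the Beam 2.31.0 Flink runner is on the classpath:

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class Main {
    public static void main(String[] args) {
        FlinkPipelineOptions options =
                PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);

        // Enable checkpointing explicitly on the Beam Flink runner so the
        // StreamExecutionEnvironment sees it, independent of the KDA console.
        options.setCheckpointingInterval(60_000L);
        options.setMinPauseBetweenCheckpoints(5_000L);

        Pipeline pipeline = Pipeline.create(options);
        // ... build and run the pipeline as before
    }
}
```

The same values can also be supplied on the command line (e.g. `--checkpointingInterval=60000`), since they are ordinary Beam pipeline options.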

robertwb