I am experimenting with a Flink pipeline configured to read change data events from a PostgreSQL database on AWS RDS. For the most part the configuration works as expected: I can see change events with the `before` and `after` properties set correctly. The problem I am experiencing is that if I force-stop the Flink pipeline and start it again, it re-reads the entire table from the beginning.
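For reference, my setup resembles the following sketch (hostnames, credentials, and table names are placeholders; this assumes the `PostgreSQLSource` builder from the Ververica flink-cdc-connectors library linked below):

```java
import com.ververica.cdc.connectors.postgres.PostgreSQLSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class PostgresCdcJob {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the RDS instance
        SourceFunction<String> source = PostgreSQLSource.<String>builder()
                .hostname("my-rds-host.amazonaws.com")
                .port(5432)
                .database("mydb")
                .schemaList("public")
                .tableList("public.orders")
                .username("flink_user")
                .password("secret")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .build();

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing is enabled; my understanding is that the source's
        // read position should be persisted with each checkpoint.
        env.enableCheckpointing(3000);

        env.addSource(source).print();
        env.execute("postgres-cdc");
    }
}
```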
I also experimented with the Debezium connector without Flink (using Kafka Connect), and it behaves as expected: even if I force-stop the connector, it resumes from where it left off when I start it back up. I noticed that Debezium persists its position via `offset.storage` in Kafka. I wish I could have the same in the Flink Debezium connector.
My requirement is for the Flink Debezium connector to resume from the last position it read. I would appreciate any advice, as I have pretty much exhausted my options.
The Flink CDC Postgres connector documentation: https://ververica.github.io/flink-cdc-connectors/master/content/connectors/postgres-cdc.html
I looked into the possibility of making `offset.storage` and `config.storage` configurable in the Flink Debezium connector, but I couldn't find a way to do it. It appears that the connector itself overrides whatever I try to set at Flink initialization time.
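This is roughly what I tried (a sketch with placeholder values; `debeziumProperties(...)` is the builder's pass-through for raw Debezium settings, and the file path is illustrative):

```java
import java.util.Properties;
import com.ververica.cdc.connectors.postgres.PostgreSQLSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class OffsetStorageAttempt {
    public static void main(String[] args) {
        Properties dbzProps = new Properties();
        // The Kafka Connect offset settings I tried to pass through;
        // the connector appears to ignore or override these.
        dbzProps.setProperty("offset.storage",
                "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        dbzProps.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");

        SourceFunction<String> source = PostgreSQLSource.<String>builder()
                .hostname("my-rds-host.amazonaws.com")
                .port(5432)
                .database("mydb")
                .schemaList("public")
                .tableList("public.orders")
                .username("flink_user")
                .password("secret")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .debeziumProperties(dbzProps)
                .build();
    }
}
```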