I'm currently setting up Debezium connected to Amazon RDS for Postgres. I ran into an issue with the WAL consuming a huge amount of disk space.
After some research, I added a heartbeat configuration to the Debezium source connector. Here is my configuration:
{
  "database.server.name": "database-source-1",
  "heartbeat.interval.ms": "300000",
  "heartbeat.action.query": "SELECT pg_logical_emit_message(false, 'heartbeat', now()::varchar);"
}
This solved my issue with the WAL consuming disk space. The configuration adds a heartbeat event that is emitted every 5 minutes, and a message is also produced to two Kafka topics every 5 minutes. The topic names follow this format:
- __debezium-heartbeat.database-source-1
- database-source-1.message
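To put the growth of these topics in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes exactly one message lands in each topic per heartbeat, as described above. The function names are hypothetical helpers written for this illustration and are not part of Debezium or Kafka, and the topic naming pattern is copied from the two names listed above.

```python
# Rough estimate of how many heartbeat messages accumulate per topic.
# Assumption: exactly one message per topic per heartbeat interval.

MS_PER_DAY = 24 * 60 * 60 * 1000


def messages_per_day(heartbeat_interval_ms: int) -> int:
    """Number of heartbeat events emitted per day at the given interval."""
    return MS_PER_DAY // heartbeat_interval_ms


def heartbeat_topics(server_name: str) -> list:
    """Topic names following the pattern of the two topics listed above."""
    return [
        f"__debezium-heartbeat.{server_name}",
        f"{server_name}.message",
    ]


if __name__ == "__main__":
    per_topic = messages_per_day(300_000)  # heartbeat.interval.ms from the config
    for topic in heartbeat_topics("database-source-1"):
        print(f"{topic}: ~{per_topic} messages/day, ~{per_topic * 365} messages/year")
```

With a 5-minute interval this works out to 288 messages per topic per day, so the volume grows steadily but slowly. The actual disk usage depends on the size of each message, which this sketch does not try to estimate.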
After a few days, a lot of messages have accumulated in these topics. For now they don't consume much disk space, but I'm afraid that as more messages are produced, Kafka will end up using quite a lot of space. Since I don't have any use for these topics (apart from checking that heartbeat events are emitted), is there any risk in clearing the messages in these topics from time to time? I mainly use Debezium to stream data from a Postgres database to another Postgres database and to Elasticsearch.