
You can see these related problems:

  1. https://www.reddit.com/r/aws/comments/7snob5/postgresql_transaction_logs_fill_up_storage_till/

  2. https://dba.stackexchange.com/questions/173267/aws-rds-postgres-logical-replication?rq=1

My problem may be a little different. I use https://github.com/jiamo/python-psql-replication to replicate from Postgres to Elasticsearch whenever data changes. While data is changing, the transaction log size is fine. But once there are no more data changes, the transaction logs keep growing like this:

[RDS monitoring graph: transaction log disk usage keeps increasing]

09/23 - 09/25 was the weekend, so there were no data changes, yet the transaction logs kept growing.

As a workaround, I tried updating a small amount of data from a crontab job. But even after the data changes, the transaction log size does not seem to go down immediately (I need more time to verify this approach).

My questions now: can anybody explain this phenomenon and suggest a better way to solve it?

Here is more detail about what the trick is.

=> SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
CD/1C0005C0
(1 row)

After two hours:

=> SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
CD/41000410

Even when there is no insert/update/delete activity, pg_current_wal_lsn keeps increasing.

The replication slot looks like this:

=> select * from pg_replication_slots;
slot_name   |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
--------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
wal2json_rds | wal2json | logical   |  16400 | test | f         | t      |      11270 |      |       593776 | CD/140235B8 | CD/140235B8

The confirmed_flush_lsn is much smaller than pg_current_wal_lsn.
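
The gap can be measured in bytes with pg_wal_lsn_diff() (available in PostgreSQL 10+, the same version that has pg_current_wal_lsn()):

SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS behind_bytes
FROM pg_replication_slots;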

The trick (here is the log): I update a single row from crontab, but after that there are no more data changes, and confirmed_flush_lsn stays at CD/140235B8:

host:25a5743b67db time:2018-10-01 09:23:21.865489 before sleep wal_end 875955403896 hex CB/F302BC78 next_lsn 876123392024 hex CB/FD060818
host:25a5743b67db time:2018-10-01 09:23:32.087501 before sleep wal_end 875955403896 hex CB/F302BC78 next_lsn 876123392024 hex CB/FD060818
host:25a5743b67db time:2018-10-01 09:23:38.705198 future len tmp_list is 1
host:25a5743b67db time:2018-10-01 09:23:38.763092 success bulk 1
host:25a5743b67db time:2018-10-01 09:23:38.763327 queue 0 getters 1 putters 0
host:25a5743b67db time:2018-10-01 09:23:38.763503 queue begin to wait......
host:25a5743b67db time:2018-10-01 09:23:42.310556 before sleep wal_end 880803984024 hex CD/14023298 next_lsn 880803984824 hex CD/140235B8
host:25a5743b67db time:2018-10-01 09:23:52.531998 before sleep wal_end 880803984024 hex CD/14023298 next_lsn 880803984824 hex CD/140235B8

What my trick is meant to do is advance confirmed_flush_lsn when there is no other data change (by updating one row's status column). It does increase the value, but it doesn't seem to make RDS delete any transaction logs. A sketch of what the cron job runs is shown below.
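
Roughly, the statement the cron job runs is the following (the table and column names here are placeholders, not my real schema):

UPDATE heartbeat_table
SET    status     = 'ping',
       updated_at = now()
WHERE  id = 1;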

-------- update ------------

I changed the trick to update two rows instead of one (i.e., a little more data change). This time the transaction log size does go down, but still not by much, as shown in the graph below. On the weekend the trick works and reduces the transaction log size, but not like a normal day, when there is more data change and the size can drop to 0.

[RDS monitoring graph: transaction log size drops slightly after the two-row update]

  • Because this is not the correct site for this question. This should be asked on https://dba.stackexchange.com/ –  Oct 01 '18 at 13:25
  • 1
    I am not think so. It is not just something like a dba problem. The one reason I can give is I can tag it in this site. – jiamo Oct 01 '18 at 13:27
  • It might not be a "DBA problem" but it's certainly a problem related to "database administration". Stackoverflow is a programming related site and your question is not related to programming at all - unless you suspect the reason for this in your application (i.e. your code). In that case you need to supply the code that you think is responsible for this question. Btw: while I agree with the close reason, I don't think your question deserved to be downvoted. –  Oct 01 '18 at 13:31
  • I don't want to argue about it. You can see the related problems on this page. Is every problem on Stack Overflow related to code? – jiamo Oct 01 '18 at 13:39
  • @jiamo were you able to resolve this issue? – Nick Borodulin Jan 22 '19 at 21:01

1 Answer


In CDC mode, AWS's Data Migration Service (DMS) creates a replication slot and subscribes to it, much like what you're doing. It includes an option to send regular dummy queries to make sure the replication slot location advances. I'm not sure precisely how it's implemented, but I am sure it will point you in the right direction. Search for "WAL heartbeat" in the DMS release notes.
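
A minimal sketch of the same idea, assuming you create a dedicated heartbeat table yourself (the names below are placeholders), is to upsert a timestamp on a schedule so the slot always has a fresh change to consume and confirm:

CREATE TABLE IF NOT EXISTS replication_heartbeat (
    id   int PRIMARY KEY,
    beat timestamptz NOT NULL
);

-- run this from cron every few minutes; each upsert writes a real WAL record
INSERT INTO replication_heartbeat (id, beat)
VALUES (1, now())
ON CONFLICT (id) DO UPDATE SET beat = now();

Once your consumer reads and acknowledges that change, confirmed_flush_lsn advances and RDS can recycle the older WAL files held back by the slot.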

Jeff Waugh