Our Postgres BDR database system stopped replicating data between the nodes.

When I checked with pg_xlog_location_diff, I noticed that the amount of WAL retained by the replication slot keeps growing.

SELECT slot_name, database, active, pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE plugin = 'bdr';
                slot_name                |   database   | active | retained_bytes
-----------------------------------------+--------------+--------+----------------
 bdr_26702_6275336279642079463_1_20305__ | ourdatabase  | f      |       32253352
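
For readers unfamiliar with how retained_bytes is derived: an LSN such as 0/5F0C6C8 is two hexadecimal numbers, the high and low 32 bits of a byte position in the WAL stream, and pg_xlog_location_diff subtracts two such positions. The following is a minimal sketch of that arithmetic in Python, not Postgres's own implementation. The helper names lsn_to_int and lsn_diff are hypothetical, and the second LSN (0/7DCE6F8) is taken from the debug log quoted further down in this question, on the assumption that it was close to the current insert location when the query ran.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN like '0/5F0C6C8' to an absolute byte position.

    The part before the slash is the high 32 bits, the part after it
    is the low 32 bits; both are hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(newer: str, older: str) -> int:
    """Number of WAL bytes between two LSNs (hypothetical helper)."""
    return lsn_to_int(newer) - lsn_to_int(older)


if __name__ == "__main__":
    restart_lsn = "0/5F0C6C8"  # restart_lsn of the slot, from pg_replication_slots
    flushed_lsn = "0/7DCE6F8"  # LSN seen in the debug log (assumed near the insert location)
    print(lsn_diff(flushed_lsn, restart_lsn))
```

Under these assumptions the result is roughly 32 MB, in line with the retained_bytes value reported by the query above.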

I also noticed that the slot is marked as active=false.

SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]+----------------------------------------
slot_name    | bdr_26702_6275336279642079463_1_20305__
plugin       | bdr
slot_type    | logical
datoid       | 26702
database     | ourdatabase
active       | f
xmin         |
catalog_xmin | 8041
restart_lsn  | 0/5F0C6C8

I increased the Postgres logging level, but the only messages I see in the log are:

LOCATION:  LogicalIncreaseRestartDecodingForSlot, logical.c:886
DEBUG:  00000: updated xmin: 1 restart: 0
LOCATION:  LogicalConfirmReceivedLocation, logical.c:958
DEBUG:  00000: failed to increase restart lsn: proposed 0/7DCE6F8, after 0/7DCE6F8, current candidate 0/7DCE6F8, current after 0/7DCE6F8, flushed up to 0/7DCE6F8

Please let me know if you have an idea how I can re-activate the replication slot and allow the replication to resume.

Jonathan Leffler
postrational
  • Did you restart the affected node? If you have many `xlog` files, the WAL receiver process won't start until all of them have been processed. – charli Dec 19 '16 at 13:20

1 Answer

Unless you have a really huge amount of data, I cannot see any reason not to recreate the replication from scratch. Stop the slave, delete the slot on the master, delete the data directory on the slave, create a new slot (with the same name, to avoid further changes on the slave), and run pg_basebackup.

You can find a good tutorial here.

Aleksandar Pesic