Postgres HA - Warm standby server lagging

Question

We have recently implemented high availability for our postgres (9.0.4) DB server, through the methods described as Log-Shipping Standby Servers in the Postgres documentation. Everything seems to be fine and working, the WAL files are shipping and are being ingested by the standby server, but we are experience lagging between the master and slave machines. The lag is of about 2 hours which is not really acceptable.

What could be the reason for this lag? The machine is not running anything else but the postgres server, although it does use slower hard drives compared to the production server. How can I check if disk I/O is causing issues?

If I check what processes are running on the server I see a constant battle between the postgres startup process which is recovering newest WAL files and the pg_standby utility which is ingesting the archived WALs step-by-step. Is it OK that the startup process is running constantly?

ps example:

postgres  1422  0.0  1.0 13061220 131568 ?     S    Sep20   0:01 /usr/pgsql-9.0/bin/postmaster -p 5433 -D /data/pgsql_5433/data
postgres  1431  0.0  0.0 176928   512 ?        Ss   Sep20   0:12 postgres: logger process
postgres  1432 70.5 72.0 13068604 8775544 ?    Ss   Sep20 5744:15 postgres: startup process   waiting for 000000010000181F00000016
postgres  1437  0.2 70.4 13068336 8582736 ?    Ss   Sep20  22:50 postgres: writer process
postgres 32199  0.0  0.0   4064   484 ?        S    01:46   0:00 /usr/pgsql-9.0/bin/pg_standby -l -t/data/pgsql_5433/trigger /data/pgsql_5433/psql_wal_import 000000010000181F00000016 pg_xlog/RECOVERYXLOG 000000010000181E00000051

I would appreciate any hint ...

Latest patch is version 9.0.13, you're 7 patches/2 years behind. Check the release notes to see if there is something related to replication. — Frank Heikens, Sep 26 '13 at 05:36

score 1 · Answer 1 · answered Sep 26 '13 at 14:16

Most likely your WAL takes a long time to fill up. You can adjust the timeout to force it to switch before it's full. This will increase network traffic significantly, but will give you a max time before the log is sent over. You can check the documentation here.

Postgres HA - Warm standby server lagging

1 Answers1