
We have PostgreSQL 9.1.3 running on SUSE Linux 4.3.4 (64-bit). There is a master-slave setup with streaming replication between the two. A cron job runs the VACUUM command on the master database every Friday. My observation is that replication breaks within half an hour of the VACUUM job running.

The error in the Postgres logs on the slave is

FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 00000003000013500000001A has already been removed

Could you please help me understand the relation between the two and how I can prevent replication from breaking?

Please note I am a novice in Postgres.

  • check out `wal_keep_segments` - I believe that vacuum generates too many of them too fast, so the slave misses WAL logs to apply – Vao Tsun May 20 '16 at 08:06
  • wal_keep_segments is set to 1250 in the postgresql.conf file. What is vacuum_defer_cleanup_age - do you think manipulating this can help? – Somali Batra May 20 '16 at 09:14
  • `select 1250*16/1024` is 19GB... how big are the databases? and what is the slave error? – Vao Tsun May 20 '16 at 09:26
  • The database is 170 GB. Error on slave is FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 00000003000013500000001A has already been removed – Somali Batra May 20 '16 at 11:19
  • well, it still feels like you have too few WAL segments. Do you keep them somewhere? If `show archive_command` returns `cd` or `.` or similar, then you don't save them - in that case try increasing `wal_keep_segments` to 3000. I know it sounds like overhead, but you probably have a very loaded OLTP workload; otherwise I can't explain how `vacuum` can eat so much space so fast... ah - another thing - do you run `pg_xlog_replay_pause` on the slave? – Vao Tsun May 20 '16 at 12:08
  • Let me try changing wal_keep_segments. I don't find pg_xlog_replay_pause on slave. – Somali Batra May 23 '16 at 06:50
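To make the arithmetic in the comments above concrete, here is a minimal sketch of the disk-space cost of `wal_keep_segments`, assuming the default 16 MB WAL segment size that PostgreSQL 9.1 is built with (the helper name `retained_wal_gb` is just for illustration):

```python
# Back-of-the-envelope sizing for wal_keep_segments.
# Assumes the default 16 MB WAL segment size (a compile-time setting in 9.1).
WAL_SEGMENT_MB = 16

def retained_wal_gb(wal_keep_segments: int) -> float:
    """Disk space the master retains in pg_xlog so a lagging slave can catch up."""
    return wal_keep_segments * WAL_SEGMENT_MB / 1024

print(retained_wal_gb(1250))  # current setting: ~19.5 GB of retained WAL
print(retained_wal_gb(3000))  # suggested setting: ~46.9 GB of retained WAL
```

If the VACUUM job generates more than this much WAL before the slave applies it, the master recycles segments the slave still needs, producing exactly the "requested WAL segment ... has already been removed" error above.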
