I have a site running nginx, php-fpm, PostgreSQL and pgbouncer, and it has been working for over a year. Yesterday the server stopped responding. After a reboot it worked for 5 or 10 minutes, then stopped responding again with the same error message:

2017/04/04 15:32:37 [error] 2532#0: *31341 FastCGI sent in stderr: "PHP message: PHP Warning: pg_connect(): Unable to connect to PostgreSQL server: could not connect to server: Cannot assign requested address Is the server running on host "127.0.0.1" and accepting TCP/IP connections on port 6432? in /usr/share/nginx/html/lib/postgresql.class.php on line 21

I even reverted to the last working config and it's still the same (when I start nginx, it works for a minute or two and then I get the error above).

I checked the php-fpm logs and found the following:

[04-Apr-2017 14:48:50] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 24 total children

and

[04-Apr-2017 14:48:59] WARNING: [pool www] server reached pm.max_children setting (50), consider raising it

and in the PostgreSQL logs I found this:

HINT: Consider increasing the configuration parameter "max_wal_size".

I've changed the following parameters so far.

In php-fpm:

pm.max_children from 50 to 100
pm.start_servers from 5 to 10
pm.min_spare_servers from 5 to 10
pm.max_spare_servers from 35 to 100

In PostgreSQL:

max_wal_size from 1GB to 2GB

And still no luck! What should I do?

Ehphan

1 Answer

You have a few problems here. All of them are caused by too many connections.

You need to add some kind of monitoring and check how many requests you're getting. Once you have some numbers in hand, you will be able to properly adjust the connection-related settings for:

  • php-fpm
  • pgbouncer
  • postgresql
  • Linux/OS
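
As a starting point for getting those numbers, here is a minimal sketch (not a full monitoring setup) that counts TCP sockets talking to pgbouncer's port, grouped by state. It assumes a Linux server where `/proc/net/tcp` and `/proc/sys/net/ipv4/ip_local_port_range` are readable, and port 6432 is taken from the error message in the question. The function names are made up for this example. "Cannot assign requested address" on connect commonly means the client side has run out of local ephemeral ports, so a TIME_WAIT count close to the size of the port range would point in that direction.

```python
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, remote_port):
    """Count sockets whose remote port equals remote_port, per TCP state."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is a header
        fields = line.split()
        if len(fields) < 4:
            continue
        # Addresses look like "0100007F:1920": hex IP, colon, hex port.
        port = int(fields[2].rsplit(":", 1)[1], 16)
        if port == remote_port:
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts


def ephemeral_port_count(range_text):
    """Number of local ports in the 'low high' range the kernel reports."""
    low, high = map(int, range_text.split())
    return high - low + 1


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            states = count_states(f.read(), 6432)
        with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
            ports = ephemeral_port_count(f.read())
    except OSError as exc:
        print("could not read /proc (is this Linux?):", exc)
    else:
        print("sockets to port 6432 by state:", dict(states))
        print("ephemeral ports available:", ports)
```

Running this every few seconds while the site is under load shows whether the socket count climbs toward the port limit before the errors start. Note that `/proc/net/tcp` covers IPv4 only; IPv6 sockets are listed in `/proc/net/tcp6`.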

I've found a good answer for pgbouncer: https://dba.stackexchange.com/questions/59650/pgbouncer-works-great-but-occasionally-becomes-unavailable

I believe this is the second missing part of the solution to your problem (monitoring, and the lack of numbers, is the first).

In short: you are now getting much more traffic than before, and the previously working configuration is not enough to handle it.
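
To illustrate why the settings have to be adjusted together rather than one at a time, here is a rough sanity check of the connection chain. It is only a sketch under simplifying assumptions: each php-fpm worker holds at most one connection to pgbouncer, and pgbouncer opens at most `default_pool_size` server connections per pool. `check_connection_limits` is a hypothetical helper, and every number except `pm.max_children = 100` (from the question) is a placeholder you would replace with your real values.

```python
def check_connection_limits(fpm_max_children, max_client_conn,
                            default_pool_size, pool_count,
                            pg_max_connections):
    """Return a list of mismatches between the layers' connection limits."""
    problems = []
    # Layer 1: php-fpm workers connecting to pgbouncer.
    if fpm_max_children > max_client_conn:
        problems.append(
            "pm.max_children (%d) exceeds pgbouncer max_client_conn (%d)"
            % (fpm_max_children, max_client_conn))
    # Layer 2: pgbouncer server connections to PostgreSQL.
    server_side = default_pool_size * pool_count
    if server_side > pg_max_connections:
        problems.append(
            "pgbouncer may open %d server connections, "
            "PostgreSQL max_connections is %d"
            % (server_side, pg_max_connections))
    return problems


if __name__ == "__main__":
    # 100 comes from the question; the other values are placeholders.
    for problem in check_connection_limits(
            fpm_max_children=100, max_client_conn=100,
            default_pool_size=20, pool_count=1,
            pg_max_connections=100):
        print("WARNING:", problem)
```

Raising `pm.max_children` alone, as in the question, only moves the bottleneck to the next layer if the pgbouncer and OS limits stay where they were.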