
I have a setup with two layers of PgBouncer (v1.8, up to PR 147) in front of a PostgreSQL database. The first layer receives connections from multiple services targeting multiple databases; it pools them and forwards them as new connections to the second PgBouncer layer, which runs on the same host as the PostgreSQL server. The second layer pools those connections again and finally hands them to the database for execution. TLS mode is set to 'require' on both PgBouncers since they sit on different hosts. default_pool_size is 40 on the first layer and 400 on the second, the database can handle at most 500 connections, and pool_mode is set to 'transaction' on both. There is also a built-in script on the first PgBouncer host that can switch the traffic directly to the PostgreSQL server if needed.

| Services | -----> | Pgbouncer | --------> | Pgbouncer ----> DB |
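For reference, here is a minimal sketch of the two pgbouncer.ini files as I understand the topology; the hostnames, ports, wildcard database entry, TLS file paths, and the omitted auth settings are placeholders, not our exact configuration:

```ini
; --- Layer 1 pgbouncer.ini (service-facing host); hosts/ports are placeholders
[databases]
; forward every requested database to the layer-2 PgBouncer on the DB host
* = host=db-host.example.com port=6432

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction
default_pool_size = 40
; TLS towards the layer-2 PgBouncer on the other host
server_tls_sslmode = require
; auth settings omitted for brevity

; --- Layer 2 pgbouncer.ini (separate file, on the PostgreSQL host)
[databases]
* = host=127.0.0.1 port=5432

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction
default_pool_size = 400
; accept TLS from the layer-1 PgBouncer; cert/key paths are placeholders
client_tls_sslmode = require
client_tls_key_file = /etc/pgbouncer/pgbouncer.key
client_tls_cert_file = /etc/pgbouncer/pgbouncer.crt
; auth settings omitted for brevity
```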

I am noticing that after introducing this second layer of PgBouncer on the PostgreSQL host, the average query duration goes very high during peak times, and as a result the number of clients waiting on the first layer of PgBouncer starts climbing. If I flip the switch on that host to send the traffic directly to the PostgreSQL server, the average query duration comes down immediately and everything returns to normal. I understand that an extra layer of PgBouncer adds one more hop, but it should not affect the metrics this drastically. What are the possible reasons for avg_query_duration to go up on the first PgBouncer, and how can I mitigate it?


1 Answer


We just resolved this issue by increasing the file descriptor limit for the PgBouncer running on the PostgreSQL host. The FD limit was initially set to 1024 and was raised to 64K, following https://unix.stackexchange.com/questions/345595/how-to-set-ulimits-on-service-with-systemd. The number needed depends on the maximum number of connections this PgBouncer has to accept, and reducing that is not an option for us right now.
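The change was roughly the systemd drop-in below; the unit name pgbouncer.service, the drop-in path, and the exact 65536 value are what we used and may differ on your installation:

```ini
; /etc/systemd/system/pgbouncer.service.d/limits.conf
; (drop-in file; unit name and path are assumptions about the setup)
[Service]
; raise the per-process open file descriptor limit from the 1024 default to 64K
LimitNOFILE=65536
```

After adding the drop-in, run systemctl daemon-reload and restart the PgBouncer service; the effective limit can be verified in /proc/<pgbouncer pid>/limits.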
