For the past few weeks I've been tuning and messing with PostgreSQL, which I'm going to use for my next project.

My specs are:

  • Two DigitalOcean droplets, 8 cores / 16 GB / SSD each (one for the DB, one for the web server)
  • CentOS 7
  • PHP 5, Nginx

The things that I've tried:

  1. pgtune

  2. PgBouncer & Pgpool (connection pooling & load balancing)

  3. Tuning php-fpm & nginx (worker_processes, worker_connections, pm.max_children, etc.; a sketch of these settings follows this list)

  4. Linux file handle limits and socket tweaking.
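
For context, the kind of settings I've been adjusting look roughly like this (illustrative values only; the right numbers depend on the hardware and workload):

    # /etc/nginx/nginx.conf (excerpt)
    worker_processes auto;          # one worker per CPU core
    events {
        worker_connections 4096;    # per worker; needs a matching file handle limit
    }

    # /etc/php-fpm.d/www.conf (excerpt)
    pm = static
    pm.max_children = 200           # bounded by the RAM each PHP worker uses
    listen = 127.0.0.1:9000         # TCP; a unix socket is the alternative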

I'm testing it by using ApacheBench to hit a webpage that performs an insert. Is that a practical test?

ab -n 17500 -c 1750 -r http://example.com/insert.php
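
To show the shape of the test: insert.php does nothing more than open a connection and run a single INSERT, roughly like this (a simplified sketch; host, credentials, and table name are placeholders):

    <?php
    // insert.php - minimal benchmark endpoint (simplified sketch;
    // host, credentials, and table name are placeholders)
    $conn = pg_connect('host=10.0.0.2 port=6432 dbname=bench user=bench password=secret');
    if ($conn === false) {
        http_response_code(503);
        exit('connect failed');
    }
    // one INSERT per HTTP request - this is what ab is hammering
    pg_query_params($conn, 'INSERT INTO hits (created_at) VALUES (now())', array());
    pg_close($conn);
    echo 'ok';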

So far I can only get it to handle 1700-2000 concurrent connections without dropping any transactions; beyond that I usually see "prematurely closed connection" or "resource temporarily unavailable" in the nginx error log, or "Sorry, too many clients already" returned by PostgreSQL.

I tried both TCP/IP and a unix socket for php-fpm, and TCP/IP seems to scale better than the unix socket.

Can PHP use connection pooling? The way I'm calling the DB from the web server is still the same: every request makes its own individual connection to pgpool or PgBouncer.

My goal is to handle at least 10,000 transactions concurrently. What are the deciding factors? Is the bottleneck between the web server and the DB (php-fpm), or in PostgreSQL itself? And how do big companies with PHP web applications usually handle this kind of volume?

Lim SY
  • What kind of load do you expect? We have about 6 million active customers and they only use a maximum of 21 active database connections to handle 200 million transactions daily. Why do you think you need 10,000 concurrent connections? – Frank Heikens Mar 27 '17 at 18:52
  • @FrankHeikens really? That's what I would like to hear. So did you do any performance tweaking at all? Connection pooling? May I know what kind of application, and what language? Thanks – Lim SY Mar 28 '17 at 02:07
  • You always have to optimise your configuration for your workload. The default settings are just good enough to get started, but that's it. The data model and all SQL code (PL/pgSQL) are also optimised for performance. The frontend is a huge Java application, but it can only execute some stored procedures. Otherwise we would never ever have this kind of performance. – Frank Heikens Mar 29 '17 at 12:58

1 Answer

The best load test is with a real-world load; the closer to that your load test is, the better.

If you have many concurrent requests, connection pooling is a must and pgBouncer is the standard answer.
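
To make that concrete, a minimal pgBouncer configuration for this scenario could look like the following (a sketch under assumptions; the database name, addresses, and pool sizes are placeholders you must adapt):

    ; pgbouncer.ini - illustrative sketch, not a tuned configuration
    [databases]
    ; clients connect to pgBouncer on port 6432 as if it were PostgreSQL
    mydb = host=10.0.0.2 port=5432 dbname=mydb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; transaction pooling lets a few server connections serve many clients
    pool_mode = transaction
    max_client_conn = 10000
    default_pool_size = 20

With pool_mode = transaction, thousands of client connections can share a small number of real server connections, so PostgreSQL's max_connections can stay low; the trade-off is that session state such as prepared statements does not survive transaction pooling.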

It is impossible to do performance tuning in the scope of an answer, and indeed this question will probably be closed for being too broad, but I'll give you some general leads:

The goal is to find the bottlenecks, that is, the resources where your system is at its limit. Narrow it down: is it the application, the web server, or the database? Once you know which component limits you, find the individual limiting resource. Is it I/O? CPU time? Memory? The time it takes to establish a database connection? Locks?
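
On the database side, a query like this shows what the backends are doing and what they are waiting for (a sketch; it assumes PostgreSQL 9.6 or later, where the wait_event columns exist):

    -- snapshot of backend activity; run it while the load test is running
    SELECT state, wait_event_type, wait_event, count(*)
    FROM pg_stat_activity
    GROUP BY state, wait_event_type, wait_event
    ORDER BY count(*) DESC;

Many backends waiting on locks point in a very different direction than many idle connections, so even this one query narrows the search considerably.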

One important rule is not to start twiddling knobs at random before you know where the problem is; that is likely to give you a misconfigured system. Form a theory, try a solution, and if it does not have the desired effect, reset the setting to the value it had before.

I don't understand your setup: First you say that you have a machine for the database and one for the application, then you say that you tried local socket connections.

Laurenz Albe
  • hmm... thanks. Regarding the socket, I saw it here: http://www.softwareprojects.com/resources/programming/t-optimizing-nginx-and-php-fpm-for-high-traffic-sites-2081.html. Not sure how it actually works, though. – Lim SY Mar 27 '17 at 07:07
  • My problem is that you cannot use UNIX sockets to connect from one machine to another, so it is obvious that I don't understand your architecture (perhaps because it was not explained in great detail). Regarding the link: *I* would say that if you need 1000 connections, you need a connection pool, not TCP. You may want to read [this Wiki article](https://wiki.postgresql.org/wiki/Number_Of_Database_Connections). – Laurenz Albe Mar 27 '17 at 07:20
  • I think the bottleneck is either the connection between the servers or PostgreSQL's max connection limit. Of the three errors I mentioned, "prematurely closed connection" and "resource temporarily unavailable", which are returned by php-fpm, can be explained by pgsql or PHP closing the connection due to overload or after waiting too long. And finally, "Sorry, too many clients already" is a sign that pgsql is not willing to accept the connection. – Lim SY Mar 27 '17 at 07:28
  • Yes, I did set up pgpool or pgbouncer for connection pooling, but since PHP is a scripting language, I don't know how I can maintain connections with PHP. – Lim SY Mar 27 '17 at 07:28
  • Btw, regarding the max_connections setting in pgsql: I can only set it to 2000; any higher and I'm not able to start the service (not enough RAM, maybe). Also, I used pgtune to generate generalized performance optimizations for postgresql.conf. – Lim SY Mar 27 '17 at 07:33
  • Do not set `max_connections` to anything much above 100. If that is your bottleneck, pgBouncer is the solution. – Laurenz Albe Mar 27 '17 at 07:41
  • Final question: pgbouncer has connection pooling enabled by default, right? Or do I need to configure it in order to use the connection pool? Although I'm using it (port 6432), and I tried installing and using it on both the application server and the DB server, I don't notice any change in performance or concurrency level. – Lim SY Mar 27 '17 at 07:54
  • You need to configure pgBouncer, but I'm no expert on that. You should *only* use it on one end; the application server would make more sense. If using pgBouncer does not solve the problem, the bottleneck might be elsewhere. – Laurenz Albe Mar 27 '17 at 07:57